levitate

I love you, you know.
Joined
Sep 3, 2015
Messages
39,236
Reputation
5,827
Daps
148,880
Reppin
The Multiverse
As a graphic artist who has spent countless hours over more than two decades becoming expert-level good at things like clipping objects and people out of backgrounds, I feel a kinda way about this. The way they train the AI is by letting it watch humans do it the old-fashioned way. Wild...


@3Rivers
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
56,133
Reputation
8,239
Daps
157,840

 

bnew




ControlNet is a widely used tool for guiding Stable Diffusion models. This repository offers a simple hack that restores or removes objects without requiring user prompts, which streamlines the workflow considerably.

No-prompt

[Example images: restore inputs and their results]
 

bnew


Making the community's best AI chat models available to everyone.
Current Model: OpenAssistant/oasst-sft-6-llama-30b
 

bnew


Natural Language Movie Scene Search Engine


The goal of this project was to develop a natural language video search engine that could effectively search through large quantities of video data without relying on metadata like titles, descriptions, or audio transcriptions. The aim was to enable users to search for specific actions or scenes, such as closing and opening a door, and facilitate the comparison of these scenes across different videos.

The dataset for this particular tool is roughly 30,000 scenes from IMDb's top 250 movies.

Features

  • Natural Language Search
  • Scene Similarity Search
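
The project description doesn't say how the search works under the hood. A common design for this kind of search is to embed the text query and every scene into a shared vector space and rank scenes by cosine similarity. The sketch below assumes that design. The scene names, the three-dimensional vectors, and the `rank_scenes` helper are all hypothetical stand-ins, not part of the project.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction to compare
    return dot / (norm_a * norm_b)


def rank_scenes(query_vec, scene_vecs, top_k=2):
    """Return the top_k (scene name, score) pairs, best match first."""
    scored = [
        (name, cosine_similarity(query_vec, vec))
        for name, vec in scene_vecs.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical toy embeddings; a real system would get these from a model.
scenes = {
    "door_closing": [0.9, 0.1, 0.0],
    "car_chase": [0.0, 0.2, 0.9],
    "door_opening": [0.8, 0.3, 0.1],
}

# Natural language search: the query vector stands in for an embedded
# text query such as "closing a door".
query = [1.0, 0.2, 0.0]
results = rank_scenes(query, scenes)
print([name for name, _ in results])

# Scene similarity search: reuse a scene's own vector as the query.
similar = rank_scenes(scenes["door_closing"], scenes)
print([name for name, _ in similar])
```

Under this assumption, both listed features reduce to the same ranking step: only the source of the query vector changes.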
 

bnew


Speed Is All You Need: On-Device Acceleration of Large Diffusion Models via GPU-Aware Optimizations

Yu-Hui Chen, Raman Sarokin, Juhyun Lee, Jiuqiang Tang, Chuo-Ling Chang, Andrei Kulik, Matthias Grundmann
The rapid development and application of foundation models have revolutionized the field of artificial intelligence. Large diffusion models have gained significant attention for their ability to generate photorealistic images and support various tasks. On-device deployment of these models provides benefits such as lower server costs, offline functionality, and improved user privacy. However, common large diffusion models have over 1 billion parameters and pose challenges due to restricted computational and memory resources on devices. We present a series of implementation optimizations for large diffusion models that achieve the fastest reported inference latency to-date (under 12 seconds for Stable Diffusion 1.4 without int8 quantization on Samsung S23 Ultra for a 512x512 image with 20 iterations) on GPU-equipped mobile devices. These enhancements broaden the applicability of generative AI and improve the overall user experience across a wide range of devices.
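
Some back-of-the-envelope arithmetic helps put the abstract's numbers in perspective. The per-iteration figure follows directly from the quoted bound of 12 seconds for 20 iterations. The weight-memory estimates are assumptions and not figures from the paper: they take exactly 1 billion parameters and the usual storage widths of 4 bytes for float32, 2 for float16, and 1 for int8.

```python
# Figures quoted in the abstract.
TOTAL_LATENCY_S = 12.0   # upper bound on end-to-end latency
ITERATIONS = 20          # denoising iterations for one 512x512 image

# Assumptions for illustration only (not from the paper).
PARAMS = 1_000_000_000   # "over 1 billion" rounded down to exactly 1e9
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1}

# Average time budget per iteration implied by the quoted bound.
per_iteration_s = TOTAL_LATENCY_S / ITERATIONS


def weight_memory_gb(params, bytes_per_param):
    """Memory needed to hold the weights alone, in gigabytes (1e9 bytes)."""
    return params * bytes_per_param / 1e9


memory_gb = {
    dtype: weight_memory_gb(PARAMS, width)
    for dtype, width in BYTES_PER_PARAM.items()
}

print(f"under {per_iteration_s:.1f} s per iteration on average")
for dtype, gb in memory_gb.items():
    print(f"{dtype}: about {gb:.0f} GB of weights")
```

Even at float16 the weights alone would take about 2 GB under these assumptions, which is why memory is a constraint on phones before activations are even counted.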

 

TRUEST

Superstar
Joined
May 17, 2012
Messages
14,297
Reputation
2,656
Daps
54,578
Reppin
NULL
The unfocused invention and proliferation of “AI” technologies by every Tom, Dick, and Harry will inevitably lead to analysis paralysis. You’ll end up in a position where the general public is expected to do the outright ridiculous, like using a tractor trailer as a commuter vehicle to an office job.
 

bnew

TRUEST said:
The unfocused invention and proliferation of “AI” technologies by every Tom, Dick, and Harry will inevitably lead to analysis paralysis. You’ll end up in a position where the general public is expected to do the outright ridiculous, like using a tractor trailer as a commuter vehicle to an office job.

:what:
Not sure why you believe it's unfocused when dozens of focused projects sprout up every day and thousands of models are being trained in specific knowledge areas.
 

bnew


About

mPLUG-Owl🦉: Modularization Empowers Large Language Models with Multimodality



Examples

[Figure: Training paradigm and model overview]

News

  • We provide an online demo on ModelScope for the public to try.
  • We released the code of mPLUG-Owl🦉 with its pre-trained and instruction-tuned checkpoints.

Spotlights

  • A new training paradigm with a modularized design for large multi-modal language models.
  • Learns visual knowledge while supporting multi-turn conversations that mix different modalities.
  • Exhibits abilities such as multi-image correlation, scene text understanding, and vision-based document comprehension.
  • Releases a visually related instruction evaluation set, OwlEval.

Online Demo

Demo of mPLUG-Owl on ModelScope
 