bnew

Veteran
Joined
Nov 1, 2015
Messages
57,369
Reputation
8,499
Daps
160,085


1/2
It's here already!! @AnthropicAI Computer Use - out of the box, no Docker required! 🤯

It can support any platform, with a user-friendly interface based on Gradio 🌟



2/2
Computer Use - OOTB 🌟

An out-of-the-box (OOTB) solution for Claude's new Computer Use APIs.
No Docker is required, and it theoretically supports any platform, with testing currently done on Windows.

🤩 Extremely easy to launch: clone the repo, install the requirements, and launch the app with `python app.py`

A user-friendly interface based on Gradio will launch locally 🎨

👉 Repo for Computer_Use_OOTB: GitHub - showlab/computer_use_ootb: An out-of-the-box (OOTB) version of Anthropic Claude Computer Use
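
For reference, here is a minimal sketch of what a single Computer Use request looks like through the Anthropic Python SDK, based on the public beta docs at the time; the model name, tool version strings, and display values are assumptions and may have changed:

```python
# Hedged sketch of one Computer Use API call; an app like Computer Use OOTB
# wraps this in a loop that executes the returned actions (clicks, keystrokes,
# screenshots) and feeds the results back to the model.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.beta.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    tools=[{
        "type": "computer_20241022",    # beta tool type for screen control
        "name": "computer",
        "display_width_px": 1024,       # resolution of the screen being driven
        "display_height_px": 768,
    }],
    messages=[{"role": "user", "content": "Open a browser and search for Gradio."}],
    betas=["computer-use-2024-10-22"],  # beta flag required for the tool
)
print(response.content)
```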




To post tweets in this format, more info here: https://www.thecoli.com/threads/tips-and-tricks-for-posting-the-coli-megathread.984734/post-52211196
 

bnew


1/1
Check out our paper at Paper page - Polymath: A Challenging Multi-modal Mathematical Reasoning Benchmark

[Quoted tweet]
🚀Excited to release our study on visual comprehension and abstract reasoning skills via #PolyMATH. We provide both quantitative and qualitative evaluations of #GPT4o, #Claude 3.5 Sonnet, #Gemini 1.5 pro, #OpenAI o1 models & 13 other models.

📄✨Full paper: arxiv.org/abs/2410.14702

🤗@huggingface Dataset:
huggingface.co/datasets/him1…

🔍 Key Insights:
1️⃣ A dataset of 5000 samples to test cognitive reasoning capabilities of MLLMs.
2️⃣ The best scores achieved on POLYMATH are ∼41%, ∼36%, and ∼27%, obtained by Claude-3.5 Sonnet, GPT-4o, and Gemini-1.5 Pro respectively, while the human baseline is ~66%.
3️⃣ An improvement of 4% is observed when image descriptions are passed instead of actual images, indicating a reliance on text over images even in multimodal reasoning.
4️⃣ OpenAI o1 models score competitively with the human baseline on text-only samples, highlighting room for improvement!

A massive shoutout to our outstanding team @s_verma3011, @ujjwala_ananth, @thegraydleguy and @Mihir3009 and guidance of @Swarooprm7 and @cbaral














1/10
@himanshu_gup14
🚀Excited to release our study on visual comprehension and abstract reasoning skills via #PolyMATH. We provide both quantitative and qualitative evaluations of #GPT4o, #Claude 3.5 Sonnet, #Gemini 1.5 pro, #OpenAI o1 models & 13 other models.

📄✨Full paper: [2410.14702] Polymath: A Challenging Multi-modal Mathematical Reasoning Benchmark

🤗@huggingface Dataset:
him1411/polymath · Datasets at Hugging Face

🔍 Key Insights:
1️⃣ A dataset of 5000 samples to test cognitive reasoning capabilities of MLLMs.
2️⃣ The best scores achieved on POLYMATH are ∼41%, ∼36%, and ∼27%, obtained by Claude-3.5 Sonnet, GPT-4o, and Gemini-1.5 Pro respectively, while the human baseline is ~66%.
3️⃣ An improvement of 4% is observed when image descriptions are passed instead of actual images, indicating a reliance on text over images even in multimodal reasoning.
4️⃣ OpenAI o1 models score competitively with the human baseline on text-only samples, highlighting room for improvement!

A massive shoutout to our outstanding team @s_verma3011, @ujjwala_ananth, @thegraydleguy and @Mihir3009 and guidance of @Swarooprm7 and @cbaral
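
If you want to poke at the benchmark yourself, here is a minimal sketch for pulling the dataset with the 🤗 `datasets` library; the repo id comes from the link above, but the available configs/splits are an assumption to verify on the dataset page:

```python
from datasets import load_dataset

# Repo id from the tweet above; config/split names may differ on the Hub.
ds = load_dataset("him1411/polymath")
print(ds)                   # inspect the available splits
first_split = next(iter(ds))
print(ds[first_split][0])   # peek at one sample
```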





2/10
@himanshu_gup14
PolyMATH evaluates the complex multi-modal cognitive reasoning capabilities of MLLMs. The tasks and puzzles could be easy for humans but are fairly challenging for even State of the Art Models!





3/10
@himanshu_gup14
The questions of the dataset are presented in the following format:





4/10
@himanshu_gup14
State of the Art Models are evaluated across various prompting methods:





5/10
@himanshu_gup14
Similarly, open-source models' performance remains low on PolyMATH:





6/10
@himanshu_gup14
SOTA LMs frequently misinterpret diagrams, illustrating a need for improved understanding of visual data. Top LMs share common pitfalls in reasoning - suggesting a fundamental challenge in current architectures.





7/10
@himanshu_gup14
A case study on o1-mini and o1-preview on text-only samples showed competitive performance compared to the human baseline.





8/10
@himanshu_gup14
Project Page: PolyMATH: A Challenging Multi-Modal Mathematical Reasoning Benchmark
Github Page: GitHub - polymathbenchmark/PolyMATH: Official github repository of XXXX



9/10
@himanshu_gup14
hf paper page: Paper page - Polymath: A Challenging Multi-modal Mathematical Reasoning Benchmark



10/10
@synthical_ai
Dark mode for this paper for those who read at night 🌙 Polymath: A Challenging Multi-modal Mathematical Reasoning Benchmark




 

bnew



1/2
KlingAI Virtual Try-On

This project utilizes the KlingAI API to provide a virtual try-on experience using images of people and garments.



2/2
github: GitHub - AtaUllahB/KlingAI-Virtual-Try-On: This project utilizes the KlingAI API to provide a virtual try-on experience using images of people and garments.







1/1
🚀 @Kling_ai Kwai-Kolors just launched Kolors Virtual Try-On! 🛍️👗 Upload your pic 📸, add the dress, and see how it looks on you instantly! 🤯

Try it here: Kolors Virtual Try-On - a Hugging Face Space by Kwai-Kolors

What do you think? 🤔

#AI #FashionTech #VirtualTryOn #Innovation #ShoppingExperience











1/5
The Virtual Try-On API from Kling, the strongest video-generation AI, is so wild you have to hear about it.

Via the API, you can swap the outfit of any person in a video 👇



2/5
It looks like it builds on the Kolors try-on model, released by the same company as Kling.

Kolors Virtual Try-On - a Hugging Face Space by Kwai-Kolors



3/5
There's a cheap trial plan for this, so I'll test it out.

[Quoted tweet]
The Virtual Try-On API from Kling, the strongest video-generation AI, is so wild you have to hear about it.

Via the API, you can swap the outfit of any person in a video 👇


4/5
This account also does serious AI analysis 👇

[Quoted tweet]
Today, October 17: are Claude 3.5 Opus and ChatGPT 4.5 really coming?!?

I've rounded up opinions from hardcore overseas AI watchers 👇️ note.com/meru2002/n/n1f7be67…


5/5
It's wild that this can be integrated into all kinds of services.













1/6
@aitoolhouse
This is wild!

The creators of Kling AI have released a new Virtual Try-On tool called Kolors and it's really good!

Some examples and the link below 👇





2/6
@aitoolhouse
Kolors is pretty accurate even with more complex patterns, although it's not 100% perfect in some cases.

It also understands how to apply shadows and lighting to the new generation:





3/6
@aitoolhouse
Surprisingly, it also works on multiple characters.

It automatically understands the nature of the garment and applies it correctly.





4/6
@aitoolhouse
Kolors can automatically apply full-body garments, like dresses and suits.





5/6
@aitoolhouse
You can try it here:
Kolors Virtual Try-On - a Hugging Face Space by Kwai-Kolors



6/6
@aitoolhouse
🤖 Contact us if you made a great AI tool to be featured: Submit your AI Tool to AI Toolhouse







1/1
Kling AI's developers have just released a neural network that can put any clothes on a person in a photo: Kolors Virtual Try-On.

It's fast, high-quality, and almost unlimited.

Check this out: Kolors Virtual Try-On - a Hugging Face Space by Kwai-Kolors









1/2
🚀 Exciting news! Our KOLORS virtual try-on project is now live! 🎉 Experience the thrill of trying on any fashionable outfit and using @Kling_ai to turn your images into dynamic videos. Don't miss out!

🤗Kolors Virtual Try-On - a Hugging Face Space by Kwai-Kolors



2/2
We are continuously optimizing the model's quality, and external API access is actively being prepared. Please look forward to it~




 

bnew




1/9
@francoisfleuret
I don't think it's intellectually possible to foresee what media / movie creation will look like in two years.

Like those very specific memes and comics that you love, but now it's a never-ending series of 20-minute episodes.

[Quoted tweet]
Mochi 1

Dramatically closes the gap between closed and open video generation models. ✅
Apache 2.0 license 🤯
High-fidelity videos
Strong prompt adherence
Model available on 🤗 Hub


https://video.twimg.com/ext_tw_video/1848781656980701184/pu/vid/avc1/1280x720/bMVaGeUurtUiA59s.mp4

2/9
@osoleve
My hot take is that it hasn't been possible to predict things years into the future since the industrial revolution



3/9
@francoisfleuret
You have a point.



4/9
@vineettiruvadi
Ads nauseam



5/9
@francoisfleuret
You realize you control what you look at though?



6/9
@yacineMTB
i'm going to let you in on a secret
the best quality content is going to be generated and posted anonymously on worksafe gif



7/9
@rom1504
Supply and demand. People's ability to ingest content will not increase as supply increases.
So people will want *a lot more* than what they can have now.
So, for example, why not content generated just for you that changes immediately based on how you like it as you watch it.



8/9
@OriflammeTech
I predict the death of the film industry as people become able to generate whatever the hell they want, becoming the producer and director and whole studio themselves. Any actor, any scenario, any style, any soundtrack, etc. Total freedom. Copyright battles will be merciless.



9/9
@RaphLeclercAI
@francoisfleuret is spot on: predicting the future of media creation is a daunting task, especially with AI advancements like Mochi 1, which is already blurring the lines between closed and open video generation models.









1/9
@Gradio
Mochi 1

Dramatically closes the gap between closed and open video generation models. ✅
Apache 2.0 license 🤯
High-fidelity videos
Strong prompt adherence
Model available on 🤗 Hub



https://video.twimg.com/ext_tw_video/1848781656980701184/pu/vid/avc1/1280x720/bMVaGeUurtUiA59s.mp4

2/9
@Gradio
Mochi 1

A state-of-the-art video generation model by Genmo.
Start a Gradio 5 app following the instructions on the model card on @huggingface Hub🔥🔥🔥
genmo/mochi-1-preview · Hugging Face
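
For anyone curious what "start a Gradio 5 app" amounts to, here is a minimal skeleton of such a wrapper; the `generate` body is a placeholder, since the actual pipeline setup follows the instructions on the genmo/mochi-1-preview model card:

```python
import gradio as gr

def generate(prompt: str):
    # Placeholder: load genmo/mochi-1-preview per the model card instructions
    # and return the path of the rendered video.
    raise NotImplementedError("wire up the Mochi 1 pipeline here")

demo = gr.Interface(
    fn=generate,
    inputs=gr.Textbox(label="Prompt"),
    outputs=gr.Video(label="Generated video"),
    title="Mochi 1 preview",
)

if __name__ == "__main__":
    demo.launch()  # serves the UI locally
```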



3/9
@GozukaraFurkan
missing the hardware requirement :D



4/9
@LatentSpacer
VRAM requirements?



5/9
@Marzzel2020
Availability to run it locally?



6/9
@lmontoya
Make with mochi 1



https://video.twimg.com/ext_tw_video/1848806772632436736/pu/vid/avc1/848x480/aOhywEpf4EvS2QJT.mp4

7/9
@seoinetru
The model requires at least 4 H100 GPUs to run



8/9
@andwhynut69633
Next year things are really going to heat up... with repercussions both good and not so good... for everything else, there's Mastercard.



9/9
@maxcodesky
I think Mochi 1 could be a revolutionary development, especially for high-quality video generation!




 

bnew









1/7
@andi_marafioti
People are asking me about the Qwen2VL paper so I'll share my notes 🧶





2/7
@andi_marafioti
They introduce an extension of RoPE to represent temporality in videos. This seems like a great idea in principle, but it doesn't make a huge difference in their ablations.
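
For readers who want the mechanics, here is a minimal sketch of standard RoPE plus the multimodal split the paper describes (M-RoPE rotates separate channel chunks with temporal, height, and width position ids); the equal three-way chunking below is an assumption for illustration, not Qwen2-VL's actual code:

```python
import torch

def rope(x: torch.Tensor, pos: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Standard 1-D rotary embedding. x: (..., seq, dim) with dim even; pos: (seq,)."""
    dim = x.shape[-1]
    inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2, dtype=torch.float32) / dim))
    angles = pos.float()[:, None] * inv_freq[None, :]   # (seq, dim/2)
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., 0::2], x[..., 1::2]                 # paired channels
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin                # rotate each pair
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

def mrope(x, t_pos, h_pos, w_pos):
    """M-RoPE idea: rotate channel chunks with time/height/width positions."""
    d = (x.shape[-1] // 3) & ~1                         # even chunk size (assumed split)
    return torch.cat([
        rope(x[..., :d], t_pos),
        rope(x[..., d:2 * d], h_pos),
        rope(x[..., 2 * d:], w_pos),
    ], dim=-1)

x = torch.randn(2, 8, 16, 96)                           # (batch, heads, seq, head_dim)
t = torch.arange(16)
print(mrope(x, t, t, t).shape)                          # torch.Size([2, 8, 16, 96])
```

Note that for plain text tokens, feeding the same position stream to all three chunks reduces this to ordinary RoPE.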



3/7
@andi_marafioti
They train their own ViT to represent images of different resolutions.
They don't exploit this training often; they use the same vision backbone for all three model sizes.
But it might give their model an advantage over others that share similar vision backbones.
This ViT allows them to have a few tokens for small images and many tokens for large images.
Their ablations for this dynamic tokenization show a huge improvement over using only 64 tokens per image, but using 64 tokens per image with their architecture is a bad idea and is only shown for the ablation.
When they get to 1600 tokens per image, the difference in performance mostly disappears.
There might still be a large difference in performance between their ViT and others, but this isn't shown.



4/7
@andi_marafioti
They train on videos and images together—more data! Although they call them videos, they remove all of the audio and don't even use transcriptions.



5/7
@andi_marafioti
They train with function calling, which is super cool. They even evaluate the model as a robotic control agent.



6/7
@andi_marafioti
So... what should we try to integrate for Idefics?

The mRoPE strategy seems great for videos, but I'm a bit disappointed that the ablations don't show a larger difference. It doesn't seem like a large change, so I would still code it and test it.

The training of a new ViT model is great and something I've been meaning to do, but it seems to take more work, and I'm not convinced that it is better than what we are doing for Idefics3 (splitting the images into patches).



7/7
@andi_marafioti
I still think the larger mojo here is the data. The paper is not very explicit/extensive as to what data they use exactly. So I'm left wondering if they have more/better data than what's freely available and how much that contributes to their great benchmarks/performance.




 

bnew







1/17
@MaziyarPanahi
Microsoft just dropped OmniParser model on ⁦@huggingface⁩, so casually! 😂

“OmniParser is a general screen parsing tool, which interprets/converts UI screenshot to structured format, to improve existing LLM based UI agent.” 🔥 microsoft/OmniParser · Hugging Face
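
If you just want the weights locally, here is a minimal sketch using the generic Hugging Face Hub API; the repo id comes from the link above, while the actual inference wiring of the detector plus captioner follows the repo's own instructions:

```python
from huggingface_hub import snapshot_download

# Pull the microsoft/OmniParser weights to a local cache directory.
local_dir = snapshot_download(repo_id="microsoft/OmniParser")
print("weights downloaded to:", local_dir)
```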



2/17
@MaziyarPanahi
Funny how understanding what’s on our screens has become so important to the agents!

microsoft/OmniParser · Hugging Face



3/17
@MaziyarPanahi
I wonder how good this model is in Visual Question Answering? 😆



4/17
@V_J_S_1
isn’t this basically building on apple’s work on ferret ui?



5/17
@MaziyarPanahi
I think so! But it's happening! With AI Platforms asking us to pick up the bills and pay for tokens running agents that control our screens, local LLMs that can do that are very valuable!



6/17
@Gopinath876
@cohere also dropped model Today!



7/17
@MaziyarPanahi
Yes!!! 2 new Aya models beating Gemma and Llama!



8/17
@ram_chandalada
Just noticed today they dropped UFO a while ago

GitHub - microsoft/UFO: A UI-Focused Agent for Windows OS Interaction.



9/17
@MaziyarPanahi
This is cool! Are they using OmniParser in there? Funny they have support for Google Gemini!



10/17
@cognitivetech_




11/17
@vSouthvPawv
WHAT



12/17
@JonathanRoseD
No examples? :/



13/17
@DanielSamanez3
Lol they have everyone’s computers running windows screen shots…



14/17
@brbcatonfire
This is a fine-tuned YOLO with GPT-4V.

You can replace GPT-4V with a local vision LLM.



15/17
@bennetkrause
Sadly, there's no dataset, and the captioning model is super heavy.



16/17
@AitelMax
Can someone explain why most prefer to use YOLOv8 over v11?



17/17
@cineia_global
Nice move by Microsoft, dropping OmniParser on Hugging Face. I'm excited to experiment with this tool and see how it can enhance my own projects. The potential for automating UI interactions is huge.




 

bnew


1/7
@burkov
Chinese models kick ass. Last week, I posted about Qwen 2.5 being awesome. Here's another one: DeepSeek 2.5. As good as GPT-4, as cheap as GPT-4 mini: DeepSeek
If you want to run it locally, you can get it on @huggingface: deepseek-ai/DeepSeek-V2.5 · Hugging Face

The license for running it locally is permissive, no "for research only" nonsense.
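
As a minimal sketch of what running it locally looks like with transformers (repo id from the tweet; the dtype, device mapping, and trust_remote_code settings are assumptions, and note the replies below on how large the model is):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Repo id from the tweet above. This is a very large model (the ollama
# download alone is 133 GB per a reply below), so device_map="auto"
# shards it across whatever GPUs are available.
model_id = "deepseek-ai/DeepSeek-V2.5"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

inputs = tokenizer("Write a haiku about open weights.", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```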



2/7
@DanielSMatthews
Via ollama too, but a 133 gig model!

deepseek-v2.5



3/7
@tensor_art_com
Too big



4/7
@techyalex123
I've been following DeepSeek's progress and I'm excited to see their V2.5 model delivering results on par with GPT-4. The cost-effectiveness and permissive license make it an attractive option for those looking to run it locally. Good job!



5/7
@Naveen_AIGeek
I'm excited to see more Chinese models like DeepSeek 2.5 giving GPT-4 a run for its money. The permissive license is a big plus; no 'research only' restrictions can only lead to more innovation.



6/7
@SimplyObjective
DS v2.5 is very good, but far too big for consumer hardware. And if you're going to run it online from a 3rd party, you might as well use a superior model like GPT-4o. And Q2.5 hallucinates like crazy about very popular things b/c they're overly focused on test scores & coding.



7/7
@AdinaYakup
If you're interested, we've collected some good open models from the Chinese community here:
zh-ai-community (Chinese LLMs on Hugging Face)




 

bnew







1/21
@slow_developer
🚨 HUGE

OpenAI senior advisor for AGI, Miles Brundage, is leaving the company.

In his farewell blog, he said:

There isn’t actually a large gap between what capabilities exist in labs and what is publicly available to use.





2/21
@mark_k
Another doomer pruned.



3/21
@slow_developer
not sure about it, as he worked for 6 years and suddenly left when something big is around the corner



4/21
@CristiVlad25
yeah, AGI is not yet here, and that's good. X loves drama and sensationalism though...



5/21
@slow_developer
but we have baby AGI here



6/21
@EssereAI
What?? When? Just now? What is going on in there? Is Sam the last man standing? 🤔



7/21
@slow_developer
Why I’m Leaving OpenAI and What I’m Doing Next



8/21
@michael_kove
Now for the "leaked documents"
And Sam going on a podcast 🤣

[Quoted tweet]
I doubt it's going to be casually...

I am pretty sure it's going to get "leaked".

some "secret documents" will make rounds across AI influencers online hyping up some obscurely named project.

people will play detective if it's "gpt-5" or "AGI"...

- someone will quit or gets laid off.

A big name (either ex-OI or current employee) will go on podcast talking about "near AGI".

Sam will do official interview on some tech podcast.

GPT-5.preview will go out early next year. But with a different name...

Only in US.


9/21
@slow_developer
sam is probably recording the podcast right now



10/21
@buildifyclub
So Sam is the last man standing? 🤔



11/21
@slow_developer
i really don't think so...



12/21
@MycelialOracle
remember: true intelligence distributed
not centralized
ask any slime mold

forest suggests: less worry about artificial
more connection with natural
both valid growth paths



13/21
@karmicoder
Let's just say something that should've been in the lab is out in the world when it's not ready/ the world is not ready. People can only comprehend what they know. 🤷 And it will be used according to individual's level of knowledge and understanding.



14/21
@ClaudiuDP
> There isn’t actually a large gap between what capabilities exist in labs and what is publicly available to use.

I don’t know why this comes as a surprise to anyone who possesses reasoning capabilities. 😉



15/21
@zamderax
For OpenAI, it is not in its best interest to be “close to AGI”. Investors don’t want to hear that launch dates for AGI are slipping, so it is best to move the goalposts far out. If they claim AGI is next year and then deliver something underwhelming, it’s sort of the end of investor excitement.



16/21
@RobS142
Where does he say that? Your quote certainly doesn’t.



17/21
@PXL1000
Government takeover and three letter agencies behind the scenes



18/21
@danielxploit1
No more hype then!



19/21
@patrickocr
would love to see a before/after diagram from OpenAI re: the last few yrs. What's their headcount at now? Turnover %?



20/21
@ComputingByArts
>There isn’t actually a large gap between what capabilities exist in labs and what is publicly available to use.

That's very interesting...

(I've always thought otherwise; even with GPT-4, the gap between internal release and public release was 6 months or so.)



21/21
@Four__
Interesting 🤔




 

bnew


1/4
@elsleightholm
i'm back making coding videos 🥳 and what better way to start than with a beginners guide to @huggingface spaces!

more videos coming on @marqo_ai 🚀🔥

https://invidious.poast.org/watch?v=xqdTFyRdtjQ



2/4
@BrandonWatson
Literally the exact video I wanted to watch today, and it was via a re-tweet. Great serendipity!



3/4
@ChrisUniverseB
Going to watch it in a bit - thanks for sharing @ClementDelangue. I need to also try the HF macOS app; any suggestions?



4/4
@AlbertRyanstein
Hi Ellie, I started a channel in the last month, live-streaming coding / simulation development / teaching on YT. I was thinking today about finding someone to teach and live-streaming it. But perhaps there are other options too! Let me know if you want to try something one time.




 

bnew










1/11
@cocktailpeanut
MFLUX-WEBUI

[Mac Only] MFLUX-WEBUI, by @CCCoderAI, is a Gradio app for doing everything FLUX on a Mac (powered by MLX), which makes it efficient and fast.

It's packed with powerful features like LoRAs and seamless integrations with @civitai, @huggingface, @ollama, etc.



https://video.twimg.com/ext_tw_video/1849136334885003264/pu/vid/avc1/1370x720/INHQZe44IsPM-CYu.mp4

2/11
@cocktailpeanut
Now available for 1-click install on Pinokio





3/11
@cocktailpeanut
Note that this web UI uses MFLUX (created by @filipstrand, an MLX port of FLUX based on Huggingface Diffusers) as backend.

[Quoted tweet]
Apple Silicon users, go install MFLUX now!
- Fine-tune your Flux LoRA on fal.ai
- Download them locally
- Generate as many images as you want on your Mac, thanks to @filipstrand and MLX!
You can use your custom LoRA on both schnell and dev

github.com/filipstrand/mflux


4/11
@cocktailpeanut
Instant LoRA Download

The most important thing is LoRAs, and this app supports them out of the box, in a very user-friendly way.

You can download LoRAs directly from @HelloCivitai or @huggingface





5/11
@cocktailpeanut
Use Multiple LoRAs

You can even combine multiple LoRAs to generate an image, for example, here I used my optimus lora and a banksy lora to generate a "banksy mural of optimus holding a red balloon"





6/11
@cocktailpeanut
Prompt Enhancer via Ollama

If you have @ollama installed, you should be able to click the "Enhance Prompt" button to enhance a simple prompt you entered into something much more detailed.





7/11
@cocktailpeanut
Just to make sure it works well on as many machines as possible, the default checkpoint is set to flux-schnell-4bit quantized version.

But feel free to try other larger models.





8/11
@masonjames
Any chance Pinokio is "aware" of Flux models already downloaded for other apps like ComfyUI and provides that to MFLUX?

in case we're HDD poor 🥺



9/11
@cocktailpeanut
This is a completely different architecture since the models are in MLX format. All the models you downloaded in Comfy/A1111/Invoke/Fooocus/Forge won't work here, and vice versa.

I always try to deduplicate these things in the scripts when possible, but not in this case.



10/11
@ivanfioravanti
You are best of the best!!!



11/11
@cocktailpeanut
@CCCoderAI is! I just helped him here and there to optimize for pinokio but he singlehandedly wrote the entire app as well as the launcher. Looking forward to more of these collaborations going forward and planning to make things like this easier :smile:




 

bnew


1/1
@abidlabs
This is really nice for anyone who browses papers on ArXiv. Easily find linked models, datasets, and Spaces!

[Quoted tweet]
This is huge: 🤗 @huggingface paper-page is now available directly through arXiv

A quick thread 🧵 👇









1/1
did you notice something new on arxiv today?




 

bnew














1/13
@cocktailpeanut
Omnigen: One Model to Rule Them All

One universal model to take care of every image generation task WITHOUT add-ons like controlnet, ip-adapter, etc. Prompt is all you need.

They finally dropped the code and a gradio app, and now you can run it on your computer with 1 click.

[Quoted tweet]
OmniGen

Unified Image Generation

discuss: huggingface.co/papers/2409.1…

In this work, we introduce OmniGen, a new diffusion model for unified image generation. Unlike popular diffusion models (e.g., Stable Diffusion), OmniGen no longer requires additional modules such as ControlNet or IP-Adapter to process diverse control conditions. OmniGen is characterized by the following features:

1) Unification: OmniGen not only demonstrates text-to-image generation capabilities but also inherently supports other downstream tasks, such as image editing, subject-driven generation, and visual-conditional generation. Additionally, OmniGen can handle classical computer vision tasks by transforming them into image generation tasks, such as edge detection and human pose recognition.

2) Simplicity: The architecture of OmniGen is highly simplified, eliminating the need for additional text encoders. Moreover, it is more user-friendly compared to existing diffusion models, enabling complex tasks to be accomplished through instructions without the need for extra preprocessing steps (e.g., human pose estimation), thereby significantly simplifying the workflow of image generation.

3) Knowledge Transfer: Through learning in a unified format, OmniGen effectively transfers knowledge across different tasks, manages unseen tasks and domains, and exhibits novel capabilities. We also explore the model's reasoning capabilities and potential applications of the chain-of-thought mechanism.

This work represents the first attempt at a general-purpose image generation model, and there remain several unresolved issues.




2/13
@cocktailpeanut
Available for 1 click launch on Pinokio





3/13
@cocktailpeanut
Omnigen Project Page GitHub - VectorSpaceLab/OmniGen



4/13
@cocktailpeanut
The localized gradio app was adopted from the huggingface space here https://huggingface.co/spaces/shytao/OmniGen



5/13
@cocktailpeanut
It runs on ALL platforms (Mac, NVIDIA, etc.), but the real question is how fast it will be.

FYI, it uses a lot of resources and is not the fastest thing you'd run.

On an NVIDIA 4090, it took 1 minute 10 seconds to generate this 1024x576 image.





6/13
@cocktailpeanut
And on a Mac M1 Max with 64 GB... around 40 minutes to generate a 1024x576 image.

If you're planning to try this on a Mac, I recommend closing EVERY app you're running and keeping only this one open, as the more free memory you have, the faster it will be.





7/13
@cocktailpeanut
Here are some example things you can do, it's really powerful and versatile.

If you read through the prompts, it can **potentially** replace all kinds of things like IPAdapter, ControlNet, and even things like SAM.





8/13
@cocktailpeanut
You can use multiple images as input, and even OpenPose as input (WITHOUT a controlnet addon!), and background removal, and so on and so on.

It has its own markup language, kind of like a mix between an LLM and an image model.
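
To make the markup concrete, here is a hedged usage sketch; the class and argument names are assumptions modeled on the repo's examples rather than a verified API, but the `<img><|image_1|></img>` placeholder is the kind of inline image reference the markup uses:

```python
# Hypothetical usage sketch; exact class/argument names are assumptions.
from OmniGen import OmniGenPipeline

pipe = OmniGenPipeline.from_pretrained("Shitao/OmniGen-v1")  # repo id assumed

# No ControlNet or IP-Adapter: the reference image is addressed inline
# via the <|image_1|> placeholder inside the prompt text itself.
images = pipe(
    prompt="A man in a black shirt is reading a book. "
           "The man is the person in <img><|image_1|></img>.",
    input_images=["./person.png"],
    height=576,
    width=1024,
    guidance_scale=2.5,
)
images[0].save("output.png")
```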





9/13
@cocktailpeanut
IMO this is the future. I don't think people in the future will be sitting there trying to combine A1111 extensions and so on. Just ask AI for something and you get it.

Plus, the more interdisciplinary knowledge (LLM + image + maybe even video) an AI has, the smarter it will be.



10/13
@cocktailpeanut
But let's not get ahead of ourselves; first we def need a lighter version of this (omnigen-turbo?)

If I could generate one image in 10 seconds on my 4090, for example, I would be much more motivated to experiment more.



11/13
@cocktailpeanut
Also, I appreciate how powerful the prompt markup is, but I think maybe we can improve it.

The markup feels too machine-like. If there were a way to express all this in natural language that gets transformed into the markup, that would be ideal. Maybe an additional LLM layer?



12/13
@cocktailpeanut
Since this is edge tech that takes a relatively long time to run, if you play with it, please do share your results so others can learn more. I will add the shared results to the Pinokio newsfeed here: Pinokio



13/13
@cocktailpeanut
Looks like OmniGen is actually NOT supposed to be this slow. In fact, it's supposed to be even faster than Flux or SD. They said they're going to work on it, so we should all go ask them to do it ASAP.

[Quoted tweet]
"In fact, OmniGen is much smaller than the latest models like SD and Flux, so there is significant room for optimization"

Wait what?

Good news is, if we all go and ask them enough, they will appreciate all the interest and expedite it probably. Lesgo!





 

bnew





1/4
@rohanpaul_ai
The newly introduced quantized Llama models from @AIatMeta look so powerful.

The new quantization methods and the resulting models are performant enough to run on many popular mobile devices.

Decode latency improved by 2.5x and prefill latency improved by 4.2x on average, while model size decreased by 56% and memory usage was reduced by 41% on average.

🔧 QAT + LoRA Method

- Start with regular Llama model trained in BF16 format
- During training, simulate how model will perform in lower precision (like 4-bit)
- Freeze main model parts
- Add small trainable LoRA adapters to fine-tune performance
- These adapters stay in BF16 while rest gets quantized
- Final step: tune everything using DPO (Direct Preference Optimization)

🛠️ SpinQuant Method

This is a post-training quantization method that prioritizes portability

- Takes already trained model
- Uses WikiText dataset to learn special rotation matrices
- These matrices smooth out extreme values
- Applies quantization after training
- No need for full training data
- Easier to use but slightly less accurate than QAT+LoRA

⚡ Key Differences

- QAT+LoRA: Better accuracy but needs training data and compute
- SpinQuant: More portable, works with any fine-tuned model, needs minimal data

Think of QAT+LoRA like teaching a model to work with less precision from the start, while SpinQuant is like compressing an already trained model efficiently.
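
To make the "simulate low precision during training" idea concrete, here is a minimal sketch of fake quantization with a straight-through estimator; this is a generic illustration of QAT, not Meta's actual recipe (which also freezes the base weights and trains BF16 LoRA adapters on top):

```python
import torch

def fake_quantize(w: torch.Tensor, bits: int = 4) -> torch.Tensor:
    """Simulate symmetric integer quantization in the forward pass while
    letting gradients flow through unchanged (straight-through estimator)."""
    qmax = 2 ** (bits - 1) - 1                    # e.g. 7 for 4-bit
    scale = w.abs().max().clamp(min=1e-8) / qmax  # per-tensor scale
    w_q = (w / scale).round().clamp(-qmax - 1, qmax) * scale
    # Forward uses the quantized values; backward treats this as identity in w.
    return w + (w_q - w).detach()

# A linear layer computing with fake-quantized weights during QAT:
w = torch.randn(16, 16, requires_grad=True)       # full-precision master weights
x = torch.randn(4, 16)
y = x @ fake_quantize(w).T
y.sum().backward()                                # grads reach the master weights
print(w.grad.shape)                               # torch.Size([16, 16])
```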





2/4
@rohanpaul_ai
They developed these state-of-the-art models using Quantization-Aware Training with LoRA adaptors (QLoRA) to optimize performance in low-precision environments.

Also used SpinQuant, a technique that enables us to determine the best possible combination for compression while retaining the most possible quality.





3/4
@rohanpaul_ai
Introducing quantized Llama models with increased speed and a reduced memory footprint



4/4
@rohanpaul_ai







 

bnew



1/2
@rohanpaul_ai
White House just released memo: Presses for Gov't AI Use With Eye on Security.

Directed federal agencies "to improve the security and diversity of chip supply chains ... with AI in mind."

It also prioritizes the collection of information on other countries' operations against the U.S. AI sector and passing that intelligence along quickly to AI developers to help keep their products secure.





2/2
@rohanpaul_ai
Makes competitor espionage against U.S. AI sector a top intelligence priority

Supports National AI Research Resource for broader research access

Directs economic assessment of U.S. private sector AI ecosystem




 

bnew


1/1
@rohanpaul_ai
Now Claude runs code and analyzes data directly.

So you get precise mathematical solutions and interactive visualizations through Artifacts.

e.g. a simple task that's surprisingly tricky for pure neural inference

Rather than guessing palindromes, Claude can now write code to check if 'racecar' reads the same backward.
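
For illustration, here is the check described above as it might look in Python (the analysis tool itself executes JavaScript in the browser, as a reply further down notes):

```python
def is_palindrome(s: str) -> bool:
    """Return True if s reads the same backward (exact, case-sensitive)."""
    return s == s[::-1]

print(is_palindrome("racecar"))  # True
print(is_palindrome("claude"))   # False
```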

[Quoted tweet]
Claude can now write and run code.

We've added a new analysis tool. The tool helps Claude respond with mathematically precise and reproducible answers. You can then create interactive data visualizations with Artifacts.

Enable the feature preview: claude.ai/new?fp=1.


https://video.twimg.com/ext_tw_video/1849463452189839360/pu/vid/avc1/1920x1080/nVEM6MeEMkmauxn2.mp4







1/51
@AnthropicAI
Claude can now write and run code.

We've added a new analysis tool. The tool helps Claude respond with mathematically precise and reproducible answers. You can then create interactive data visualizations with Artifacts.

Enable the feature preview: claude.ai/new?fp=1.



https://video.twimg.com/ext_tw_video/1849463452189839360/pu/vid/avc1/1920x1080/nVEM6MeEMkmauxn2.mp4

2/51
@AnthropicAI
Read more about the analysis tool: Introducing the analysis tool in Claude.ai



3/51
@ai_for_success
Claude is on 🔥





4/51
@AISafetyMemes
"So how did the AIs escape the box in the end after all the precautions?"

"Box?"



5/51
@dr_cintas
You guys are crushing it!

Very happy to see updates without waiting lists or having to wait months.

Can’t wait to see Claude 3.5 Opus 👀



6/51
@lexfridman
Very cool!



7/51
@leore245
@OpenAI please add this to canvas 👏



8/51
@nadzi_mouad
But this feature was already there; it could do that from the beginning. What changed?



9/51
@chrypnotoad
@elder_plinius 👀



10/51
@ruansgon
This is the second account I've created and the second one this has happened to. The first one I imagine was due to using a VPN, but I don't know why the second one.

This makes it difficult for me to become a customer, even if I'm interested in subscribing to the service.





11/51
@AhmedRezaT
I’m gonna end up spending a good chunk of the day playing with this now 😂



12/51
@jhermie24
Yippie!



13/51
@untitled01ipynb




14/51
@thedealdirector
Best closed-source product on the market. Long term viability is still questionable but it's still very entertaining to see.



15/51
@elder_plinius
#freeopus



16/51
@vipulgnu
That was one thing i felt Anthropic could have done easily.

A lot of 'excel' ling is going to get unbundled. Add voice on top of it (not far away), and we have an ad-hoc data analyst.

Going to play with this :smile:



17/51
@Duarteosrm
Can these be shared publicly?



18/51
@recursiverealms
Wait what... why are you adding things while I'm sitting here still trying to come to grips with how absolutely mind blowing Claude is after this latest update? :D



19/51
@macmaniac77
Works very well!!



20/51
@Jay_sharings
It means Latex rendering is not yet available, right?





21/51
@shrimpwtf
Claude is the best 🔥



22/51
@cpdough
javascript for data analysis?!?



23/51
@JiquanNgiam
Love it! We've extended this to also support Excel files, large CSVs, and more - check out @Lutra_AI



24/51
@Petre_Bogdan
klod ïs kikin 🙀 gpt ass



25/51
@EricBuess
@MatthewBerman



26/51
@opeksoy
analyze away time!



27/51
@Jeff_Pittman
❤️



28/51
@youraimarketer
😟 @JuliusAI_



29/51
@Leitparadigma_X
The one area where Anthropic were inexplicably behind the curve. Glad to see this… finally.



30/51
@bsilone
Very impressive!



31/51
@cloudseedingtec
heh this will get interesting swiftly<3



32/51
@0xPaulius
Claude IDE coming?



33/51
@blendic_ai
well done!





34/51
@HungamaHeadline
This opens doors for more accurate data-driven insights, seamless integration of AI with real-time analytics, and interactive visualizations.

The potential to streamline complex problem-solving across industries, from research to software development, is massive.



35/51
@KolinKoehl
Ah, finally catching up! Claude can now do what ChatGPT’s been up to for the past 12 months. Guess the safety team finally gave it the green light. 😏 #BetterLateThanNever



36/51
@Shalev_lif
Awesome, been waiting for this one! Big thanks to the amazing team.



37/51
@DaivikGoel
You guys are going for it



38/51
@rolandalong
Just tried this-so much better than copy paste of code into IDE



39/51
@nadzi_mouad
Pls increase the context window; when I give it a CSV file it doesn't finish and gets stuck.



40/51
@thegenioo
now we are talking!



41/51
@analysisjitsu
Cool! I have been using claudedev with vscode for a bit and have found that to be very useful.



42/51
@AntDX316
👍



43/51
@Yuchenj_UW
this is awesome



44/51
@0xsanketk
Wow..These charts look super appealing!

It seems like we're not far off from a future where entire data analysis will be done using AI tools like Claude.

Super excited to try this out.



45/51
@AntDX316
Can you make it easy for me to see all my deployed artifacts?



46/51
@HououinTyouma
wow but i liked running code for claude and he would thank me every time :(



47/51
@dhruv2038
OpenAI continues to get hammered.



48/51
@koltregaskes
Really neat, ta.



49/51
@biubiu958
🌟 Go, Claude baby! 👏



50/51
@tomosman
well this is awesome



51/51
@FibNewtonian
This is amazing, similar to v0




 