bnew

Veteran
Joined
Nov 1, 2015
Messages
55,691
Reputation
8,224
Daps
157,187


1/6
Instruct-tuned 405B sets SOTA for MMLU-Pro (!). It still seems to be a bit behind 3.5 Sonnet on the other hard evals, but very much in the same ballpark. Can't wait to vibe-check it.

Also notably -- new license (3.1 vs 3 - llama license - Diffchecker) removes the prohibition on using Llama 3 to improve other models.

2/6
License and AUP are officially updated

Llama 3.1 Community License Agreement
Llama 3.1 Acceptable Use Policy

3/6
AUP is identical (llama 3.1 AUP - Diffchecker)

The official license confirms the leaked one: you can use Llama 3.1 outputs to train other models, but derivative models must include the Llama branding (llama 3.1 license official - Diffchecker)

4/6
can't wait to see llama 3.1 live

5/6
However, they added that any model fine-tuned on LLaMA outputs becomes LLaMA:smile:

6/6
chris bro, sent you an important insight in a message


 

bnew



1/8
Meta trained an E2E speech experience with Llama 3.1 - pretty cool! This should enable real-time speech responses.

Audio encoder + adapter + LLM = audio in, text out
Custom TTS model uses LLM embeddings to condition output - IMO elegant to stay in latent space and avoid phonemes.

Full audio out (like 4o) does not yield much benefit over this approach.
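
A minimal sketch of what that audio-in, text-out pipeline could look like, assuming an HF-style LLM that accepts `inputs_embeds` in `generate`; module names and sizes here are illustrative guesses, not Meta's actual implementation:

```python
# Hypothetical sketch of the audio encoder + adapter + LLM pipeline described
# above (Section 8 of the paper). All names and dimensions are illustrative.
import torch
import torch.nn as nn

class AudioEncoder(nn.Module):
    """Stand-in for a pretrained speech encoder (Conformer-like in practice)."""
    def __init__(self, n_mels=80, d_audio=512):
        super().__init__()
        self.proj = nn.Linear(n_mels, d_audio)

    def forward(self, mel):                       # (batch, frames, n_mels)
        return self.proj(mel)                     # (batch, frames, d_audio)

class Adapter(nn.Module):
    """Maps audio features into the LLM's token-embedding space."""
    def __init__(self, d_audio=512, d_model=4096):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(d_audio, d_model), nn.GELU(),
                                 nn.Linear(d_model, d_model))

    def forward(self, feats):
        return self.mlp(feats)                    # (batch, frames, d_model)

def speech_to_text(llm, tokenizer, mel, encoder, adapter):
    """Audio in, text out: prepend adapted audio embeddings to the prompt
    embeddings and let a frozen HF-style LLM generate text as usual."""
    audio_embeds = adapter(encoder(mel))
    prompt_ids = tokenizer("Answer the spoken question:", return_tensors="pt").input_ids
    text_embeds = llm.get_input_embeddings()(prompt_ids)
    inputs = torch.cat([audio_embeds, text_embeds], dim=1)
    return llm.generate(inputs_embeds=inputs, max_new_tokens=128)
```

For speech out, the thread's point is that the TTS model then conditions on the LLM's embeddings directly rather than on a predicted phoneme sequence, which is what keeps the whole loop in latent space.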

2/8
Is the model open?

3/8
No. I doubt they will release audio out but there's a reasonable chance for the audio encoder, which gets you most of the way there

4/8
Where is the source of this?

5/8
Section 8 of the paper

6/8
I wonder if this is what they will use for the Ray-Ban voice AI?

7/8
I think you'd combine something like this with the image encoder but yeah. Much much cheaper to serve than full audio out

8/8
Would love to see how good the synthesizer is and how well it expresses emotion. I just want a TTS that is capable of yelling, crying, and the full emotional range


 

bnew


1/11
Kind of wild to think Meta is training and releasing open-weights models faster than OpenAI can release closed models. Dude is already cooking Llama 4

2/11
OpenAI models have not been improving significantly, which means Meta's open-weights models will catch up

3/11
How would they gate the model tho if they give us the weights

4/11
From the EU? The license etc., otherwise you would be using it at your own risk idk

5/11
Didn’t Meta have more GPU compute available for training than Microsoft? Zuckerberg bet big on GPUs early and continued to invest enormous amounts.

6/11
I love how they are doing it without all the “coming soon” marketing too. Just ship, release, ship.

7/11
Meta had a bajillion GPUs just gathering dust from Zuckerberg's bet that everyone on earth would want to spend all day as a virtual Playmobil figure.

8/11
Fully multimodal was kind of the premise of gpt-4o which didn't really deliver yet. If Meta can somehow act faster and release Llama 4 before the full rollout of 4o with audio, then OpenAI might be truly cooked.

9/11
When you control your own data centers

10/11
That's because they don't care as much about safety because they are not serving the models. OpenAI has to be extra careful so their models don't spit out stuff that can make people sue them.

11/11
I think it's an intentional choice on OAI's part, not a technical challenge. I wouldn't be surprised if Google, Anthropic, and OAI have models quite a bit better than current SOTA but just choose not to release them atm.



1/12
From this:
- Llama 4 started training in June
- Llama 4 will be fully multimodal, including audio
- Llama 3 405b will still release in the EU
- Llama 4, and beyond, will not be released in the EU unless they allow META to use EU Facebook and Instagram posts as training data.

2/12
https://www.axios.com/2024/07/17/meta-future-multimodal-ai-models-eu

3/12
Unless the EU changes course here's what won't be coming to Europe:
- Apple Intelligence
- Agent Memory, all agents
- Llama 4, and beyond
I would also guess the next iteration of other frontier models will not release with all their capabilities intact.

4/12
One of the reasons I think they are talking about Llama 4 is Mark Zuckerberg said the only reason they decided to stop training 70b was so they could start training Llama 4. It was still learning.

5/12
From the article it seems the 405B might be text only

6/12
Yes, that appears to be confirmed here.

7/12
Any idea how long it will take?

8/12
Guesses:
Llama 3 405b - this month, probably even this week
Llama 4 - November or December, the same time I think we get GPT-5

9/12
Hm, if it's open-source, or at least open-weight, couldn't they still get it?

10/12
Nope.

11/12
🤦‍♂️ Any mention of the UK? Although we're no longer subject to directives, I think we still align with most EU policies.

12/12
Yes, they said the UK has issues for them, but not as bad as the EU, and L4 will still be released there.


 

bnew

1/1
The moment we've all been waiting for is HERE!
Introducing the official global launch of Kling AI's International Version 1.0!
ANY email address gets you in, no mobile number required!
Direct link: 可灵 AI (Kling AI - a new-generation AI creative productivity platform)
Daily login grants 66 free Credits for video creation.
Multiple video creation features available.
Hurry up and sign up now!


 

bnew

Open Source AI Is the Path Forward


July 23, 2024

By Mark Zuckerberg, Founder and CEO

In the early days of high-performance computing, the major tech companies of the day each invested heavily in developing their own closed source versions of Unix. It was hard to imagine at the time that any other approach could develop such advanced software. Eventually though, open source Linux gained popularity – initially because it allowed developers to modify its code however they wanted and was more affordable, and over time because it became more advanced, more secure, and had a broader ecosystem supporting more capabilities than any closed Unix. Today, Linux is the industry standard foundation for both cloud computing and the operating systems that run most mobile devices – and we all benefit from superior products because of it.

I believe that AI will develop in a similar way. Today, several tech companies are developing leading closed models. But open source is quickly closing the gap. Last year, Llama 2 was only comparable to an older generation of models behind the frontier. This year, Llama 3 is competitive with the most advanced models and leading in some areas. Starting next year, we expect future Llama models to become the most advanced in the industry. But even before that, Llama is already leading on openness, modifiability, and cost efficiency.

Today we’re taking the next steps towards open source AI becoming the industry standard. We’re releasing Llama 3.1 405B, the first frontier-level open source AI model, as well as new and improved Llama 3.1 70B and 8B models. In addition to having significantly better cost/performance relative to closed models, the fact that the 405B model is open will make it the best choice for fine-tuning and distilling smaller models.

Beyond releasing these models, we’re working with a range of companies to grow the broader ecosystem. Amazon, Databricks, and NVIDIA are launching full suites of services to support developers fine-tuning and distilling their own models. Innovators like Groq have built low-latency, low-cost inference serving for all the new models. The models will be available on all major clouds including AWS, Azure, Google, Oracle, and more. Companies like Scale.AI, Dell, Deloitte, and others are ready to help enterprises adopt Llama and train custom models with their own data. As the community grows and more companies develop new services, we can collectively make Llama the industry standard and bring the benefits of AI to everyone.

Meta is committed to open source AI. I’ll outline why I believe open source is the best development stack for you, why open sourcing Llama is good for Meta, and why open source AI is good for the world and therefore a platform that will be around for the long term.

Why Open Source AI Is Good for Developers


When I talk to developers, CEOs, and government officials across the world, I usually hear several themes:

  • We need to train, fine-tune, and distill our own models. Every organization has different needs that are best met with models of different sizes that are trained or fine-tuned with their specific data. On-device tasks and classification tasks require small models, while more complicated tasks require larger models. Now you’ll be able to take the most advanced Llama models, continue training them with your own data and then distill them down to a model of your optimal size – without us or anyone else seeing your data.
  • We need to control our own destiny and not get locked into a closed vendor. Many organizations don’t want to depend on models they cannot run and control themselves. They don’t want closed model providers to be able to change their model, alter their terms of use, or even stop serving them entirely. They also don’t want to get locked into a single cloud that has exclusive rights to a model. Open source enables a broad ecosystem of companies with compatible toolchains that you can move between easily.
  • We need to protect our data. Many organizations handle sensitive data that they need to secure and can’t send to closed models over cloud APIs. Other organizations simply don’t trust the closed model providers with their data. Open source addresses these issues by enabling you to run the models wherever you want. It is well-accepted that open source software tends to be more secure because it is developed more transparently.
  • We need a model that is efficient and affordable to run. Developers can run inference on Llama 3.1 405B on their own infra at roughly 50% the cost of using closed models like GPT-4o, for both user-facing and offline inference tasks.
  • We want to invest in the ecosystem that’s going to be the standard for the long term. Lots of people see that open source is advancing at a faster rate than closed models, and they want to build their systems on the architecture that will give them the greatest advantage long term.

Why Open Source AI Is Good for Meta


Meta’s business model is about building the best experiences and services for people. To do this, we must ensure that we always have access to the best technology, and that we’re not locking into a competitor’s closed ecosystem where they can restrict what we build.

One of my formative experiences has been building our services constrained by what Apple will let us build on their platforms. Between the way they tax developers, the arbitrary rules they apply, and all the product innovations they block from shipping, it’s clear that Meta and many other companies would be freed up to build much better services for people if we could build the best versions of our products and competitors were not able to constrain what we could build. On a philosophical level, this is a major reason why I believe so strongly in building open ecosystems in AI and AR/VR for the next generation of computing.

People often ask if I’m worried about giving up a technical advantage by open sourcing Llama, but I think this misses the big picture for a few reasons:

First, to ensure that we have access to the best technology and aren’t locked into a closed ecosystem over the long term, Llama needs to develop into a full ecosystem of tools, efficiency improvements, silicon optimizations, and other integrations. If we were the only company using Llama, this ecosystem wouldn’t develop and we’d fare no better than the closed variants of Unix.

Second, I expect AI development will continue to be very competitive, which means that open sourcing any given model isn’t giving away a massive advantage over the next best models at that point in time. The path for Llama to become the industry standard is by being consistently competitive, efficient, and open generation after generation.

Third, a key difference between Meta and closed model providers is that selling access to AI models isn’t our business model. That means openly releasing Llama doesn’t undercut our revenue, sustainability, or ability to invest in research like it does for closed providers. (This is one reason several closed providers consistently lobby governments against open source.)

Finally, Meta has a long history of open source projects and successes. We’ve saved billions of dollars by releasing our server, network, and data center designs with Open Compute Project and having supply chains standardize on our designs. We benefited from the ecosystem’s innovations by open sourcing leading tools like PyTorch, React, and many more tools. This approach has consistently worked for us when we stick with it over the long term.

You can access the models now at llama.meta.com.
 

bnew

Why Open Source AI Is Good for the World


I believe that open source is necessary for a positive AI future. AI has more potential than any other modern technology to increase human productivity, creativity, and quality of life – and to accelerate economic growth while unlocking progress in medical and scientific research. Open source will ensure that more people around the world have access to the benefits and opportunities of AI, that power isn’t concentrated in the hands of a small number of companies, and that the technology can be deployed more evenly and safely across society.

There is an ongoing debate about the safety of open source AI models, and my view is that open source AI will be safer than the alternatives. I think governments will conclude it’s in their interest to support open source because it will make the world more prosperous and safer.

My framework for understanding safety is that we need to protect against two categories of harm: unintentional and intentional. Unintentional harm is when an AI system may cause harm even when it was not the intent of those running it to do so. For example, modern AI models may inadvertently give bad health advice. Or, in more futuristic scenarios, some worry that models may unintentionally self-replicate or hyper-optimize goals to the detriment of humanity. Intentional harm is when a bad actor uses an AI model with the goal of causing harm.

It’s worth noting that unintentional harm covers the majority of concerns people have around AI – ranging from what influence AI systems will have on the billions of people who will use them to most of the truly catastrophic science fiction scenarios for humanity. On this front, open source should be significantly safer since the systems are more transparent and can be widely scrutinized. Historically, open source software has been more secure for this reason. Similarly, using Llama with its safety systems like Llama Guard will likely be safer and more secure than closed models. For this reason, most conversations around open source AI safety focus on intentional harm.

Our safety process includes rigorous testing and red-teaming to assess whether our models are capable of meaningful harm, with the goal of mitigating risks before release. Since the models are open, anyone is capable of testing for themselves as well. We must keep in mind that these models are trained on information that’s already on the internet, so the starting point when considering harm should be whether a model can facilitate more harm than information that can quickly be retrieved from Google or other search results.

When reasoning about intentional harm, it’s helpful to distinguish between what individual or small scale actors may be able to do as opposed to what large scale actors like nation states with vast resources may be able to do.

At some point in the future, individual bad actors may be able to use the intelligence of AI models to fabricate entirely new harms from the information available on the internet. At this point, the balance of power will be critical to AI safety. I think it will be better to live in a world where AI is widely deployed so that larger actors can check the power of smaller bad actors. This is how we’ve managed security on our social networks – our more robust AI systems identify and stop threats from less sophisticated actors who often use smaller scale AI systems. More broadly, larger institutions deploying AI at scale will promote security and stability across society. As long as everyone has access to similar generations of models – which open source promotes – then governments and institutions with more compute resources will be able to check bad actors with less compute.

The next question is how the US and democratic nations should handle the threat of states with massive resources like China. The United States’ advantage is decentralized and open innovation. Some people argue that we must close our models to prevent China from gaining access to them, but my view is that this will not work and will only disadvantage the US and its allies. Our adversaries are great at espionage, stealing models that fit on a thumb drive is relatively easy, and most tech companies are far from operating in a way that would make this more difficult. It seems most likely that a world of only closed models results in a small number of big companies plus our geopolitical adversaries having access to leading models, while startups, universities, and small businesses miss out on opportunities. Plus, constraining American innovation to closed development increases the chance that we don’t lead at all. Instead, I think our best strategy is to build a robust open ecosystem and have our leading companies work closely with our government and allies to ensure they can best take advantage of the latest advances and achieve a sustainable first-mover advantage over the long term.

When you consider the opportunities ahead, remember that most of today’s leading tech companies and scientific research are built on open source software. The next generation of companies and research will use open source AI if we collectively invest in it. That includes startups just getting off the ground as well as people in universities and countries that may not have the resources to develop their own state-of-the-art AI from scratch.

The bottom line is that open source AI represents the world’s best shot at harnessing this technology to create the greatest economic opportunity and security for everyone.

Let’s Build This Together


With past Llama models, Meta developed them for ourselves and then released them, but didn’t focus much on building a broader ecosystem. We’re taking a different approach with this release. We’re building teams internally to enable as many developers and partners as possible to use Llama, and we’re actively building partnerships so that more companies in the ecosystem can offer unique functionality to their customers as well.

I believe the Llama 3.1 release will be an inflection point in the industry where most developers begin to primarily use open source, and I expect that approach to only grow from here. I hope you’ll join us on this journey to bring the benefits of AI to everyone in the world.
 

bnew



1/4
Meta #META released Llama 3.1 yesterday, including their first mega 405B-param model (previously there were only 7/8B and 70B versions in Llama 2-3) and an expanded 128K context window. Looks to be on par with OpenAI's GPT-4o and Anthropic's Claude 3.5 Sonnet.

2/4
Llama 3.1 is already available in AI platforms from partners #AMZN AWS, NVIDIA #NVDA, Databricks, Microsoft #MSFT Azure, #GOOG Google Cloud, Snowflake #SNOW, Dell #DELL, Scale AI, Groq, IBM #IBM, as well as Cloudflare #NET Workers AI

3/4
How the big Llama 3.1 405B model compares to GPT-4, GPT-4o, and Claude 3.5 Sonnet

4/4
Read more on Llama 3.1 here: Introducing Llama 3.1: Our most capable models to date



1/11
Meta Llama 3.1 405B, 70B & 8B are here - multilingual, with 128K context, tool use + agents! Competitive with (and in places beats) GPT-4o & Claude 3.5 Sonnet - unequivocally the best open LLM out there! 🐐

Bonus: It comes with a more permissive license, which allows one to train other LLMs on its high-quality outputs 🔥

Some important facts:

> Multilingual - English, French, German, Hindi, Italian, Portuguese, Spanish, and Thai.
> MMLU - 405B (85.2), 70B (79.3) & 8B (66.7)
> Trained on 15 Trillion tokens + 25M synthetically generated outputs.
> Pre-training cut-off date of December 2023
> Same architecture as Llama 3, with GQA (grouped-query attention; see the sketch after this list)
> Used a massive 39.3 Million GPU hours (16K H100s for 405B)
> 128K context ⚡
> Excels at Code output tasks, too!
> Releases Prompt Guard - a BERT-based classifier to detect jailbreaks, malicious code, etc.
> Llama Guard 8B w/ 128K context for securing prompts across a series of topics
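
Since the list mentions GQA, here is a rough illustration of grouped-query attention, where several query heads share one key/value head; toy dimensions, with no RoPE, masking, or output projection, so this is a sketch of the idea rather than Llama's exact implementation:

```python
import torch
import torch.nn.functional as F

def gqa(x, wq, wk, wv, n_heads, n_kv_heads):
    """Grouped-query attention: n_heads query heads share n_kv_heads K/V heads,
    shrinking the KV cache by a factor of n_heads / n_kv_heads."""
    b, t, d = x.shape
    head_dim = d // n_heads
    q = (x @ wq).view(b, t, n_heads, head_dim).transpose(1, 2)     # (b, H,   t, hd)
    k = (x @ wk).view(b, t, n_kv_heads, head_dim).transpose(1, 2)  # (b, Hkv, t, hd)
    v = (x @ wv).view(b, t, n_kv_heads, head_dim).transpose(1, 2)
    # Each group of n_heads // n_kv_heads query heads reuses the same K/V head.
    k = k.repeat_interleave(n_heads // n_kv_heads, dim=1)
    v = v.repeat_interleave(n_heads // n_kv_heads, dim=1)
    att = F.softmax(q @ k.transpose(-2, -1) / head_dim ** 0.5, dim=-1)
    return (att @ v).transpose(1, 2).reshape(b, t, d)

# Toy sizes (the 405B model itself uses 128 query heads sharing 8 KV heads):
n_heads, n_kv_heads, d = 8, 2, 64
x = torch.randn(2, 16, d)
wq = torch.randn(d, d)
wk = torch.randn(d, n_kv_heads * (d // n_heads))
wv = torch.randn(d, n_kv_heads * (d // n_heads))
out = gqa(x, wq, wk, wv, n_heads, n_kv_heads)   # (2, 16, 64)
```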

How much GPU VRAM do you need to run these?

405B - 810 GB in fp16/bf16, 405 GB in fp8/int8, 203 GB in int4
70B - 140 GB in fp16/bf16, 70 GB in fp8/int8, 35 GB in int4
8B - 16 GB in fp16/bf16, 8 GB in fp8/int8, 4 GB in int4
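
Those figures are just parameter count times bytes per weight; a quick sanity check (weights only - KV cache and activations come on top, so real deployments need headroom beyond these numbers):

```python
def weight_gb(params_billion, bits):
    """Weight memory only: parameter count x bytes per parameter."""
    return params_billion * 1e9 * (bits / 8) / 1e9   # decimal gigabytes

for params in (405, 70, 8):
    print(f"{params}B:",
          f"{weight_gb(params, 16):.1f} GB fp16/bf16,",
          f"{weight_gb(params, 8):.1f} GB fp8/int8,",
          f"{weight_gb(params, 4):.1f} GB int4")
# 405B: 810.0 GB fp16/bf16, 405.0 GB fp8/int8, 202.5 GB int4
# 70B: 140.0 GB fp16/bf16, 70.0 GB fp8/int8, 35.0 GB int4
# 8B: 16.0 GB fp16/bf16, 8.0 GB fp8/int8, 4.0 GB int4
```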

In addition, we provide a series of Quants ready to deploy: AWQ, Bitsandbytes, and GPTQ. These allow you to run 405B in as little as 4 x A100 (80GB) through TGI or VLLM. 🚀

Wait, it gets better: we also provide unlimited access for HF Pro users via our deployed Inference Endpoint!

Want to learn more? We wrote a detailed blog post on it 🦙

Kudos to @AIatMeta for believing in open source and science! It has been fun collaborating! 🤗

2/11
Model checkpoints:

Llama 3.1 - a meta-llama Collection

3/11
Detailed blog post:

Llama 3.1 - 405B, 70B & 8B with multilinguality and long context

4/11
ICYMI: You can directly use it in Hugging Chat!

5/11
Anyone offered up hosted endpoints yet?

6/11
You can directly use it in Hugging Chat ofc!

meta-llama/Meta-Llama-3.1-405B-Instruct-FP8 - HuggingChat

7/11
Amazing! ❤️🎆🎉

Are good 4bit quants already out?

Looking forward to trying Llama 3.1-405B locally on 4x @AMD #W7900 GPUs (48 GiB, ~$3400 on Amazon today), cf.

8/11
Yes! Check it out here:

hugging-quants (Hugging Quants)

9/11
1/2/3/4 TB of unified memory on the consumer side is a MUST; AMD and Intel are killing software innovation, and I hope ARM and Nvidia offer a solution.

10/11
Our inference is ready
Llama 3.1 405B Instruct | ModelBox

11/11




1/1
OK BUT WHY DOES LLAMA 3.1-8B GO SO HARD

Definitely beats GPT-3.5 easy and might be better than 4o…

EVERYONE TRY IT RN BEFORE ZUCK CHANGES HIS MIND

p.s. I recommend using it with @ollama

llama3.1

meta-llama/Meta-Llama-3.1-8B-Instruct · Hugging Face
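
If you go the Ollama route, a minimal chat call through its Python client looks roughly like this; it assumes the local server is running and `ollama pull llama3.1` has completed, and the exact response type can differ between client versions:

```python
# Minimal sketch of chatting with Llama 3.1 8B through Ollama's Python client.
import ollama

response = ollama.chat(
    model="llama3.1",
    messages=[{"role": "user", "content": "Explain grouped-query attention in two sentences."}],
)
# Older clients return a dict; newer ones also expose response.message.content.
print(response["message"]["content"])
```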



bnew




1/5
1/ Just finished evaluating some AI-generated explanations on the concept of time dilation near black holes! The results were fascinating, and guess what? Llama 3.1:8b came out on top!

2/5
2/ In our analysis, we scored each explanation based on clarity, understandability, use of analogies, engagement, and accuracy. Here are the results! 👇

3/5
3/ Llama 3.1:8b scored 24/25 with a clear river and sinkhole analogy.

GPT-4o scored 22/25 with a clock and river analogy.

Claude 3.5 Sonnet got 20/25 using a river and molasses analogy.

GPT-4o Mini scored 18/25 with a dance floor analogy.

Llama 3.1:70b scored 16/25 with a sink analogy.

4/5
4/ This shows that sometimes smaller models can pack a punch.

5/5
5/ Used @ollama and @msty_app - awesome tools!





 

bnew

1/2
Try @AIatMeta's Llama-3.1 with in-browser local inference at WebLLM Chat, with #WebLLM accelerated by @WebGPU!

Impressive capabilities (e.g. coding, multilingual)--great for building in-browser agents!

4bit-quantized Llama-3.1 8B giving perfect answers in real-time:

2/2
s/o to @NestorQin for the great work on WebLLMChat!





1/11
Full benchmarks for LLaMA-3.1 (both instruct and base):

2/11
3.1 70b looking like a sweet spot

3/11
👀

4/11
True GPT-4 at home

5/11
the numbers are insane …

6/11
Where can the new model be used, and where is it available?

7/11
These numbers are insane. MMLU-Pro = 73.3 is the highest I have ever seen.

8/11
Source

9/11
Hey! I am looking for the top 10 papers on large model training with GPU clusters. Any recommendations?

10/11
@extractum_io

11/11
wow, so that instruction tuning sure packs a punch!


 

bnew

1/1
[CV] Stretching Each Dollar: Diffusion Training from Scratch on a Micro-Budget
[2407.15811] Stretching Each Dollar: Diffusion Training from Scratch on a Micro-Budget
- The computational cost of training large-scale diffusion models for text-to-image generation remains prohibitively high. This paper aims to develop a low-cost end-to-end pipeline to train competitive text-to-image diffusion models with more than an order of magnitude reduction in cost.

- Vision transformer-based latent diffusion models are considered as they have been widely adopted in large-scale diffusion models. To reduce cost, the paper exploits the dependence of transformers' cost on input sequence size by randomly masking patches. However, high masking ratios degrade performance.

- A deferred masking strategy is proposed where all patches are preprocessed by a lightweight patch-mixer before masking. This allows non-masked patches to retain information about the whole image, enabling reliable training at high masking ratios with no extra cost.

- Recent transformer architecture advances like layer-wise scaling and mixture-of-experts layers are incorporated to further improve performance under constraints.

- At a cost of only $1,890 on one 8xH100 GPU machine, a 1.16-billion-parameter sparse diffusion transformer is trained on 37M images with 75% masking, achieving competitive generations at 14x lower cost than the current state of the art.
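
A schematic of the deferred-masking idea (my own illustration, not the authors' code): a cheap patch-mixer sees every patch first, so the 25% of tokens that survive masking still carry global context before the expensive backbone runs. Text and timestep conditioning are omitted here for brevity.

```python
import torch
import torch.nn as nn

class DeferredMaskingBackbone(nn.Module):
    """Illustrative sketch of deferred masking for a diffusion transformer."""
    def __init__(self, d_model=512, mask_ratio=0.75):
        super().__init__()
        self.mask_ratio = mask_ratio
        # Cheap mixer: a single small transformer layer over the full sequence.
        self.patch_mixer = nn.TransformerEncoderLayer(d_model, nhead=4,
                                                      batch_first=True)
        # Expensive denoising backbone: runs only on the surviving tokens.
        self.backbone = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True),
            num_layers=4)  # kept small for the sketch

    def forward(self, patch_tokens):                  # (batch, n_patches, d)
        mixed = self.patch_mixer(patch_tokens)        # every patch sees the image
        b, n, d = mixed.shape
        keep = int(n * (1 - self.mask_ratio))
        idx = torch.rand(b, n).argsort(dim=1)[:, :keep]            # random subset
        kept = torch.gather(mixed, 1, idx.unsqueeze(-1).expand(-1, -1, d))
        return self.backbone(kept)                    # loss applied on kept patches

tokens = torch.randn(2, 256, 512)                     # e.g. 16x16 latent patches
out = DeferredMaskingBackbone()(tokens)               # (2, 64, 512)
```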






[Submitted on 22 Jul 2024]

Stretching Each Dollar: Diffusion Training from Scratch on a Micro-Budget


Vikash Sehwag, Xianghao Kong, Jingtao Li, Michael Spranger, Lingjuan Lyu

As scaling laws in generative AI push performance, they also simultaneously concentrate the development of these models among actors with large computational resources. With a focus on text-to-image (T2I) generative models, we aim to address this bottleneck by demonstrating very low-cost training of large-scale T2I diffusion transformer models. As the computational cost of transformers increases with the number of patches in each image, we propose to randomly mask up to 75% of the image patches during training. We propose a deferred masking strategy that preprocesses all patches using a patch-mixer before masking, thus significantly reducing the performance degradation with masking, making it superior to model downscaling in reducing computational cost. We also incorporate the latest improvements in transformer architecture, such as the use of mixture-of-experts layers, to improve performance and further identify the critical benefit of using synthetic images in micro-budget training. Finally, using only 37M publicly available real and synthetic images, we train a 1.16 billion parameter sparse transformer with only $1,890 economical cost and achieve a 12.7 FID in zero-shot generation on the COCO dataset. Notably, our model achieves competitive FID and high-quality generations while incurring 118× lower cost than stable diffusion models and 14× lower cost than the current state-of-the-art approach that costs $28,400. We aim to release our end-to-end training pipeline to further democratize the training of large-scale diffusion models on micro-budgets.


Comments: 41 pages, 28 figures, 5 tables
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as: arXiv:2407.15811 [cs.CV]
(or arXiv:2407.15811v1 [cs.CV] for this version)
[2407.15811] Stretching Each Dollar: Diffusion Training from Scratch on a Micro-Budget

Submission history

From: Vikash Sehwag
[v1] Mon, 22 Jul 2024 17:23:28 UTC (28,658 KB)
 

bnew

1/11
🚀 Witnessing the evolution of a Stable Diffusion model through training is truly fascinating! From the abstract, low-quality beginnings of Image 1 🖼️ to the futuristic vibe of Image 2 🌆, it's all about loving the process. Stay tuned, as we're cooking up an incredible image hub where images can be easily generated and transformed with AI! 🔥 @SkillfulAI #AIArt #NFTCommunity #StableDiffusion

2/11
insane

3/11
#SKAI is cooking hard!

4/11
So cool!

5/11
Great to see beautiful results enabled through @SkillfulAI 🧙‍♂️

6/11
Lets go !!!

7/11
Amazing!

8/11
🙌🏻

9/11
Thats cool 👁️👁️

10/11
Totally tuned

11/11
So stoked about the @SkillfulAI image generator! Can't wait


 

bnew










1/11
🚀Our latest blog post unveils the power of Consistency Models and introduces Easy Consistency Tuning (ECT), a new way to fine-tune pretrained diffusion models to consistency models.

SoTA fast generative models at 1/32 of the training cost! 🔽
Get ready to speed up your generative models! 🌟

Blog: Consistency Models Made Easy | Notion
Code: GitHub - locuslab/ect: Consistency Models Made Easy

2/11
Compared to diffusion models, ECT requires only 1/1000 of their inference cost to achieve the same sample quality.

Considering distillation approaches, ECT significantly outperforms consistency distillation (CD) and score distillation. No teacher required!

Compared to consistency training from scratch, ECT rivals iCT-deep using 1/32 of the training compute and 1/2 of the model size.

3/11
Diffusion Models/Score SDE, even Flows, are slow and expensive!

For example, #Sora is amazing as a video diffusion model, but deploying it will be a huge challenge.
“Sora can at most generate about 5 minutes of video per hour per Nvidia H100 GPU,” per estimation. 🫠

Cost estimations suggest that it could be multiple orders of magnitude more expensive than Large Language Models (LLMs). 💸

4/11
Diffusion distillation, GANs, and consistency models could be alternatives for fast inference.
Among them, consistency models by @DrYangSong are an elegant new family of generative models.

5/11
Consistency Models are amazing but not easy to train…
See a comparison of Generative Modeling 🤔.

6/11
Our analysis reveals the difficulties of training Consistency Models and shows that diffusion models are a special case of consistency models under a loose consistency condition.

Therefore, we can “interpolate” from diffusion models (Δt = t) to consistency models (Δt = dt) by shrinking Δt to 0, using a continuous-time formulation.
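
My reading of that interpolation, as a rough sketch rather than the authors' code: fine-tune the pretrained denoiser f(x_t, t) to agree with itself at a nearby noise level r = t - Δt, and anneal the gap from Δt = t (which, given the boundary condition f(x, 0) = x, reduces to ordinary denoising training) toward Δt → 0 (the consistency condition):

```python
import torch

def ect_loss(f, x0, gap_fraction):
    """gap_fraction = 1.0 roughly reproduces diffusion-style training
    (assuming f(x, 0) = x by parameterization); gap_fraction -> 0 enforces
    the consistency condition. f(x, t) is the pretrained denoiser."""
    t = torch.rand(x0.shape[0]) * 0.98 + 0.02          # noise level in (0, 1]
    r = t * (1.0 - gap_fraction)                        # nearby, smaller noise level
    noise = torch.randn_like(x0)
    xt = x0 + t.view(-1, 1) * noise                     # same noise for both points
    xr = x0 + r.view(-1, 1) * noise
    with torch.no_grad():                               # teacher = same net, stop-grad
        target = f(xr, r)
    return ((f(xt, t) - target) ** 2).mean()

# Toy usage with a linear "denoiser" on 2-D data:
net = torch.nn.Linear(3, 2)
f = lambda x, t: net(torch.cat([x, t.view(-1, 1)], dim=1))
x0 = torch.randn(8, 2)
loss = ect_loss(f, x0, gap_fraction=0.5)
loss.backward()
```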

7/11
The image quality from 1-step sampling gradually improves when transitioning from diffusion models to consistency models.

8/11
Instead of using Diffusion Distillation or training Consistency Models from scratch, fine-tune your pretrained diffusion models to consistency models! 💫

Here's your handy cheat sheet:

9/11
📚 Hungry for more knowledge on Diffusion Acceleration? Don't miss out on @sedielem's blog for a holistic view of diffusion distillation! Another 1-hour tour that will leave you inspired and equipped with even more insights! :D

The paradox of diffusion distillation

10/11
Together with William, @ashwini1024, and @zicokolter!

Consistency Models Made Easy | Notion

11/11
@William74312006


 

bnew










1/11
📢Presenting 𝐃𝐄𝐏𝐈𝐂𝐓: Diffusion-Enabled Permutation Importance for Image Classification Tasks #ECCV2024

We use permutation importance to compute dataset-level explanations for image classifiers using diffusion models (without access to model parameters or training data!)

2/11
🌐Project Page: DEPICT: Diffusion-Enabled Permutation Importance for Image Classification Tasks
📝 Paper: [2407.14509] DEPICT: Diffusion-Enabled Permutation Importance for Image Classification Tasks
💻Github: GitHub - MLD3/DEPICT
✍️ @gregkondas (𝐚𝐩𝐩𝐥𝐲𝐢𝐧𝐠 𝐭𝐨 𝐏𝐡𝐃𝐬!), @msjoding @ellakaz David Fouhey and Jenna Wiens

Key Takeaways🧵

3/11
𝐄𝐱𝐩𝐥𝐚𝐢𝐧𝐢𝐧𝐠 𝐢𝐦𝐚𝐠𝐞 𝐜𝐥𝐚𝐬𝐬𝐢𝐟𝐢𝐞𝐫𝐬 𝐢𝐬 𝐨𝐟𝐭𝐞𝐧 𝐢𝐧𝐬𝐭𝐚𝐧𝐜𝐞-𝐥𝐞𝐯𝐞𝐥 in the 𝐩𝐢𝐱𝐞𝐥 𝐬𝐩𝐚𝐜𝐞 (e.g. heatmaps) but it is difficult to understand general model behavior from a set of heatmaps, and end-users often struggle to interpret them.

4/11
DEPICT takes inspiration from how we sometimes explain 𝐭𝐚𝐛𝐮𝐥𝐚𝐫 𝐝𝐚𝐭𝐚 classifiers using 𝐩𝐞𝐫𝐦𝐮𝐭𝐚𝐭𝐢𝐨𝐧 𝐢𝐦𝐩𝐨𝐫𝐭𝐚𝐧𝐜𝐞: a method to generate a dataset-level explanation that ranks all input features according to their impact on downstream model performance.

5/11
Each (inherently interpretable) tabular feature is permuted across instances, and feature importance is measured by the change in model performance relative to unpermuted data. Features are then ranked by their feature importance. We asked: can we apply this to image classifiers?

6/11
𝐊𝐞𝐲 𝐈𝐧𝐬𝐢𝐠𝐡𝐭: If we want to know the extent that an image classifier relies on a specific concept (e.g. the presence of a chair), we don't know which pixels we should permute across image instances. BUT, turns out we can do this with text-conditioned diffusion models!

7/11
Given a dataset of images labeled with concepts (e.g., chair present), we 𝐩𝐞𝐫𝐦𝐮𝐭𝐞 𝐚 𝐜𝐨𝐧𝐜𝐞𝐩𝐭 across examples 𝐢𝐧 𝐭𝐞𝐱𝐭 𝐬𝐩𝐚𝐜𝐞 and then 𝐠𝐞𝐧𝐞𝐫𝐚𝐭𝐞 a permuted version of the dataset via a 𝐭𝐞𝐱𝐭-𝐜𝐨𝐧𝐝𝐢𝐭𝐢𝐨𝐧𝐞𝐝 𝐝𝐢𝐟𝐟𝐮𝐬𝐢𝐨𝐧 𝐦𝐨𝐝𝐞𝐥.

8/11
Feature importance is then reflected by the change in model performance on 𝐠𝐞𝐧𝐞𝐫𝐚𝐭𝐞𝐝 𝐝𝐚𝐭𝐚 with a 𝐬𝐩𝐞𝐜𝐢𝐟𝐢𝐜 𝐜𝐨𝐧𝐜𝐞𝐩𝐭 𝐩𝐞𝐫𝐦𝐮𝐭𝐞𝐝 relative to the generated data with no concepts permuted.
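
Put together, the loop looks roughly like this (schematic pseudocode of my own; `generate_image` and `classifier_score` are placeholders, not the released API):

```python
import random

def permutation_importance(examples, concepts, generate_image, classifier_score):
    """examples: list of dicts mapping concept name -> value (e.g. {"chair": True}).
    generate_image(concept_dict) -> image from a text-conditioned diffusion model.
    classifier_score(images, examples) -> performance of the classifier under test."""
    # Baseline: regenerate the dataset with no concepts permuted.
    baseline_imgs = [generate_image(ex) for ex in examples]
    baseline = classifier_score(baseline_imgs, examples)

    importance = {}
    for concept in concepts:
        # Permute this one concept's values across examples, in "text space".
        shuffled = [ex[concept] for ex in examples]
        random.shuffle(shuffled)
        permuted = [{**ex, concept: val} for ex, val in zip(examples, shuffled)]
        permuted_imgs = [generate_image(ex) for ex in permuted]
        # Importance = performance drop relative to the unpermuted generations.
        importance[concept] = baseline - classifier_score(permuted_imgs, examples)

    # Rank concepts by how much shuffling them hurts the classifier.
    return sorted(importance.items(), key=lambda kv: kv[1], reverse=True)
```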

9/11
When applied to a 𝐬𝐞𝐭 𝐨𝐟 𝐜𝐨𝐧𝐜𝐞𝐩𝐭𝐬, 𝐃𝐄𝐏𝐈𝐂𝐓 𝐠𝐞𝐧𝐞𝐫𝐚𝐭𝐞𝐬 𝐚 𝐫𝐚𝐧𝐤𝐢𝐧𝐠 𝐨𝐟 𝐟𝐞𝐚𝐭𝐮𝐫𝐞 𝐢𝐦𝐩𝐨𝐫𝐭𝐚𝐧𝐜𝐞. We show this approach recovers underlying model feature importance on synthetic and real-world (COCO, MIMIC-CXR) image classification tasks.

10/11
DEPICT doesn't come without 𝐥𝐢𝐦𝐢𝐭𝐚𝐭𝐢𝐨𝐧: You have to be very 𝐜𝐚𝐫𝐞𝐟𝐮𝐥 to correctly generate permuted data! We discuss assumptions that need to be met to apply DEPICT in the paper. But, as diffusion models get better, we hope the applicability of DEPICT will too!

11/11
Please feel free to reach out if you have any questions, and I'd love to chat if you'll be in Milan!


 

bnew

1/1
🚨Artist: Aesthetically Controllable Text-Driven Stylization without Training
🌟𝐏𝐫𝐨𝐣: Artist: Aesthetically Controllable Text-Driven Stylization without Training
🚀𝐀𝐛𝐬: [2407.15842] Artist: Aesthetically Controllable Text-Driven Stylization without Training

a training-free approach that aesthetically controls the content and style generation of a pretrained diffusion model for text-driven stylization.

