Why Open Source AI Will Win - You shouldn't bet against the bazaar or the GPU p̶o̶o̶r̶ hungry.

bnew

Veteran
Joined
Nov 1, 2015
Messages
61,855
Reputation
9,338
Daps
169,874

Open Source AI Is the Path Forward


July 23, 2024

By Mark Zuckerberg, Founder and CEO

In the early days of high-performance computing, the major tech companies of the day each invested heavily in developing their own closed source versions of Unix. It was hard to imagine at the time that any other approach could develop such advanced software. Eventually though, open source Linux gained popularity – initially because it allowed developers to modify its code however they wanted and was more affordable, and over time because it became more advanced, more secure, and had a broader ecosystem supporting more capabilities than any closed Unix. Today, Linux is the industry standard foundation for both cloud computing and the operating systems that run most mobile devices – and we all benefit from superior products because of it.

I believe that AI will develop in a similar way. Today, several tech companies are developing leading closed models. But open source is quickly closing the gap. Last year, Llama 2 was only comparable to an older generation of models behind the frontier. This year, Llama 3 is competitive with the most advanced models and leading in some areas. Starting next year, we expect future Llama models to become the most advanced in the industry. But even before that, Llama is already leading on openness, modifiability, and cost efficiency.

Today we’re taking the next steps towards open source AI becoming the industry standard. We’re releasing Llama 3.1 405B, the first frontier-level open source AI model, as well as new and improved Llama 3.1 70B and 8B models. In addition to having significantly better cost/performance relative to closed models, the fact that the 405B model is open will make it the best choice for fine-tuning and distilling smaller models.

Beyond releasing these models, we’re working with a range of companies to grow the broader ecosystem. Amazon, Databricks, and NVIDIA are launching full suites of services to support developers fine-tuning and distilling their own models. Innovators like Groq have built low-latency, low-cost inference serving for all the new models. The models will be available on all major clouds including AWS, Azure, Google, Oracle, and more. Companies like Scale.AI, Dell, Deloitte, and others are ready to help enterprises adopt Llama and train custom models with their own data. As the community grows and more companies develop new services, we can collectively make Llama the industry standard and bring the benefits of AI to everyone.

Meta is committed to open source AI. I’ll outline why I believe open source is the best development stack for you, why open sourcing Llama is good for Meta, and why open source AI is good for the world and therefore a platform that will be around for the long term.

Why Open Source AI Is Good for Developers


When I talk to developers, CEOs, and government officials across the world, I usually hear several themes:

  • We need to train, fine-tune, and distill our own models. Every organization has different needs that are best met with models of different sizes that are trained or fine-tuned with their specific data. On-device tasks and classification tasks require small models, while more complicated tasks require larger models. Now you’ll be able to take the most advanced Llama models, continue training them with your own data and then distill them down to a model of your optimal size – without us or anyone else seeing your data.
  • We need to control our own destiny and not get locked into a closed vendor. Many organizations don’t want to depend on models they cannot run and control themselves. They don’t want closed model providers to be able to change their model, alter their terms of use, or even stop serving them entirely. They also don’t want to get locked into a single cloud that has exclusive rights to a model. Open source enables a broad ecosystem of companies with compatible toolchains that you can move between easily.
  • We need to protect our data. Many organizations handle sensitive data that they need to secure and can’t send to closed models over cloud APIs. Other organizations simply don’t trust the closed model providers with their data. Open source addresses these issues by enabling you to run the models wherever you want. It is well-accepted that open source software tends to be more secure because it is developed more transparently.
  • We need a model that is efficient and affordable to run. Developers can run inference on Llama 3.1 405B on their own infrastructure at roughly 50% of the cost of using closed models like GPT-4o, for both user-facing and offline inference tasks.
  • We want to invest in the ecosystem that’s going to be the standard for the long term. Lots of people see that open source is advancing at a faster rate than closed models, and they want to build their systems on the architecture that will give them the greatest advantage long term.
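
The first theme above mentions distilling a large model down to a smaller one. As a rough illustration only, and not Meta's actual recipe, the core of one common approach (soft-label distillation with a temperature-scaled KL divergence) can be sketched in plain Python. The function names, the temperature value, and the toy logits are all assumptions made for this example.

```python
import math

def softmax(logits, temperature=1.0):
    """Turn raw logits into probabilities, softened by a temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) computed on temperature-softened distributions."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    # Multiplying by T^2 is a common convention that keeps the loss scale
    # comparable across temperatures.
    return kl * temperature ** 2

# Toy logits (hypothetical): one student copies the teacher, one disagrees.
teacher = [4.0, 1.0, 0.5]
matching_student = [4.0, 1.0, 0.5]
poor_student = [0.5, 1.0, 4.0]

print(distillation_loss(teacher, matching_student))
print(distillation_loss(teacher, poor_student))
```

A student that reproduces the teacher's distribution gets a loss of zero, and the loss grows as the two distributions diverge. A real pipeline would compute this per token over model outputs and usually mix it with a standard cross-entropy term.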


Why Open Source AI Is Good for Meta


Meta’s business model is about building the best experiences and services for people. To do this, we must ensure that we always have access to the best technology, and that we’re not locked into a competitor’s closed ecosystem where they can restrict what we build.

One of my formative experiences has been building our services constrained by what Apple will let us build on their platforms. Between the way they tax developers, the arbitrary rules they apply, and all the product innovations they block from shipping, it’s clear that Meta and many other companies would be freed up to build much better services for people if we could build the best versions of our products and competitors were not able to constrain what we could build. On a philosophical level, this is a major reason why I believe so strongly in building open ecosystems in AI and AR/VR for the next generation of computing.

People often ask if I’m worried about giving up a technical advantage by open sourcing Llama, but I think this misses the big picture for a few reasons:

First, to ensure that we have access to the best technology and aren’t locked into a closed ecosystem over the long term, Llama needs to develop into a full ecosystem of tools, efficiency improvements, silicon optimizations, and other integrations. If we were the only company using Llama, this ecosystem wouldn’t develop and we’d fare no better than the closed variants of Unix.

Second, I expect AI development will continue to be very competitive, which means that open sourcing any given model isn’t giving away a massive advantage over the next best models at that point in time. The path for Llama to become the industry standard is by being consistently competitive, efficient, and open generation after generation.

Third, a key difference between Meta and closed model providers is that selling access to AI models isn’t our business model. That means openly releasing Llama doesn’t undercut our revenue, sustainability, or ability to invest in research like it does for closed providers. (This is one reason several closed providers consistently lobby governments against open source.)

Finally, Meta has a long history of open source projects and successes. We’ve saved billions of dollars by releasing our server, network, and data center designs with Open Compute Project and having supply chains standardize on our designs. We benefited from the ecosystem’s innovations by open sourcing leading tools like PyTorch, React, and many more. This approach has consistently worked for us when we stick with it over the long term.
 

bnew


Why Open Source AI Is Good for the World


I believe that open source is necessary for a positive AI future. AI has more potential than any other modern technology to increase human productivity, creativity, and quality of life – and to accelerate economic growth while unlocking progress in medical and scientific research. Open source will ensure that more people around the world have access to the benefits and opportunities of AI, that power isn’t concentrated in the hands of a small number of companies, and that the technology can be deployed more evenly and safely across society.

There is an ongoing debate about the safety of open source AI models, and my view is that open source AI will be safer than the alternatives. I think governments will conclude it’s in their interest to support open source because it will make the world more prosperous and safer.

My framework for understanding safety is that we need to protect against two categories of harm: unintentional and intentional. Unintentional harm is when an AI system may cause harm even when it was not the intent of those running it to do so. For example, modern AI models may inadvertently give bad health advice. Or, in more futuristic scenarios, some worry that models may unintentionally self-replicate or hyper-optimize goals to the detriment of humanity. Intentional harm is when a bad actor uses an AI model with the goal of causing harm.

It’s worth noting that unintentional harm covers the majority of concerns people have around AI – ranging from what influence AI systems will have on the billions of people who will use them to most of the truly catastrophic science fiction scenarios for humanity. On this front, open source should be significantly safer since the systems are more transparent and can be widely scrutinized. Historically, open source software has been more secure for this reason. Similarly, using Llama with its safety systems like Llama Guard will likely be safer and more secure than closed models. For this reason, most conversations around open source AI safety focus on intentional harm.

Our safety process includes rigorous testing and red-teaming to assess whether our models are capable of meaningful harm, with the goal of mitigating risks before release. Since the models are open, anyone is capable of testing for themselves as well. We must keep in mind that these models are trained on information that’s already on the internet, so the starting point when considering harm should be whether a model can facilitate more harm than information that can quickly be retrieved from Google or other search results.

When reasoning about intentional harm, it’s helpful to distinguish between what individual or small scale actors may be able to do as opposed to what large scale actors like nation states with vast resources may be able to do.

At some point in the future, individual bad actors may be able to use the intelligence of AI models to fabricate entirely new harms from the information available on the internet. At this point, the balance of power will be critical to AI safety. I think it will be better to live in a world where AI is widely deployed so that larger actors can check the power of smaller bad actors. This is how we’ve managed security on our social networks – our more robust AI systems identify and stop threats from less sophisticated actors who often use smaller scale AI systems. More broadly, larger institutions deploying AI at scale will promote security and stability across society. As long as everyone has access to similar generations of models – which open source promotes – then governments and institutions with more compute resources will be able to check bad actors with less compute.

The next question is how the US and democratic nations should handle the threat of states with massive resources like China. The United States’ advantage is decentralized and open innovation. Some people argue that we must close our models to prevent China from gaining access to them, but my view is that this will not work and will only disadvantage the US and its allies. Our adversaries are great at espionage, stealing models that fit on a thumb drive is relatively easy, and most tech companies are far from operating in a way that would make this more difficult. It seems most likely that a world of only closed models results in a small number of big companies plus our geopolitical adversaries having access to leading models, while startups, universities, and small businesses miss out on opportunities. Plus, constraining American innovation to closed development increases the chance that we don’t lead at all. Instead, I think our best strategy is to build a robust open ecosystem and have our leading companies work closely with our government and allies to ensure they can best take advantage of the latest advances and achieve a sustainable first-mover advantage over the long term.

When you consider the opportunities ahead, remember that most of today’s leading tech companies and scientific research are built on open source software. The next generation of companies and research will use open source AI if we collectively invest in it. That includes startups just getting off the ground as well as people in universities and countries that may not have the resources to develop their own state-of-the-art AI from scratch.

The bottom line is that open source AI represents the world’s best shot at harnessing this technology to create the greatest economic opportunity and security for everyone.

Let’s Build This Together


With past Llama models, Meta developed them for ourselves and then released them, but didn’t focus much on building a broader ecosystem. We’re taking a different approach with this release. We’re building teams internally to enable as many developers and partners as possible to use Llama, and we’re actively building partnerships so that more companies in the ecosystem can offer unique functionality to their customers as well.

I believe the Llama 3.1 release will be an inflection point in the industry where most developers begin to primarily use open source, and I expect that approach to only grow from here. I hope you’ll join us on this journey to bring the benefits of AI to everyone in the world.

You can access the models now at llama.meta.com.
 

bnew





1/10
I'm excited to announce Reflection 70B, the world’s top open-source model.

Trained using Reflection-Tuning, a technique developed to enable LLMs to fix their own mistakes.

405B coming next week - we expect it to be the best model in the world.

Built w/ @GlaiveAI.

Read on ⬇️:

2/10
Reflection 70B holds its own against even the top closed-source models (Claude 3.5 Sonnet, GPT-4o).

It’s the top LLM in (at least) MMLU, MATH, IFEval, GSM8K.

Beats GPT-4o on every benchmark tested.

It clobbers Llama 3.1 405B. It’s not even close.

3/10
The technique that drives Reflection 70B is simple, but very powerful.

Current LLMs have a tendency to hallucinate, and can’t recognize when they do so.

Reflection-Tuning enables LLMs to recognize their mistakes, and then correct them before committing to an answer.

4/10
Additionally, we separate planning into a separate step, improving CoT potency and keeping the outputs simple and concise for end users.

5/10
Important to note: We have checked for contamination against all benchmarks mentioned using @lmsysorg's LLM Decontaminator.
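
For readers unfamiliar with what a contamination check does, here is a minimal sketch. It is not how LLM Decontaminator works; it is a much simpler stand-in (word n-gram overlap) meant only to show the shape of the idea: flag training examples that share long word sequences with benchmark items. The function names, the n-gram size, and the threshold are assumptions for illustration.

```python
def ngrams(text, n=8):
    """Set of lowercase word n-grams in a string."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def overlap_ratio(train_example, benchmark_item, n=8):
    """Fraction of the benchmark item's n-grams that also occur in the training example."""
    bench = ngrams(benchmark_item, n)
    if not bench:
        return 0.0
    return len(bench & ngrams(train_example, n)) / len(bench)

def flag_contaminated(train_set, benchmark, n=8, threshold=0.5):
    """Return (train_index, benchmark_index) pairs whose overlap meets the threshold."""
    return [
        (i, j)
        for i, example in enumerate(train_set)
        for j, item in enumerate(benchmark)
        if overlap_ratio(example, item, n) >= threshold
    ]

# Toy data (hypothetical); a small n is used because the strings are short.
benchmark = ["what is the sum of two and three"]
train_set = [
    "Question: what is the sum of two and three ? Answer: five",
    "the capital of France is Paris",
]
print(flag_contaminated(train_set, benchmark, n=3))
```

Exact n-gram matching misses paraphrased or translated test items, which is one reason more elaborate tools exist.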

6/10
The weights of our 70B model are available today on @huggingface here: mattshumer/Reflection-Llama-3.1-70B · Hugging Face

@hyperbolic_labs API available later today.

Next week, we will release the weights of Reflection-405B, along with a short report going into more detail on our process and findings.

7/10
Most importantly, a huge shoutout to @csahil28 and @GlaiveAI.

I’ve been noodling on this idea for months, and finally decided to pull the trigger a few weeks ago. I reached out to Sahil and the data was generated within hours.

If you’re training models, check Glaive out.

8/10
This model is quite fun to use and insanely powerful.

Please check it out — with the right prompting, it’s an absolute beast for many use-cases.

Demo here: Reflection 70B Playground

9/10
405B is coming next week, and we expect it to outperform Sonnet and GPT-4o by a wide margin.

But this is just the start. I have a few more tricks up my sleeve.

I’ll continue to work with @csahil28 to release even better LLMs that make this one look like a toy.

Stay tuned.

10/10
We'll release a report next week!



bnew


1/1
@nexusfusion_io
$450 open-source reasoning model Sky-T1-32B-Preview from UC Berkeley's NovaSky team rivals OpenAI's o1. With 19 hours of training, this marks a major milestone in cost-effective AI development! 🌟 Read more: Sky-T1: Train your own O1 preview model within $450




1/2
@uavster
You can now train a model that beats OpenAI's o1 in math and coding for less than $450.

Meet UC Berkeley’s Sky-T1–32B-Preview. Link in 🧵



2/2
@uavster
Code and data are open.
GitHub - NovaSky-AI/SkyThought: Sky-T1: Train your own O1 preview model within $450










1/11
@AIBuzzNews
Meet the model trained for just $450.

And it rivals OpenAI's best.

Here’s the story behind Sky-T1-32B-Preview:



2/11
@AIBuzzNews
Meet Sky-T1-32B-Preview from @NovaSkyAI, an open-source reasoning model that stands toe-to-toe with o1-preview on leading reasoning and coding benchmarks.

The best part? This model was trained for just $450.



3/11
@AIBuzzNews
Sky-T1-32B-Preview outperforms on key benchmarks:

Math500: 82.4% (vs. 81.4% by o1-preview)
AIME24: 43.3% (vs. 40.0%)
LiveCodeBench-Hard: 17.9% (vs. 16.3%)
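
To put these margins in perspective, the short sketch below computes the gaps in percentage points from the figures quoted in the thread (taken as reported, not independently verified). It also converts the AIME24 percentages to problem counts, under the assumption that AIME24 consists of the 30 problems from the two 2024 AIME exams; the variable and function names are made up for this example.

```python
# Scores in percent as quoted in the thread: (Sky-T1-32B-Preview, o1-preview).
reported = {
    "Math500": (82.4, 81.4),
    "AIME24": (43.3, 40.0),
    "LiveCodeBench-Hard": (17.9, 16.3),
}

def margins(scores):
    """Gap in percentage points between the first and second score of each pair."""
    return {name: round(sky - o1, 1) for name, (sky, o1) in scores.items()}

for name, delta in margins(reported).items():
    print(f"{name}: {delta:+.1f} points")

# Assumption: AIME24 has 30 problems (two exams of 15 problems each).
AIME_PROBLEMS = 30
sky_solved = round(reported["AIME24"][0] / 100 * AIME_PROBLEMS)
o1_solved = round(reported["AIME24"][1] / 100 * AIME_PROBLEMS)
print(f"AIME24 solved: {sky_solved} vs {o1_solved} of {AIME_PROBLEMS}")
```

Under that assumption the AIME24 gap corresponds to a single problem, so these results read more naturally as "on par with o1-preview" than as a clear win.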



4/11
@AIBuzzNews
Here’s how it was trained:

- Base model: Qwen2.5-32B-Instruct
- Data: Sourced from QwQ-32B, enhanced with GPT-4o-mini and reject sampling for precise math and coding traces
- Compute: 8 H100 GPUs, 19 hours, and $450 in cost



5/11
@AIBuzzNews
Sky-T1-32B-Preview is just the start. They're working on:

- More efficient reasoning models
- Advanced methods for scaling during inference
Stay tuned!



6/11
@mushfiq_sajib
Impressive efficiency in training Sky-T1-32B-Preview. Innovations like this redefine AI possibilities within tight budgets.



7/11
@AIBuzzNews
This opens up many doors for businesses to train their own models.



8/11
@shushant_l
Wow, you've explained in detail



9/11
@AIBuzzNews
Trying to make it understandable for everyone.



10/11
@Whizz_ai
Do you think these new rivals will have the potential to compete with OpenAI?



11/11
@AIBuzzNews
Well, it does according to the benchmarks.




1/11
@victormustar
Reasoning traces are the new gold, and the open-source community is going to nail this. Check Sky-T1-32B-Preview release (reportedly rivals o1-preview for coding). The team has fully disclosed all technical details, code, dataset, and weights. 🔥

NovaSky-AI/Sky-T1-32B-Preview · Hugging Face



2/11
@victormustar
Sky-T1: Train your own O1 preview model within $450



3/11
@ivanfioravanti
what??? This rivals o1-preview? TOP TOP TOP!



4/11
@PascalBauerDE
Model Description
This is a 32B reasoning model trained from Qwen2.5-32B-Instruct with 17K data. The performance is on par with o1-preview model on both math and coding.

Omg. What? That is like next level. 17k data only.



5/11
@TheAIVeteran
I know how to generate reasoning traces from scratch. See my pinned thread for some details.



6/11
@Teknium1
The datasets listed don't seem to be the subsets used to train this, fyi



7/11
@sinanisler
it is just a matter of time before we have an o1-level open-source model, maybe even under 32B



8/11
@anushkmittal
reasoning traces are the new moat



9/11
@carsenklock
Amazing!! GG



10/11
@AbelIonadi
Will try this. Sounds interesting



11/11
@9Knowled9e
🔥





1/11
@reach_vb
Sky-T1-32B-Preview, an open-source o1-like model trained for < $450, achieves competitive reasoning and coding performance (e.g., 82.4 on Math500, 86.3 on LiveCodeBench-Easy) compared to QwQ (85.4, 90.7) and o1-preview (81.4, 92.9) 🔥

Fully open-source with 17K training data, 32B model weights, and outperforming Qwen-2.5-32B-Instruct across benchmarks 💥



2/11
@reach_vb
Model checkpoints:

NovaSky-AI/Sky-T1-32B-Preview · Hugging Face



3/11
@InfSoftwareH
Has it been trained only on the benchmarks data?😂



4/11
@reach_vb
The best part is that anyone can quite easily test this for less than 450 USD 😄



5/11
@ichrvk
Would love to see how these benchmarks hold up in real-world scenarios. The training cost is fascinating though - we're truly entering the era of bedroom LLMs.



6/11
@StephenEdginton
It’s a finetune; it should really say that. Still impressive.



7/11
@steve_ike_
How is this possible 🤯.



8/11
@rogue_node
it's a finetuned model.



9/11
@dzamsgaglo
Has anybody compared it to Phi-4?



10/11
@prithiv_003
This is awesome in every aspect: less than $450, less than 50k TD. Just nice work 🤩



11/11
@SynthSquid
I wanna see this tested on Aider's new benchmark





1/3
@iamluokai
The NovaSky team fine-tuned the open-source Qwen2.5-32B-Instruct model. The training lasted for 19 hours using 8 H100 GPUs, costing about $450 (priced according to Lambda Cloud). The resulting Sky-T1-32B-Preview model performs comparably to o1-preview in reasoning and coding benchmarks, demonstrating the possibility of efficiently replicating high-level reasoning capabilities at a low cost. 🧵1/3
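
The $450 figure can be sanity-checked with simple arithmetic. The sketch below uses only the numbers quoted in the thread (8 GPUs, 19 hours, $450) and derives the GPU-hours and the per-GPU-hour rate those numbers imply. The derived rate is an inference from the quoted totals, not a published price, and the function names are made up for the example.

```python
def gpu_hours(num_gpus, hours):
    """Total GPU-hours consumed by a run."""
    return num_gpus * hours

def implied_hourly_rate(num_gpus, hours, total_cost):
    """Price per GPU-hour implied by a quoted total cost."""
    return total_cost / gpu_hours(num_gpus, hours)

# Figures quoted in the thread for the Sky-T1-32B-Preview fine-tune.
total_gpu_hours = gpu_hours(8, 19)
rate = implied_hourly_rate(8, 19, 450.0)

print(f"{total_gpu_hours} GPU-hours, about ${rate:.2f} per GPU-hour")
# prints "152 GPU-hours, about $2.96 per GPU-hour"
```

This covers the fine-tuning run only. It does not include the cost of pretraining the Qwen2.5-32B-Instruct base model or of generating the training data.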



2/3
@iamluokai
🧵2/3

The NovaSky team has open-sourced all the details of the model (including data, code, model weights, etc.), making it easy for community members to replicate and improve the results.

Project: Sky-T1: Train your own O1 preview model within $450



3/3
@iamluokai
🧵3/3

Github: GitHub - NovaSky-AI/SkyThought: Sky-T1: Train your own O1 preview model within $450





1/7
@abacaj
This is just standard SFT and outperforms o1-preview? Questionable…

[Quoted tweet]
1/6 🚀
Introducing Sky-T1-32B-Preview, our fully open-source reasoning model that matches o1-preview on popular reasoning and coding benchmarks — trained under $450!

📊Blog: novasky-ai.github.io/posts/s…
🏋️‍♀️Model weights: huggingface.co/NovaSky-AI/Sk…

2/7
@willccbb
QwQ is already an open source 32B model which outperforms o1-preview in many benchmarks and was finetuned from Qwen2.5-32B

they just kinda did QwQ again but mostly worse, using QwQ data



3/7
@abacaj
Yea I feel like it’s not that interesting but maybe I’m missing something



4/7
@snellingio
i think the thing you’re both “missing” is that the data is available and reproducible (hopefully)

having that dataset available is great imo



5/7
@willccbb
totally fair, missed that bit

will be cool to see how it translates for smaller models

i suspect that you should be able to get a really good code/math reasoner at like 7b with these kinds of tricks



6/7
@starkov100
Sky-T1: Train your own O1 preview model within $450



7/7
@snellingio
yeah but they just used vanilla SFT from what I can tell.

am not convinced that SFT only will be successful in small models with this kind of data (it obviously wasn't in this case)








1/21
@NovaSkyAI
1/6 🚀
Introducing Sky-T1-32B-Preview, our fully open-source reasoning model that matches o1-preview on popular reasoning and coding benchmarks — trained under $450!

📊Blog: Sky-T1: Train your own O1 preview model within $450
🏋️‍♀️Model weights: NovaSky-AI/Sky-T1-32B-Preview · Hugging Face



2/21
@NovaSkyAI
2/6📂
Data curation, train, eval code, 17K training data: GitHub - NovaSky-AI/SkyThought: Sky-T1: Train your own O1 preview model within $450

Collaborate, replicate, and innovate! 💡



3/21
@NovaSkyAI
3/6📈
Sky-T1-32B-Preview excels in both math & coding:
- Math500: 82.4% (o1-preview: 81.4%)
- AIME24: 43.3% (o1-preview: 40.0%)
- LiveCodeBench-Hard: 17.9% (o1-preview: 16.3%)



4/21
@NovaSkyAI
4/6 ⚙️
The training recipe:
- Base: Qwen2.5-32B-Instruct
- Data: Curated from QwQ-32B, enhanced with GPT-4o-mini, reject sampling for high-quality math & coding reasoning traces.
- Cost: 8 H100 GPUs, 19 hours, $450.
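
The "reject sampling" step in the recipe above can be pictured as a filter: sample several reasoning traces per problem and keep only those whose final answer matches a known reference. The sketch below is a toy illustration of that idea and not the NovaSky pipeline. The generator is a deterministic stand-in for a model such as QwQ-32B, and every name in it (`rejection_sample`, `make_toy_generator`, `extract_answer`) is hypothetical.

```python
from itertools import cycle

def rejection_sample(problem, reference_answer, generate, extract, num_samples=4):
    """Keep only the generated traces whose final answer matches the reference."""
    kept = []
    for _ in range(num_samples):
        trace = generate(problem)
        if extract(trace) == reference_answer:
            kept.append(trace)
    return kept

def make_toy_generator(offsets=(0, 1, 0, -1)):
    """Stand-in for a model: adds two numbers, with an error on some samples."""
    stream = cycle(offsets)

    def generate(problem):
        a, b = problem
        return f"Add {a} and {b}. Answer: {a + b + next(stream)}"

    return generate

def extract_answer(trace):
    """Parse the integer that follows the last 'Answer:' marker."""
    return int(trace.rsplit("Answer:", 1)[1])

kept = rejection_sample((2, 3), 5, make_toy_generator(), extract_answer)
print(kept)
```

For coding problems the equality check would typically be replaced by running the generated solution against test cases. The surviving traces then serve as supervised fine-tuning data.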



5/21
@NovaSkyAI
5/6🌟
Sky-T1-32B-Preview is just the beginning! Next steps:
- Efficient models with strong reasoning
- Explore advanced techniques for test-time scaling



6/21
@NovaSkyAI
6/6 Acknowledgements:

Built with support from: @LambdaAPI @anyscalecompute for compute
Academic Insights from STILL-2 & Qwen Teams

💻 Built at Berkeley’s Sky Computing Lab @BerkeleySky with the amazing NovaSky team:
Contact: novasky.berkeley@gmail.com!



7/21
@ruansgon
@UnrollHelper



8/21
@nooriefyi
the future of ai is collaborative



9/21
@chillzaza_
long live open source



10/21
@Kitora_Su
Congratulations on this amazing feat to the team.



11/21
@DmitriyAnderson
Can I run it on RTX 4090?



12/21
@therealmrcrypto
@Bobtoshi69



13/21
@Cyril_Engineer
Can this be run locally and how much VRAM does it require?



14/21
@Gopinath876
@MaziyarPanahi any thoughts on this model?

I tested it locally and it's doing really well.



15/21
@steve_ike_
Matches o-1 preview and trained under $450 don’t make sense together! 😂



16/21
@iamRezaSayar
This is very cool!🔥but I'm a bit confused on why you chose to fine-tune Qwen2.5 instead of QwQ, given that both are the same size, and even as awesome a jump in performance that we see here, they still seem to fall short of QwQ. So, was there a reason you didn't go with QwQ? 👀



17/21
@altryne
What is this madness 😄

Will mention this in the next @thursdai_pod 👏

Welcome to come tell us about it!



18/21
@Yuchenj_UW
Huge if it’s not trained on the test set



19/21
@TechMemeKing
Insane



20/21
@jasonkneen
LFG!!



21/21
@nisten
at first i was like.. meh just a QwQ finetune but then... i realized you trained this off of Q32 Instruct 👀
holy cow ok, gonna try this out
 