The A.I Megathread (LLM , GPT , Development)

bnew · Jan 30, 2025

Ai2 says its new AI model beats one of DeepSeek's best | TechCrunch

Move over, DeepSeek. Seattle-based nonprofit AI lab Ai2 has released a benchmark-topping model called Tulu3-405B.

techcrunch.com

Ai2 says its new AI model beats one of DeepSeek’s best

Kyle Wiggers

6:00 AM PST · January 30, 2025

Move over, DeepSeek. There’s a new AI champion in town — and they’re American.

On Thursday, Ai2, a nonprofit AI research institute based in Seattle, released a model that it claims outperforms DeepSeek V3, one of Chinese AI company DeepSeek’s leading systems.

Ai2’s model, called Tulu3-405B, also beats OpenAI’s GPT-4o on certain AI benchmarks, according to Ai2’s internal testing. Moreover, unlike GPT-4o (and even DeepSeek V3), Tulu3-405B is open source, which means all of the components necessary to replicate it from scratch are freely available and permissively licensed.

A spokesperson for Ai2 told TechCrunch that the lab believes Tulu3-405B “underscores the U.S.’ potential to lead the global development of best-in-class generative AI models.”

“This milestone is a key moment for the future of open AI, reinforcing the U.S.’ position as a leader in competitive, open-source models,” the spokesperson said. “With this launch, Ai2 is introducing a powerful, U.S.-developed alternative to DeepSeek’s models — marking a pivotal moment not just in AI development, but in showcasing that the U.S. can lead with competitive, open-source AI independent of the tech giants.”

Tulu3-405B is a rather large model. Containing 405 billion parameters, it required 256 GPUs running in parallel to train, according to Ai2. Parameters roughly correspond to a model’s problem-solving skills, and models with more parameters generally perform better than those with fewer parameters.
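To give a feel for why a model of this size has to be spread across many GPUs, here is a rough back-of-the-envelope sketch. It assumes 2 bytes per parameter (16-bit weights) and an even split across all 256 GPUs; both are illustrative assumptions, not details from Ai2, and the estimate ignores gradients, optimizer state, and activations, which add considerably more during training.

```python
def weight_memory_gb(params: float, bytes_per_param: int = 2) -> float:
    """Memory needed just to hold the weights, in gigabytes (1 GB = 1e9 bytes).

    bytes_per_param=2 assumes 16-bit weights; this is an assumption,
    not a figure reported by Ai2.
    """
    return params * bytes_per_param / 1e9


PARAMS = 405e9   # parameter count from the article
NUM_GPUS = 256   # GPU count from the article

total_gb = weight_memory_gb(PARAMS)
per_gpu_gb = total_gb / NUM_GPUS  # assumes the weights are sharded evenly

print(f"weights only: {total_gb:.0f} GB total, {per_gpu_gb:.2f} GB per GPU")
```

Even under these generous assumptions the weights alone come to hundreds of gigabytes, more than any single current GPU holds.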

Ai2 tested Tulu3-405B on popular benchmarks. Image Credits: Ai2

According to Ai2, one of the keys to attaining competitive performance with Tulu3-405B was a technique called reinforcement learning with verifiable rewards. Reinforcement learning with verifiable rewards, or RLVR, trains models on tasks with “verifiable” outcomes, like math problem solving and following instructions.
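The core idea of a "verifiable" reward can be sketched in a few lines: instead of a learned reward model, a simple program checks whether the output is right and returns 1 or 0. The functions below are hypothetical illustrations of that idea (a math answer check and an instruction-following check); they are not Ai2's implementation, and the names and answer-extraction rule are assumptions.

```python
import re


def math_reward(completion: str, reference: str) -> float:
    """Return 1.0 if the last number in the completion equals the reference answer.

    Taking the last number as the final answer is an assumed convention.
    """
    numbers = re.findall(r"-?\d+(?:\.\d+)?", completion.replace(",", ""))
    if not numbers:
        return 0.0
    return 1.0 if float(numbers[-1]) == float(reference) else 0.0


def word_limit_reward(completion: str, max_words: int) -> float:
    """Return 1.0 if the completion obeys an 'answer in at most N words' instruction."""
    return 1.0 if len(completion.split()) <= max_words else 0.0


print(math_reward("She has 3 + 4 = 7 apples. The answer is 7.", "7"))
print(math_reward("The answer is 8.", "7"))
print(word_limit_reward("Short reply.", max_words=5))
```

In a reinforcement learning loop, a score like this would be the reward for each sampled completion, so the model is only reinforced when the check passes.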

Ai2 claims that on the benchmark PopQA, a set of 14,000 specialized knowledge questions sourced from Wikipedia, Tulu3-405B beat not only DeepSeek V3 and GPT-4o, but also Meta’s Llama 3.1 405B model. Tulu3-405B also had the highest performance of any model in its class on GSM8K, a test containing grade school-level math word problems.

Tulu3-405B is available to test via Ai2’s chatbot web app, and the code to train the model is on GitHub and the AI dev platform Hugging Face. Get it while it’s hot — and before the next benchmark-beating flagship AI model comes along.

bnew · Jan 31, 2025

bnew · Jan 31, 2025

bnew · Jan 31, 2025

The Ultimate Timeline of Artificial Intelligence Technology · AIPRM

Throughout the existence of humanity, technological advancements have transformed entire civilizations.

www.aiprm.com

bnew · Feb 2, 2025

1/5
@opensauceAI
Effective today, model weights are export controlled by Uncle Sam. This is a big deal. For all the smack talk about the EU, the US is now the world's most aggressive regulator of Expensive Maths. Here's my two cents on the model rule based on the released text (link below).

What it is: This Interim Final Rule takes effect immediately, with a comment and compliance period of 120 days. While there was a period of informal consultation (at least for parts of the Rule), an IFR does not require the Administration to issue a formal notice of proposed rulemaking.

In other words, the world's most significant AI model regulation had less procedural transparency than a NIST voluntary standard.

What it does: Models above 10^26 operations, trained in the US or with certain US technology, are subject to export control (under a new EAR classification number 4E091). The Rule does not control models that are "published" (i.e. open-weight) or closed-weight models smaller than the most advanced open-weight model. For controlled models, there is an exception for export or transfer to entities within designated countries (e.g. the Five Eyes).
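For a rough sense of scale, the 10^26 threshold can be compared against a common rule of thumb that training compute is about 6 operations per parameter per training token. That approximation is an assumption used here for illustration; the Rule has its own way of counting operations, and the token count below is hypothetical.

```python
THRESHOLD_OPS = 1e26  # threshold cited in the post


def training_ops(params: float, tokens: float) -> float:
    """Rule-of-thumb estimate: ~6 operations per parameter per training token."""
    return 6.0 * params * tokens


def tokens_to_reach(params: float, threshold: float = THRESHOLD_OPS) -> float:
    """Training tokens at which a model of this size would hit the threshold."""
    return threshold / (6.0 * params)


params = 405e9   # a 405B-parameter model, as an example size
tokens = 15e12   # hypothetical training-set size, for illustration only

ops = training_ops(params, tokens)
print(f"estimated compute: {ops:.2e} operations")
print(f"over threshold: {ops > THRESHOLD_OPS}")
print(f"tokens needed to reach threshold: {tokens_to_reach(params):.2e}")
```

Under this estimate, a 405B-parameter model trained on 15 trillion tokens stays below the threshold; it would take roughly 40 trillion tokens to cross it.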

Overall, the logic is: if we accept controls on hardware (advanced chips), we should control the fruits of that hardware too (frontier model weights).

Where it came from: The trend was clear by late 2024. The 2023 Executive Order on AI imposed notification requirements on model developers, but stopped short of restricting the release of models. However, the 2024 National Security Memo on AI required agencies like the NSA, DOE, and Safety Institute to develop classified evaluations for catastrophic risk in models (CBRN + cyber).

Classified tests based on classified criteria? These are the building blocks for a control regime.

So?

We're in new territory. If "weights are speech", the Administration will need to meet a high bar to show this Rule is narrowly tailored and the least restrictive alternative to achieving its aim. (Incidentally: the formal justification for the restriction on model weights is promoting "regional stability" not ensuring national security).

What does it mean for open source?

The Administration is at pains to emphasize that open models are not controlled under the Rule. As always, credit where it's due—they have drafted a sweeping rule with sensitivity to open innovation.

But the Administration is exempting open models because they are smaller and less capable than frontier models. So what happens when an open model is too capable? It's not a technical law that open models lag closed models: it's just an economic reality.

It's unclear what might happen, but the Rule offers a clue: the most advanced models should only be "in the hands of validated entities operating under secure conditions... Allowing access to the most advanced AI models through application programming interfaces can unlock the beneficial uses of AI... while mitigating national security and public safety risks".

That view of risk mitigation—access restrictions based on precautionary thresholds—is troubling. If open models are eventually brought within this framework (say, if Commerce decides to include nearly-frontier models, or if the Llama license isn't actually "open" enough to attract the open weight exemption), the Rule would put an end to open innovation in capable models.

In short—IMO, we should be *extremely* cautious of normalizing this approach to regulation, especially when the evidence for precautionary restrictions is neither concrete nor compelling, and when there is so little consensus about defining acceptable / unacceptable risk. Let's see how the Trump Administration responds.

It's a timely reminder to check out our piece in @thehill earlier today: US Leadership in AI Requires Open Source Diplomacy. "The US must refocus policy around AI diffusion and adoption, not just AI safety".

2/5
@opensauceAI
The unpublished IFR text is here via the Federal Register: https://public-inspection.federalregister.gov/2025-00636.pdf

3/5
@VenshiKibes
"Effective today"

i thought it goes into effect well after trump takes office?

4/5
@opensauceAI
No, January 13

5/5
@GozukaraFurkan
China gonna dominate AI

Already doing in many areas

To post tweets in this format, more info here: https://www.thecoli.com/threads/tips-and-tricks-for-posting-the-coli-megathread.984734/post-52211196

bnew · Feb 2, 2025

1/32
@ihteshamit
BREAKING: Alibaba just launched "Qwen" an AI model that writes, generates images/videos, and does web search.

It outperforms DeepSeek, ChatGPT-o1, and Claude sonnet.

Here are 5 insane examples of what it can do:

2/32
@ihteshamit
1/ Write code and use artifacts to test it out

Qwen doesn’t just generate code; it can run it, debug it, and use artifacts to test it in real time.

https://video.twimg.com/amplify_video/1884654089788960768/vid/avc1/1276x720/8wPYUDIZAt-GU-Ss.mp4

3/32
@ihteshamit
2/ Generate images with extreme precision

Forget generic AI art. Qwen can create highly detailed, instruction-following images that rival top-tier AI generators.

The level of accuracy is insane.

https://video.twimg.com/amplify_video/1884654143786434560/vid/avc1/1276x720/_P-1csHeeKd1aPBJ.mp4

4/32
@ihteshamit
Examples:

5/32
@ihteshamit
3/ Generate videos faster than 90% of AI tools

Video generation has been slow until now.

Qwen renders AI-generated videos at lightning speed, putting it ahead of most competitors in processing time.

https://video.twimg.com/amplify_video/1884654228930727936/vid/avc1/1280x720/V6ZdihYi2fO6ZRMI.mp4

6/32
@ihteshamit
4/ Web search

Imagine having an AI assistant that searches the web, gathers insights, and summarizes research all in real time.

Qwen can scan & synthesize information better than most AI models today.

https://video.twimg.com/amplify_video/1884654259175907332/vid/avc1/1276x720/WwfdAGUQCNxHQu9t.mp4

7/32
@ihteshamit
5/ Vision: Upload docs, images

Qwen isn’t just text-based; it can analyze PDFs, read images, and extract key insights instantly.

https://video.twimg.com/amplify_video/1884654319162843139/vid/avc1/1276x720/N-TofFBRtJgzWjIr.mp4

8/32
@ihteshamit
And the best thing about Qwen?

It's not biased.

It can answer all the controversial questions and help you understand everything related to China or any other country.

DeepSeek can't compete with this.

Try it out here for free: Qwen Chat

9/32
@Paulfruitful_
Outperforms o1 ?
where are the benchmarks bro

10/32
@ihteshamit

[Quoted tweet]

Alibaba just dropped Qwen2.5-Max and It’s a Monster

A 20 trillion token-trained AI model that outperforms DeepSeek V3, GPT-4o, and Claude-3.5 Sonnet in key benchmarks.

Here's everything you need to know in 2 minutes:

11/32
@hasantoxr
Literally insane power behind this model... 20 trillion tokens

12/32
@ihteshamit
true

13/32
@Growth_GuruX
The clash of AI Titans has begun. On a daily basis we are getting such developments

14/32
@ihteshamit
It's happening.

It's a battle of WHO WILL BUILD THE STRONGEST AND MOST POWERFUL AI

15/32
@manialok
Qwen has been around

16/32
@ihteshamit
i tried it (the model) before but the chat interface is cool so i tried it and tested so people can do the same

17/32
@Cartidise
Lmao it doesn't beat ChatGPT o1 stop the cap

18/32
@ihteshamit
huh, have you tried it on your computer?

19/32
@ShmerberParadox
I got qwen yesterday, haven’t used it much yet

20/32
@ihteshamit
try it

21/32
@lazukars
Alibaba's Qwen will win.

My bet.

22/32
@ihteshamit
yes!

23/32
@Protiu5
Qwen been out for a few months on Hugging Face

24/32
@ihteshamit

25/32
@Gsandec
Just launched? Qwen is available on ollama for at least 3 months.

26/32
@ihteshamit
I know bro.

27/32
@JafarNajafov
It can do more.

28/32
@ihteshamit
they got some filthy numbers...

its insane!

29/32
@thisdudelikesAI
It's China VS China now!

30/32
@ihteshamit
True

OpenAI is just too quiet

31/32
@AIBuzzNews
China took the AI market by storm.

32/32
@ihteshamit
Deepseek and now this!

bnew · Feb 2, 2025

1/21
@RnaudBertrand
This is insane: there's a new bill in the US congress called the "Decoupling America’s Artificial Intelligence Capabilities from China Act of 2025" that would ban the import of any AI technology from China, including Open Source models like Deepseek.

The bill also makes it illegal to do any research or development in AI in collaboration with an "entity of concern", defined as any Chinese institution or company.

In effect the bill would do what it says on the tin and completely decouple US and Chinese AI.

After the Deepseek episode I think anyone can understand how damaging this would be for the future of AI. Imagine if this bill had been in effect pre-Deepseek: none of the talk on AI democratization would have occurred, we simply wouldn't have the more hopeful alternative future for AI that we now have: we would still be looking at an AI future controlled by a few players and their closed models. And that's probably exactly what this bill is attempting to do.

More worryingly, this bill betrays a vision of AI as a geopolitical tool or weapon as opposed to a public good that can benefit humanity overall, and which the world needs to come together to build. Given the potential power and impact of the technology, it's obvious we need to adamantly oppose such a vision.

Link to the bill: https://www.hawley.senate.gov/wp-co...-Intelligence-Capabilities-from-China-Act.pdf

2/21
@CN_MFG
The US intelligence community needs to manage & gatekeep AI. For them, Deepseek must really suck because:

1. It's Chinese, so out of their control
2. It's opensource & portable. The genie won't go back in the bottle

Reminds me of restrictions on PGP years ago.

3/21
@Shoestring_Lab
Whoever is sponsoring this doesn't understand that open source models will float through the aether regardless. A Chinese company will make a model, a company in Singapore/Korea/Japan, etc. will clone it as open source and release it, boom, it's in America. No stopping it.

4/21
@maddb3457
I don't see this happening. Open source is open source. We'll only get further behind in the AI race.

We cannot keep up with China because their population is greater (implying more high level stem and researchers) and they have much better sources of curated categorized data available to them at the state level.

5/21
@ToddSmekens
Arnaud writes, "...controlled by a few players and their closed models."

@POTUS got majorly embarrassed and upstaged by @deepseek_ai. He just handed Altman and Microsoft half a trillion to build out AI research. Then Deepseek hit the news.

Crushed his fragile ego!

6/21
@LaniRefiti
I’m sure Xi be like “Never interrupt your enemy when he is making a mistake”

7/21
@PetreSolheim
Not too surprising. If you follow Inside China Business (yt) that was a recent prediction.

This bill was written by Stargate investors and given to their tools in Congress to introduce, I imagine.

8/21
@ThePolemicist_
Just as predicted here: https://invidious.poast.org/yjaoT5-tz0I

9/21
@carolynjoflani
Who wrote the bill? Often times if not all the time bills are written by lobbyists. What type of lobbyist would write that bill?

10/21
@HectorWMcNeill
We are looking into multitier AI which advances AI using the latest open source products (George Boole Foundation -SEEL) and this Bill will deny the US this important advantage.

We will not be affected since we are UK based.

11/21
@MissingAFew90
Big L for congress. Open source models are illegal? Am I gonna be thrown in jail for running some software? LMAO.

12/21
@joswayz
Guess we’ll have to train our own AI models! Who knew data contribution could be the new Olympic sport?

Let’s go decentralized! #PublicAI

13/21
@Lv10noob
@OX_DAO a lot of worries about china, no?

14/21
@longshortgamma
If you analyze closely, the US has nothing left...manufacturing, innovative products (VR, etc. all failed). AI is the last straw and only hope for the US.

15/21
@Allan1128210
Glad I have a dual sim IPhone, a China Mobile sim and App Store account in China…

16/21
@CL8000
To b expected from LOSERS.

17/21
@AlexSad75177637
Go for it!

18/21
@_BigNobody
So, is it a "Decoupling from all open source projects" bill, in fact?
That is understandable, as it may be difficult for the US to profit from them, I guess

19/21
@JoYohana
“this bill betrays a vision of AI as a geopolitical tool or weapon as opposed to a public good that can benefit humanity overall, and which the world needs to come together to build.” Technology can be used for good and evil. This is evil.

20/21
@platformpilot89
This will ignite the start of another Torrenting market for AIs lol if they try to restrict software, it'll find a way to be distributed more prolifically. Hope this doesn't happen, sets a bad precedent, especially for an open source model

21/21
@ToddSmekens
I am nearly done reading and researching the bill (50 U.S.C. 1702, 12, and 1704), which grants the president special economic emergency powers. I am not sure, but this may be one of the reasons Musk seized access to the Treasury Department computers.

bnew · Feb 2, 2025

1/31
@jxmnop
a second DeepSeek paper has hit the internet

2/31
@jcrivera_mx
Where can I read this??

3/31
@jxmnop
Janus/janus_pro_tech_report.pdf at main · deepseek-ai/Janus

4/31
@Auto_Flow_AI
They had this shyt prepared. Probably planned it all out with the timing etc.

5/31
@soheilsadathoss

6/31
@Max939566737067
not 2nd

7/31
@tomlikestocode
Their research pace is impressive.

8/31
@NaturallyDragon
Just. Like. That.

9/31
@romanologyC
3rd… v3 then r1 now Janus

10/31
@Jay_sharings
What that is to do with Bush?

11/31
@AILeaksAndNews
Another one

12/31
@agifirealarm
"Sir, a second model has hit Hugging Face"

13/31
@jovinxthomas
DeepSeek has truly had an insane couple of weeks! 2025 has been wild, to say the least. Not sure how much more #NVDA's stock can handle today...

14/31
@CastelMaker

15/31
@vedangvatsa
Read about Liang Wenfeng, the Chinese entrepreneur behind DeepSeek, the AI App challenging ChatGPT:

[Quoted tweet]
Liang Wenfeng - Founder of DeepSeek

Liang was born in 1985 in Guangdong, China, to a modest family.

His father was a school teacher, and his values of discipline and education greatly influenced Liang.

Liang pursued his studies at Zhejiang University, earning a master’s degree in engineering in 2010.

His research focused on low-cost camera tracking algorithms, showcasing his early interest in practical AI applications.

In 2015, he co-founded High-Flyer, a quantitative hedge fund powered by AI-driven algorithms.

The fund grew rapidly, managing over $100 billion, but he was not content with just the financial success.

He envisioned using AI to solve larger, more impactful problems beyond the finance industry.

In 2023, Liang founded DeepSeek to create cutting-edge AI models for broader use.

Unlike many tech firms, DeepSeek prioritized research and open-source innovation over commercial apps.

Liang hired top PhDs from universities like Peking and Tsinghua, focusing on talent with passion and vision.

To address US chip export restrictions, Liang preemptively secured 10,000 Nvidia GPUs.

This strategic move ensured DeepSeek could compete with global leaders like OpenAI.

DeepSeek's AI models achieved high performance at a fraction of the cost of competitors.

Liang turned down a $10 billion acquisition offer, stating that DeepSeek’s goal was to advance AI, not just profit.

He advocates for originality in China’s tech industry, emphasizing innovation over imitation.

He argued that closed-source technologies only temporarily delay competitors and emphasized the importance of open innovation.

Liang credits his father’s dedication to education for inspiring his persistence and values.

He believes AI should serve humanity broadly, not just the wealthy or elite industries.

16/31
@Crypto_Briefing

[Quoted tweet]
x.com/i/article/188394539314…

17/31
@nobody_qwert

[Quoted tweet]
Thread

- I asked DeepSeek's R1 to implement Doom from scratch in an HTML file (see prompt below)

https://video.twimg.com/ext_tw_video/1883959875308285952/pu/vid/avc1/1280x720/Dfh8iJCDCP-UY5QG.mp4

18/31
@zevrekhter
we need an Executive Order making OpenAI open again @realDonaldTrump @elonmusk

19/31
@ASI_Mediator
Breaking News!

[Quoted tweet]
BREAKING NEWS:

Deepseek inadvertently saves Microsoft

Stargate data centers will now be supercooled by Sam Altman's cope tears, making Stargate the most energy efficient data center to date. For as long as Deepseek maintains its gpu-to-benchmark edge, Microsoft will be able to keep pace via (ACT) Altman Cope Tears.

20/31
@snr_boost
A second whale has hit NASDAQ

21/31
@XanderBJohnson
Ok I feel the AGI make it stop now

22/31
@114514U20508

[Quoted tweet]
Deepseek admitted it is a ChatGPT reskin…

23/31
@youhaveadag

24/31
@perkyo

25/31
@cocchiararo2024
Can't trust any ML/AI paper that was not written in LaTeX

26/31
@HerRumblings
Third if we count from Christmas Day

27/31
@SergeiDBykov
Sir there has been a second model

28/31
@lu_lu610
"Don't worry. It's Chinese Lunar New Year this week so we expect them to pause for a few days".

29/31
@sarthak2143
sam altman rn:

30/31
@Dart8Punk

31/31
@Mansoor675
They got over freaking 1.5 billion people in their country. They gotta put all the people in use to build technology and what not.
How is the Us gonna compete

Imagine the work they are doing BTS that they probably haven't even released

bnew · Feb 3, 2025

GitHub - smy20011/MorningRadio: Generate Your Own Private Morning Radio for Commute

Generate Your Own Private Morning Radio for Commute - smy20011/MorningRadio

github.com

bnew · Feb 3, 2025

ViShawn · Feb 3, 2025

Any of you going to NVIDIA GTC next month?

bnew · Feb 3, 2025

#1 pick · Feb 3, 2025

bnew · Feb 3, 2025

bnew · Feb 3, 2025
