China just wrecked all of American AI. Silicon Valley is in shambles.

bnew

Veteran
Joined
Nov 1, 2015
Messages
59,208
Reputation
8,782
Daps
163,911

1/52
@jiayi_pirate
We reproduced DeepSeek R1-Zero in the CountDown game, and it just works

Through RL, the 3B base LM develops self-verification and search abilities all on its own

You can experience the Aha moment yourself for < $30
Code: GitHub - Jiayi-Pan/TinyZero: Clean, accessible reproduction of DeepSeek R1-Zero

Here's what we learned 🧵





2/52
@jiayi_pirate
The recipe:

We follow the DeepSeek R1-Zero algorithm: given a base LM, prompts, and a ground-truth reward, we run RL.

We apply it to CountDown: a game where players combine numbers with basic arithmetic to reach a target number.
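
For intuition, here's a minimal sketch of what a ground-truth reward for CountDown could look like: extract the model's proposed equation, check that it uses each given number exactly once, and check that it evaluates to the target. The function name and the <answer> tag convention here are assumptions for illustration; the actual reward code lives in the TinyZero repo.

```python
import re
from collections import Counter

def countdown_reward(completion: str, numbers: list[int], target: int) -> float:
    """Binary ground-truth reward: 1.0 for a valid equation, else 0.0."""
    match = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
    if match is None:
        return 0.0
    expr = match.group(1).strip()
    # Only digits, whitespace, + - * / and parentheses are allowed.
    if not re.fullmatch(r"[\d\s+\-*/().]+", expr):
        return 0.0
    # Each provided number must be used exactly once.
    used = [int(n) for n in re.findall(r"\d+", expr)]
    if Counter(used) != Counter(numbers):
        return 0.0
    try:
        value = eval(expr, {"__builtins__": {}}, {})  # safe: charset whitelisted above
    except (SyntaxError, ZeroDivisionError):
        return 0.0
    return 1.0 if abs(value - target) < 1e-6 else 0.0

# e.g. countdown_reward("... <answer>(6-2)*10+15</answer>", [2, 6, 10, 15], 55) -> 1.0
```

Because the reward checks only the final boxed equation, the model is free to discover whatever chain-of-thought strategy maximizes it.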



3/52
@jiayi_pirate
The results: It just works!

The model starts from dummy outputs but gradually develops tactics such as revision and search.

In the following sample, the model proposes a solution, self-verifies, and iteratively revises it until it works.

Full experiment log: jiayipan





4/52
@jiayi_pirate
Quick ablations on CountDown:
Base model quality is key:

We run Qwen-2.5-Base at 0.5B, 1.5B, 3B, and 7B. The 0.5B model guesses a solution and stops. From 1.5B up, the models start learning to search, self-verify, and revise their solutions, enabling them to achieve much higher scores.





5/52
@jiayi_pirate
Either base or instruct model works

- The instruct model learns faster, but converges to about the same performance as the base model
- The instruct model's outputs are more structured and readable

So extra instruction tuning isn't necessary, which supports R1-Zero's design decision





6/52
@jiayi_pirate
The specific RL algorithm doesn't matter much

We tried PPO, GRPO, and PRIME. Long CoT emerges with all of them, and they all seem to work well. We haven't had time to tune the hyperparameters, so we don't want to draw quantitative conclusions about which algorithm works better.
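
For reference, the group-relative advantage at the heart of GRPO fits in a few lines; this is a sketch of the idea only, not the implementation used in this repo.

```python
import torch

def grpo_advantages(rewards: torch.Tensor) -> torch.Tensor:
    """rewards: (num_prompts, group_size) scalar rewards for sampled completions.

    GRPO drops PPO's learned value baseline: each completion's advantage is
    its reward standardized against the other samples for the same prompt.
    """
    mean = rewards.mean(dim=-1, keepdim=True)
    std = rewards.std(dim=-1, keepdim=True)
    return (rewards - mean) / (std + 1e-8)
```

That no-critic design is part of why these runs stay so cheap: only the policy model needs to be trained and held in memory.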





7/52
@jiayi_pirate
Model's reasoning behavior is very task-dependent:

- For Countdown, the model learns to do search and self-verification
- For number multiplication, the model instead learns to break the problem down using the distributive rule (e.g., 36 × 45 = 36 × (40 + 5) = 1440 + 180 = 1620) and solve it step by step.





8/52
@jiayi_pirate
Everything's open at GitHub - Jiayi-Pan/TinyZero: Clean, accessible reproduction of DeepSeek R1-Zero

And it costs < $30 to train the model! We hope this project helps demystify emerging RL scaling research and makes it more accessible!



9/52
@jiayi_pirate
One caveat, of course, is that it's validated only on the Countdown task, not the general reasoning domain. We are currently bounded by compute; please reach out if you want to help!



10/52
@jiayi_pirate
A wild ride with @JunjieZhang12 @xingyaow_ @lifan__yuan



11/52
@deter3
The dataset on GitHub is Jiayi-Pan/Countdown-Tasks-3to4, right?



12/52
@jiayi_pirate
Yes, right here Jiayi-Pan/Countdown-Tasks-3to4 · Datasets at Hugging Face



13/52
@duluhagv
is the countdown dataset gen also open source? i lol'ed when I saw this release today after working on something similar last night





14/52
@jiayi_pirate
Hi, the countdown generation code is mostly borrowed from Stream-of-Search
stream-of-search/src/countdown_generate.py at main · kanishkg/stream-of-search

The preprocessed data is here:
Jiayi-Pan/Countdown-Tasks-3to4 · Datasets at Hugging Face

Everything's open and reproducible
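
The idea behind this kind of generator is simple: sample the operands first, then build the target by combining them, which guarantees every task is solvable. A rough sketch of that approach (hypothetical names; the real code is the Stream-of-Search script linked above):

```python
import random
import operator

OPS = [("+", operator.add), ("-", operator.sub),
       ("*", operator.mul), ("/", operator.truediv)]

def generate_task(num_operands: int = 4, max_num: int = 100) -> dict:
    """Sample operands, then fold random ops over them to derive a target."""
    while True:
        nums = [random.randint(1, max_num) for _ in range(num_operands)]
        pool = nums[:]
        random.shuffle(pool)
        value = pool.pop()
        try:
            while pool:
                _, fn = random.choice(OPS)
                value = fn(value, pool.pop())
        except ZeroDivisionError:
            continue
        # Keep only clean, positive integer targets in a sensible range.
        if value == int(value) and 0 < value <= 1000:
            return {"nums": nums, "target": int(value)}
```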



15/52
@Samhanknr
How feasible do you think it is to teach a model to work on a given codebase? For example, teach it to write unit tests and pass unit tests in a given codebase using RL. Would it be affordable?



16/52
@jiayi_pirate
That’s definitely possible. We are working on this

[Quoted tweet]
Introducing SWE-Gym: An Open Environment for Training Software Engineering Agents & Verifiers

Using SWE-Gym, our agents + verifiers reach new open SOTA - 32%/26% on SWE-Bench Verified/Lite,
showing strong scaling with more train / test compute

github.com/SWE-Gym/SWE-Gym [🧵]




17/52
@frankxu2004
Very nice! One question: do you have any observation regarding CoT length changes during training? Is there a plot showcasing CoT length increased during training?



18/52
@jiayi_pirate
Great question! Early results show the 3B model initially reduces output length for correct formatting, then increases chain-of-thought length for better performance.

There may be minor code mismatches, so take this with a grain of salt.

Raw log:
jiayipan





19/52
@bennetkrause
One question: What model size and algorithm does the $30 refer to?



20/52
@jiayi_pirate
3B model, PPO, it takes 10 H100 hours
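
For context, the arithmetic checks out if you assume roughly $3 per H100-hour (an assumed cloud rate, not a number from the thread):

10 H100-hours × ~$3/hour ≈ $30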



21/52
@wzihanw
Nice!



22/52
@reissbaker
this is really cool. any chance for a huggingface model upload so we can play around with it?



23/52
@_herobotics_
Any insight on why OpenLlama achieves high response length but low scores?



24/52
@jiayi_pirate
OpenLlama doesn't like to generate EOS tokens, and since we haven't implemented stopping after </answer>, the model often fails to terminate.
We didn't report OpenLlama's results in the Twitter thread, as we believe the results will improve significantly once we fix this problem.
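
For anyone hitting the same issue, one way to stop at </answer> with Hugging Face transformers looks roughly like this. This is a sketch, not TinyZero's actual fix, and the class name is made up for illustration:

```python
import torch
from transformers import StoppingCriteria, StoppingCriteriaList

class StopOnSubstring(StoppingCriteria):
    """Halt generation once a marker string appears in the decoded tail."""

    def __init__(self, tokenizer, stop_string: str = "</answer>"):
        self.tokenizer = tokenizer
        self.stop_string = stop_string

    def __call__(self, input_ids, scores, **kwargs) -> bool:
        # Decode only the last few tokens to keep the check cheap (batch of 1).
        tail = self.tokenizer.decode(input_ids[0, -16:])
        return self.stop_string in tail

# usage:
# model.generate(**inputs,
#                stopping_criteria=StoppingCriteriaList([StopOnSubstring(tok)]))
```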



25/52
@GiorgioMantova
Do you find the original papers' numbers about the training cost to be plausible?



26/52
@jiayi_pirate
Yes, with MoE and FP8, that’s expected



27/52
@paul_cal
This is such a beautiful, efficient demonstration of something very powerful. Great idea



28/52
@anushkmittal
$30? might as well be free. nice work



29/52
@iamRezaSayar
this is very interesting.
I'm wondering about 2 things now. 1. if we could extend these beyond verifiable math problems, to say, empathy. and 2. if people would need/want to train their own personal reward model that is tuned to each user's preferences 👀



30/52
@HolgersenTobias
That was fast 🔥 Excellent work. This paradigm seems robust and scalable.

Bottleneck now will be gathering huge, diverse sets of hard but verifiable tasks.



31/52
@CharlieYouAI
Incredible work! and super cool result



32/52
@RishfulThinking
Excuse me, fukking WHAT



33/52
@garybasin
Hell yeah



34/52
@rrektcapital
Could you kindly ELI5 what RL means in this case?



35/52
@burny_tech
Cambrian explosion of RL on LLMs begins



36/52
@bookwormengr
Could you please provide flop analysis?



37/52
@nooriefyi
love seeing this kind of ingenuity in action. what were the biggest hurdles you faced?



38/52
@SurfTheUniverse
Was this supposed to work?





39/52
@AntDX316
There's going to be no way for people to intentionally 'inflate' the required number of tokens by adding things that make it look like it costs more to run a generation when it doesn't.

The ASI-Singularity(Godsend) is the only Global Solution, people.



40/52
@ReplayRyan
Huge



41/52
@nrehiew_
Thanks for this! I wonder if you guys tried Llama 3 7B instead of Llama 2 7B. It would also be interesting if you have a completely overtrained/overfitted run, similar to what the TULU3 team experimented with.



42/52
@suwakopro
So cool, intelligence is reproducible.



43/52
@xiaoze_jin
Need to look into it; thanks for sharing



44/52
@corefpark
Thanks for open-sourcing this!!!! This is an enormous contribution to understanding the science of RL x LLMs!!



45/52
@JunYang1688
Great post. That DeepSeek's core contributions can be reproduced at lightning speed demonstrates the power of open source! It will definitely accelerate progress towards AGI.



46/52
@fjrdomingues
How do I super like a post?



47/52
@Vrda82073569
The Bitter Lesson strikes again!



48/52
@Nick_from_Texas
Any idea how autoregressive models become capable of self-verification?

How do they avoid getting stuck along one chain of thought due to an earlier suboptimal token?



49/52
@neurosp1ke
Inspired by your experiment I added a procedural generator for countdown games to GitHub - open-thought/reasoning-gym: procedural reasoning datasets



50/52
@adamnemecek1
All machine learning approaches are convolutional inverses, including RL and LLMs.



51/52
@soheilsadathoss
Awesome!



52/52
@Kathleen_Tyson_
I played Scrabble against the All Time Countdown Champion last year. He destroyed me. Nice guy, though.




 

2Quik4UHoes

Why you had to go?
Supporter
Joined
Apr 30, 2012
Messages
63,381
Reputation
18,450
Daps
235,841
Reppin
Norfeast groovin…
I tried to keep it out of this thread but fukk it:mjlol:

I know that people are upset about Trump winning but there's a larger context to consider and that shyt is :mjlol:

This is just another case of the ongoing saga of whites crashing out. Yes you see Trump and Elon doing and saying all of this outrageous shyt but nikkas gotta see it for the hail Mary flailing that it is:mjlol:

What's been interesting about the past few days, and the past month for that matter, is it being out in the open now that white folks are at a fork in the road they're clearly struggling to reconcile with. They've been beaten at their own game by Asians and the merit talk has come back to bite them:mjlol: Meritocracy is Deepseek crushing the buildings and you see how mad these scamming ass megalomaniac tech bros are:mjlol:

White people chest beat excellence but what they really want is to be comfortably mediocre. It was easy to hide when they stood above the rest of the world but now the opps have caught up and literally bootstrapped themselves into being better:dead:

They want so bad to stay focused on hating black folks because that's the easy way out, but they got some real smoke at their door and they're already cracking at the seams:mjlol: They're in a potentially catastrophic position right now and I think people are too busy being mad about Trump to see the overall desperation in the moves they're making

If only black folks were focused and on the same page:wow: This is a prime opportunity

Breh, you totally see it. This shyt is actually crazy to see happen in real time…:wow:

I hope more Black people realize this and get serious. Because these mentally diseased cacs won’t go quietly we fr gotta stick together.
 

papa pimp

All Star
Joined
Mar 11, 2022
Messages
4,423
Reputation
454
Daps
10,576
So they banned Huawei but it still outsold Apple

So they tried banning chips to China and that didn’t work

So they spent billions on AI but it still got outperformed by DeepSeek which cost them a tiny fraction of the price :russ:

just to correct

Huawei did not outsell Apple… its phones sold more than iPhones in China, but worldwide Apple is a 3-trillion-dollar behemoth.

The chips ban has limited China's ability to make smaller and smaller chips, but DeepSeek brings into question whether that's necessary.

DeepSeek did not outperform American bleeding-edge LLMs. It performed SIMILARLY for certain tasks (math, for example), but something like Sora (video generation) is still at the tippy top.

I know this site doesn’t like nuance or being technically correct but yall are going to hurt yourself making inferences off fake news.
 

TDUBB

All Star
Joined
Nov 1, 2013
Messages
2,495
Reputation
-60
Daps
5,780
just to correct

Huawei did not outsell Apple… its phones sold more than iPhones in China, but worldwide Apple is a 3-trillion-dollar behemoth.

The chips ban has limited China's ability to make smaller and smaller chips, but DeepSeek brings into question whether that's necessary.

DeepSeek did not outperform American bleeding-edge LLMs. It performed SIMILARLY for certain tasks (math, for example), but something like Sora (video generation) is still at the tippy top.

I know this site doesn’t like nuance or being technically correct but yall are going to hurt yourself making inferences off fake news.
:comeon: bro, you've been in this thread playing Captain Save A Hoe trying to save face for Amerikka this whole entire time.

Go grab ur tap dancing shoes nerd
:camby: you're running late for the old white man.
 

TDUBB

All Star
Joined
Nov 1, 2013
Messages
2,495
Reputation
-60
Daps
5,780
Tech is overvalued here in the states :heh:
We have worse electric cars
Worse AI
Worse transit

It’ll be crazy if China just booms bigger, and I was told communism does not innovate

What's funny is that these are just some of China's greatest strengths. Crazy to think DeepSeek was really a side project.
 
Last edited:

papa pimp

All Star
Joined
Mar 11, 2022
Messages
4,423
Reputation
454
Daps
10,576
:comeon: bro, you've been in this thread playing Captain Save A Hoe trying to save face for Amerikka this whole entire time.

Go grab ur tap dancing shoes nerd
:camby: you're running late for the old white man.

ah so you’re an idiot

got it
 

The Intergalactic Koala

Reporting for Duty
Supporter
Joined
Jan 2, 2017
Messages
61,932
Reputation
22,679
Daps
254,379
Reppin
Koalabama and the Cosmos
Somebody pointed out on Reddit that this could lead to the big tech bubble bursting. Musk told heads that there will be dark times ahead, but really this country fitting to get that wake up call.

Not going to be surprised when your favorite gazillionaire yokes themselves because they thought shyt was sweet.

We fitting to be Japan in the 90s :huhldup:
 

morris

Superstar
Joined
Oct 8, 2014
Messages
16,658
Reputation
5,006
Daps
36,594
does DeepSeek do the following?

-make slides/presentations?
-write a book?
-help make an audiobook?
-review a long YT video and give you a 2 min synopsis of it?
-pick the Super Bowl winner?
-show you how to make a mobile app step-by-step?
 

The Intergalactic Koala

Reporting for Duty
Supporter
Joined
Jan 2, 2017
Messages
61,932
Reputation
22,679
Daps
254,379
Reppin
Koalabama and the Cosmos
The thing that separates China from the States is that they took decades to get where they are, while it's taken us a decade and a half to destroy the country's progress.

Over there, it's one accord and the communistic visuals that Trump wants to embrace, but he will never be successful because he's a fukking idiot. Education is essential, while the States are destroying the school system.

Health care is a step up from what we have to deal with (word to the marsupial that's currently dealing with insurance woes).

Ways of getting around is at a snap of a finger, while we just embrace a fancier taxi share ala Uber/Lyft.

We are so primitive and so lost in our own farts that we can't see beyond the cult that this nation became. While cacs were trying to beat a dead horse with making black people's lives a living hell, China created infrastructure and railroads that can get you from point A to point B within minutes.

We are so cooked its not even funny :francis:
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
59,208
Reputation
8,782
Daps
163,911
does DeepSeek do the following?

-make slides/presentations?
-write a book?
-help make an audiobook?
-review a long YT video and give you a 2 min synopsis of it?
-pick the Super Bowl winner?
-show you how to make a mobile app step-by-step?

1. if it can design webpages, it can probably create presentations programmatically.
2. yeah, people have been using ChatGPT since version 3.5 to do that and sell them on Amazon, so it's doable.
3. in some ways, probably, but they don't have a text-to-speech model that I know of. It could probably help with creating a script.
4. copy and paste the YouTube transcript and ask it to summarize it (see the sketch below).
5. it'll guess, but there's no guarantee it'll be correct.
6. probably, depends on the app.
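
On point 4, a minimal sketch of the transcript-summary workflow, assuming DeepSeek's OpenAI-compatible API (base URL and model name taken from their public docs; transcript.txt is whatever you pasted from YouTube):

```python
from openai import OpenAI

# DeepSeek exposes an OpenAI-compatible chat endpoint.
client = OpenAI(api_key="YOUR_DEEPSEEK_KEY", base_url="https://api.deepseek.com")

with open("transcript.txt") as f:
    transcript = f.read()

resp = client.chat.completions.create(
    model="deepseek-chat",
    messages=[
        {"role": "system",
         "content": "Summarize video transcripts into a ~2 minute read."},
        {"role": "user",
         "content": f"Give me a short synopsis of this video:\n\n{transcript}"},
    ],
)
print(resp.choices[0].message.content)
```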
 