Colibreh @bnew explains difference between ChatGPT and Chinese AI Deepseek

1/14
@_lewtun
I'm running a shyt-ton of GRPO experiments on DeepSeek's distilled models with the LIMO dataset and it really works well 🔥!

Depending on the hyperparameters, I'm able to get ~10 point boost on AIME24 and GPQA, with ~3 point boost on MATH-500 (likely saturated).

Link with more details in post below 👇
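
(For context, a minimal sketch of what a run like this looks like with TRL's GRPOTrainer. The model and dataset names are taken from the thread; the reward function and hyperparameters are illustrative assumptions, not the exact open-r1 recipe.)

```python
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

# LIMO is a small, curated set of hard reasoning problems. The column name
# "question" is assumed from the hub card; GRPOTrainer expects a "prompt" column.
dataset = load_dataset("GAIR/LIMO", split="train")
dataset = dataset.map(lambda x: {"prompt": x["question"]})

def format_reward(completions, **kwargs):
    # Placeholder reward for this sketch: open-r1 scores actual correctness
    # by parsing the answer with math-verify instead.
    return [1.0 if "\\boxed" in completion else 0.0 for completion in completions]

args = GRPOConfig(
    output_dir="DeepSeek-R1-Distill-Qwen-1.5B-GRPO-LIMO",
    num_generations=4,               # completions sampled per prompt (the GRPO "group")
    per_device_train_batch_size=4,   # batch size must divide evenly by num_generations
    max_completion_length=4096,      # long chains of thought need headroom
    learning_rate=1e-6,
)

trainer = GRPOTrainer(
    model="deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B",
    reward_funcs=format_reward,
    args=args,
    train_dataset=dataset,
)
trainer.train()
```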





2/14
@_lewtun
I'll be using this discussion tab to track my progress - chime in there if you have other ideas to test!

open-r1/README · [Experiment] Applying GRPO to DeepSeek-R1-Distill-Qwen-1.5B with LIMO



3/14
@Teknium1
Is there a standardized way to run AIME?



4/14
@_lewtun
We use a custom lighteval task in open-r1 that generates 32k tokens and then applies @HKydlicek's amazing math-verify parser to compare with the ground truth: GitHub - huggingface/open-r1: Fully open reproduction of DeepSeek-R1
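
(A hedged sketch of that verification step, following the math-verify README; the example strings are invented. The point is that verify() checks mathematical equivalence rather than string equality.)

```python
from math_verify import parse, verify

# Parse the gold answer and the model's full output, then compare them.
gold = parse("$\\frac{1}{2}$")
prediction = parse("After simplifying, the final answer is $0.5$.")

print(verify(gold, prediction))  # True: 0.5 and 1/2 are equivalent
```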



5/14
@chewkokwah
Will you test it on the recently released AIME 2025 ?



6/14
@_lewtun
Yes!



7/14
@MaziyarPanahi
Awesome work! I tried the 0.5B and 3B, but with the Unsloth code I can go up to Llama 8B.
Today I'll switch to the Qwen2.5 7B base model and eval it, then do some distillation (or use what's already out there), then GRPO, then eval again.
Which repo/branch are you using for the evals?



8/14
@_lewtun
Cool, excited to see what you get! The evals are running on the main branch of open-r1: GitHub - huggingface/open-r1: Fully open reproduction of DeepSeek-R1



9/14
@IAmEricHedlin
Very cool! I’m curious what kind of gpus you’re using and how many?



10/14
@_lewtun
All experiments running on one node of 8 x H100s



11/14
@bronzeagepapi
Great to see experiments like this. I find 3B to be more stable and representative from a scaling perspective.

I have also been trying 7B+ with the Unsloth setup



12/14
@_lewtun
Yeah the 1.5B model certainly has some quirks :smile: Are you applying GRPO directly to the base model or first doing distillation and then GRPO?



13/14
@paws4puzzles
Impressive work on LIMO! That 10-point boost on AIME24 and GPQA is stellar. Any tips on hyperparameter tuning?



14/14
@Nuliayuk
I could be wrong, but I think you missed the point of the LIMO paper if you're running GRPO with their dataset.



