1/14
@_lewtun
I'm running a shyt-ton of GRPO experiments on DeepSeek's distilled models with the LIMO dataset and it really works well!
Depending on the hyperparameters, I'm able to get ~10 point boost on AIME24 and GPQA, with ~3 point boost on MATH-500 (likely saturated).
Link with more details in post below
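For anyone wanting to follow along, here's a rough sketch of what a GRPO run like this can look like with TRL's GRPOTrainer. The model and dataset IDs are the real ones mentioned in the thread, but the "question" column name and the toy length reward are illustrative assumptions, not the actual open-r1 reward functions:
```python
# Minimal GRPO sketch with TRL (not the exact open-r1 setup).
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

# LIMO is a small curated reasoning dataset; we assume it exposes a "question" column.
dataset = load_dataset("GAIR/LIMO", split="train")
dataset = dataset.map(lambda x: {"prompt": x["question"]})

# Placeholder reward: GRPO needs one or more reward functions scoring each completion.
# open-r1 uses accuracy/format rewards instead of this toy length penalty.
def reward_len(completions, **kwargs):
    return [-abs(1000 - len(c)) for c in completions]

training_args = GRPOConfig(output_dir="DeepSeek-R1-Distill-Qwen-1.5B-GRPO")
trainer = GRPOTrainer(
    model="deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B",
    reward_funcs=reward_len,
    args=training_args,
    train_dataset=dataset,
)
trainer.train()
```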
2/14
@_lewtun
I'll be using this discussion tab to track my progress - chime in there if you have other ideas to test!
open-r1/README · [Experiment] Applying GRPO to DeepSeek-R1-Distill-Qwen-1.5B with LIMO
3/14
@Teknium1
Is there a standardized way to run aime?
4/14
@_lewtun
We use a custom lighteval task in open-r1 that generates 32k tokens and then applies @HKydlicek's amazing math-verify parser to compare with the ground truth:
GitHub - huggingface/open-r1: Fully open reproduction of DeepSeek-R1
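For reference, the comparison step in math-verify boils down to parse + verify; a minimal sketch (the example expressions are made up, not taken from the AIME task):
```python
# Check whether a model answer matches the ground truth with math-verify.
from math_verify import parse, verify

gold = parse("$\\frac{1}{2}$")   # ground-truth answer from the dataset
answer = parse("$0.5$")          # answer extracted from the model's generation
print(verify(gold, answer))      # True: the expressions are mathematically equivalent
```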
5/14
@chewkokwah
Will you test it on the recently released AIME 2025?
6/14
@_lewtun
Yes!
7/14
@MaziyarPanahi
awesome work! I tried the 0.5B and 3B, but with the Unsloth code I can go up to Llama 8B.
But today I will switch to the Qwen2.5 7B base model, then eval, then do some distillation (or use what's already out there), then GRPO, then eval again.
Which repo/branch are you using for the evals?
8/14
@_lewtun
Cool, excited to see what you get! The evals are running on the main branch of open-r1:
GitHub - huggingface/open-r1: Fully open reproduction of DeepSeek-R1
9/14
@IAmEricHedlin
Very cool! I’m curious what kind of gpus you’re using and how many?
10/14
@_lewtun
All experiments running on one node of 8 x H100s
11/14
@bronzeagepapi
Great to see experiments like this. I find 3B to be more stable and more representative from a scaling perspective.
I have also been trying 7B+ with an Unsloth setup.
12/14
@_lewtun
Yeah the 1.5B model certainly has some quirks
Are you applying GRPO directly to the base model or first doing distillation and then GRPO?
13/14
@paws4puzzles
Impressive work on LIMO! That 10-point boost on AIME24 and GPQA is stellar. Any tips on hyperparameter tuning?
14/14
@Nuliayuk
I could be wrong, but I think you missed the point of the LIMO paper if you're running GRPO with their dataset.