1/14
@_lewtun
I'm running a shyt-ton of GRPO experiments on DeepSeek's distilled models with the LIMO dataset and it really works well!
Depending on the hyperparameters, I'm able to get ~10 point boost on AIME24 and GPQA, with ~3 point boost on MATH-500 (likely saturated).
Link with more details in post below
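For anyone wanting to follow along, here's a rough sketch of what a GRPO run like this can look like with TRL's GRPOTrainer. The model and dataset IDs are the real ones mentioned in the thread, but the "question" column name and the toy length reward are illustrative assumptions, not the actual open-r1 reward functions:
```python
# Minimal GRPO sketch with TRL (not the exact open-r1 setup).
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

# LIMO is a small curated reasoning dataset; we assume it exposes a "question" column.
dataset = load_dataset("GAIR/LIMO", split="train")
dataset = dataset.map(lambda x: {"prompt": x["question"]})

# Placeholder reward: GRPO needs one or more reward functions scoring each completion.
# open-r1 uses accuracy/format rewards instead of this toy length penalty.
def reward_len(completions, **kwargs):
    return [-abs(1000 - len(c)) for c in completions]

training_args = GRPOConfig(output_dir="DeepSeek-R1-Distill-Qwen-1.5B-GRPO")
trainer = GRPOTrainer(
    model="deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B",
    reward_funcs=reward_len,
    args=training_args,
    train_dataset=dataset,
)
trainer.train()
```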
2/14
@_lewtun
I'll be using this discussion tab to track my progress - chime in there if you have other ideas to test!
open-r1/README · [Experiment] Applying GRPO to DeepSeek-R1-Distill-Qwen-1.5B with LIMO
3/14
@Teknium1
Is there a standardized way to run aime?
4/14
@_lewtun
We use a custom lighteval task in open-r1 that generates 32k tokens and then applies @HKydlicek's amazing math-verify parser to compare with the ground truth:
GitHub - huggingface/open-r1: Fully open reproduction of DeepSeek-R1
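For reference, the comparison step in math-verify boils down to parse + verify; a minimal sketch (the example expressions are made up, not taken from the AIME task):
```python
# Check whether a model answer matches the ground truth with math-verify.
from math_verify import parse, verify

gold = parse("$\\frac{1}{2}$")   # ground-truth answer from the dataset
answer = parse("$0.5$")          # answer extracted from the model's generation
print(verify(gold, answer))      # True: the expressions are mathematically equivalent
```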
5/14
@chewkokwah
Will you test it on the recently released AIME 2025?
6/14
@_lewtun
Yes!
7/14
@MaziyarPanahi
awesome work! I tried the 0.5B and 3B, but with the Unsloth code I can go up to Llama 8B.
But today I will switch to the Qwen2.5 7B base model, then eval, then do some distillation (or use what's already out there), then GRPO, then eval again.
Which repo/branch are you using for the evals?
8/14
@_lewtun
Cool, excited to see what you get! The evals are running on the main branch of open-r1:
GitHub - huggingface/open-r1: Fully open reproduction of DeepSeek-R1
9/14
@IAmEricHedlin
Very cool! I’m curious what kind of gpus you’re using and how many?
10/14
@_lewtun
All experiments running on one node of 8 x H100s
11/14
@bronzeagepapi
Great to see experiments like this. I find 3B to be more stable and more representative from a scaling perspective.
I have also been trying 7B+ with an Unsloth setup.
12/14
@_lewtun
Yeah the 1.5B model certainly has some quirks
Are you applying GRPO directly to the base model or first doing distillation and then GRPO?
13/14
@paws4puzzles
Impressive work on LIMO! That 10-point boost on AIME24 and GPQA is stellar. Any tips on hyperparameter tuning?
14/14
@Nuliayuk
I could be wrong, but I think you missed the point of the LIMO paper if you're running GRPO with their dataset.