1/21
@arcprize
New verified ARC-AGI-Pub SoTA!
@OpenAI o3 has scored a breakthrough 75.7% on the ARC-AGI Semi-Private Evaluation.
And a high-compute o3 configuration (not eligible for ARC-AGI-Pub) scored 87.5% on the Semi-Private Eval.
1/4
2/21
@arcprize
This performance on ARC-AGI highlights a genuine breakthrough in novelty adaptation.
This is not incremental progress. We're in new territory.
Is it AGI? o3 still fails on some very easy tasks, indicating fundamental differences with human intelligence.
2/4
3/21
@arcprize
Previously shared, ARC-AGI-2 (same format - verified easy for humans, harder for AI) will launch alongside ARC Prize 2025.
We're committed to running the Grand Prize competition until a high-efficiency, open-source solution scoring 85% on the latest ARC-AGI is created.
3/4
4/21
@arcprize
Read our full o3 testing report and @fchollet's perspective on this exciting breakthrough, the future of the ARC-AGI benchmark, and the path to AGI.
OpenAI o3 Breakthrough High Score on ARC-AGI-Pub
4/4
5/21
@KevMusgrave
Is this the same benchmark as the 55.5% score achieved earlier this year?
6/21
@hantla
$1000 a task?! That’s a bit steep. Will that be coming down or is the future of ai, pay-by-the-process?
7/21
@tenobrus
OVER $1000 PER TASK? jesus christ lol
8/21
@SergheiLefter
Misleading graph. Basically exploding on cost, and computation to add some "incremental" progress, seriously
9/21
@Eito_Miyamura
For anyone complaining about the cost of inference, this will come down by an insane amount
Distillation has always played the magic (See cost reduction from GPT-4 -> GPT-4o) and history will play out again
As @karpathy said, models get large & expensive before they get small & cheap
10/21
@Artoftheproblem
Video on the history of this result:
11/21
@JonathanRoseD
Interesting. The score is amazing, but I worry the compute cost (>$1000!) may be indicating that OAI is brute forcing the solution via a large static codebase. It does get some simple problems oddly incorrect! Ultimately, it's hard to be excited without it being an OPEN model.
12/21
@Phoneixx8
Tell me you have heard of #basedai Creatures ???
https://video.twimg.com/ext_tw_video/1870329612825395202/pu/vid/avc1/720x720/_KF6U0XY8mbSx7UC.mp4
13/21
@Phoneixx8
It's called basedAI !! @getbasedai
https://video.twimg.com/ext_tw_video/1870329352032006144/pu/vid/avc1/1280x720/611vZFR7_wlopvEZ.mp4
14/21
@ondrejindruch
Many people are concerned about the price that comes with the score.
Yet it’s always a question of time till it gets cheaper.
And it will only get cheaper.
15/21
@AlpacaNetworkAI
The surprising effectiveness of test time compute.... time for the #opensource decentralized community to catch up @NousResearch @ai16zdao
16/21
@amshiera
Explain to me like a 5 year old what does it mean?
17/21
@aziz0nomics
Amazing achievement.
18/21
@suraj_b19
The cost per task is $1000+
19/21
@Eito_Miyamura
This is absurd.
P(AGI before 2030) > 0.5
20/21
@hyperknot
This chart is logarithmic on the x axis and linear on the y axis! It really makes it look like linear progress is happening, whereas it's really not, it's totally misleading.
21/21
@DarbyBaileyXO
time to double down on ones dreams, and dig deep to committing to doing what you love and what makes you happy. AGI will take care of the rest, collectively