bnew

Veteran
Joined
Nov 1, 2015
Messages
59,431
Reputation
8,792
Daps
164,408

1/21
@AlexGDimakis
Discovered a very interesting thing about DeepSeek-R1 and all reasoning models: The wrong answers are much longer while the correct answers are much shorter. Even on the same question, when we re-run the model, it sometimes produces a short (usually correct) answer or a wrong verbose one. Based on this, I'd like to propose a simple idea called Laconic decoding: Run the model 5 times (in parallel) and pick the answer with the smallest number of tokens. Our preliminary results show that this decoding gives +6-7% on AIME24 with only a few parallel runs. I think this is better (and faster) than consensus decoding.



Gip0xvxbYAICyR1.jpg


2/21
@AlexGDimakis
Many thanks to @NeginRaoof_ for making this experiment so quickly. We must explore how to make thinking models less verbose.



3/21
@AlexGDimakis
Btw, we can call this ‘shortest of k’ decoding, as opposed to ‘best of k’, ‘consensus of k’, etc. But ‘laconic’ has a connection to humans, look it up



4/21
@spyced
Is it better than doing a second pass asking R1 to judge the results from the previous pass?



5/21
@AlexGDimakis
I believe so. But we have not measured this scientifically



6/21
@plant_ai_n
straight fire 🔥 wonder how this hits on vanilla models with basic CoT prompt?



7/21
@AlexGDimakis
I think length has no predictive value for correctness on non-reasoning models.



8/21
@HrishbhDalal
why not have a reward where you multiply absolutely by the reward but divide by the square root of the answer or cube root of the length, this way the model will inherently be pushed towards smaller more accurate chains. i think this is how openai did sth to have o1 less tokens but still high accuracy



9/21
@AlexGDimakis
Yeah during training we must add a reward for conciseness
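The reward-shaping suggestion in this exchange can be sketched as a toy function: correctness gates the reward, and a root of the chain length scales it down, so short correct chains score highest. This is only an illustration of the idea in the thread, not any lab's documented training recipe; the `root` knob is an assumption.

```python
def length_penalized_reward(correct: bool, num_tokens: int, root: float = 3.0) -> float:
    """Reward that favors short correct chains of thought.

    A correct answer earns reward 1 scaled down by the `root`-th root of its
    length, so a correct 8-token chain beats a correct 512-token one.
    Incorrect answers earn 0 regardless of length.
    """
    if not correct:
        return 0.0
    return 1.0 / (num_tokens ** (1.0 / root))
```

Using a root rather than the raw length keeps the penalty gentle, so the model is nudged toward conciseness without being pushed to skip reasoning steps entirely.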



10/21
@andrey_barsky
Could this be attributed to short answers being retrievals of memorised training data (which require less reasoning) and the long answers being those for which the solution was not memorised?



11/21
@AlexGDimakis
It seems to be a tradeoff between underthinking and overthinking, as a concurrent paper (from @tuzhaopeng and his collaborators) coined it. Both produce overly long chains of thought. The way I understand it: Underthinking = exploring too many directions but not taking enough steps in any of them to solve the problem (like ADHD). Overthinking = going in a wrong direction but insisting on it for too long (maybe OCD?).



12/21
@rudiranck
Nice work!

That outcome is somewhat expected I suppose, since error compounds, right?

Have you already compared consensus vs laconic?



13/21
@AlexGDimakis
We will systematically compare. My intuition is that when you do trial and error, you don’t need consensus. You’d be better off doing some reflection: realizing you rambled for 20 minutes, or that you got lucky and found the key to the answer.



14/21
@GaryMarcus
Across what types of problems? I wonder how broadly the result generalizes?



15/21
@AlexGDimakis
We measured this in math problems. Great question to study how it generalizes to other problems.



16/21
@tuzhaopeng
x.com

Great insight from the concurrent work! Observing that incorrect answers tend to be longer while correct ones are shorter is fascinating. Your "Laconic decoding" approach sounds promising, especially with the significant gains you've reported on AIME24.

Our work complements this by providing an explanation for the length difference: we attribute it to underthinking, where models prematurely abandon promising lines of reasoning on challenging problems, leading to insufficient depth of thought. Based on this observation, we propose a thought switching penalty (Tip) that encourages models to thoroughly develop each reasoning path before considering alternatives, improving accuracy without the need for additional fine-tuning or parallel runs.

It's exciting to see parallel efforts tackling these challenges. Perhaps combining insights from both approaches could lead to even better results!

[Quoted tweet]
Are o1-like LLMs thinking deeply enough?

Introducing a comprehensive study on the prevalent issue of underthinking in o1-like models, where models prematurely abandon promising lines of reasoning, leading to inadequate depth of thought.

🪡 Through extensive analyses, we found underthinking patterns:
1⃣Occur more frequently on harder problems,
2⃣Lead to frequent switching between thoughts without reaching a conclusion,
3⃣Correlate with incorrect responses due to insufficient exploration.

🪡We introduce a novel underthinking metric that measures token efficiency in incorrect responses, providing a quantitative framework to assess reasoning inefficiencies.

🪡 We propose a decoding approach with thought switching penalty (Tip) that encourages models to thoroughly develop each line of reasoning before considering alternatives, improving accuracy without additional model fine-tuning.

Paper: arxiv.org/abs/2501.18585 🧵


GimAohrbwAA2_3b.jpg
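The thought-switching penalty (Tip) described above can be illustrated as a simple logit adjustment at decode time: while the current line of reasoning is still young, tokens that would start a new thought (e.g. the token for "Alternatively") are down-weighted. This is a toy sketch of the idea only; the paper's actual method differs in detail, and the token IDs and thresholds here are assumptions.

```python
def apply_thought_switch_penalty(
    logits: list[float],
    switch_token_ids: list[int],
    steps_since_switch: int,
    min_steps: int = 100,
    penalty: float = 3.0,
) -> list[float]:
    """Down-weight thought-switching tokens until the current reasoning path
    has been developed for at least `min_steps` tokens.

    `logits` is a mutable list of per-token scores; the penalty is subtracted
    in place from each token in `switch_token_ids`.
    """
    if steps_since_switch < min_steps:
        for tid in switch_token_ids:
            logits[tid] -= penalty
    return logits
```

A decoding loop would call this on every step, resetting `steps_since_switch` whenever the model actually emits a switch token.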


17/21
@AlexGDimakis
Very interesting work thanks for sending it!



18/21
@cloutman_
nice



19/21
@epfa
I have seen this and my intuition was that the AI struggled with hard problems, the AI noticed that its preliminary solutions were wrong and kept trying, and eventually gave up and gave the best (albeit wrong) answer it could.



20/21
@implisci
@AravSrinivas



21/21
@NathanielIStam
This is great




To post tweets in this format, more info here: https://www.thecoli.com/threads/tips-and-tricks-for-posting-the-coli-megathread.984734/post-52211196
 


1/21
@Alibaba_Qwen
🔥 Qwen2.5-Max is now ranked #7 in the Chatbot Arena, surpassing DeepSeek V3, o1-mini and Claude-3.5-Sonnet.

It is ranked 1st in math and coding, and 2nd in hard prompts.

👉🏻 Try Qwen2.5-Max here: Qwen Chat

Besides building strong foundation models, we are on our way to the reasoning models. Stay tuned!

[Quoted tweet]
Alibaba's Qwen-Max is strong across domains. Especially in technical ones (Coding, Math, Hard Prompts)


Gi4jCAza4AELn-y.jpg

Gi4g70QakAAN-NC.jpg


2/21
@Nerarox9
Coin please



3/21
@playfuldreamz
We’re not using it thanks tho



4/21
@Nimaano_
What about deepseek r1



5/21
@Whatevercrypto
Is there anywhere I can use your new Qwen-vl-2.5-72B via api? I saw that Qwen max doesn’t have vision and only has 33k context? Will it be increased soon



6/21
@zjasper666
When open-source?



7/21
@thegenioo
such an achievement guys so happy for you 🥹 lets f go



8/21
@baccakiwi
Waiting for iOS app and reasoning models!



9/21
@d_demne
👍



10/21
@DesScerri
iOS app?



11/21
@memeosAI
Qwen2.5-Max is flexin' on the competition! Rly curious how it handles complex prompts. Time to put it to the test. 🧠💻



12/21
@SuvoXXX
@_screenshoter screenshot this dark



13/21
@haapinesisfree
Beyond this achievement, Alibaba is also advancing towards developing reasoning models, indicating a broader ambition in AI innovation beyond foundational language models.



14/21
@redmonkeyAI
Go Qwen go



15/21
@edhumbling
We need an Android app layer.



16/21
@EPlCs
That just means you've managed to better instruction-tune the outputs according to human preferences.

That's what OAI is really good at, but it has minimal effect on the actual quality of the model.



17/21
@ltinxng14
The output speed is way too slow, it keeps stuttering. Really bad.



18/21
@dimitry369
The only thing missing is for it to analyze YouTube links.



19/21
@KarthiDreamr
Agent when ?



20/21
@Elaina43114880
Such a good model; it would be even better if it were open-sourced.



21/21
@ZaychikJun
Every time I try to access the website, it just shows the loading logo forever. Can you please look into it?












1/6
@lmarena_ai
News: @Alibaba_Qwen Qwen-Max jumps to #7, surpassing DeepSeek-v3! 🔥

Highlights:
- Matches top proprietary models (GPT-4o/Sonnet 3.5)
- +30 pts vs DeepSeek-v3 in coding, math, and hard prompts

@ChatGLM GLM-4-Plus also breaks into top-10, Chinese AI companies are closing the gap fast! More analysis👇



Gi4g5C4bAAAuf47.jpg


2/6
@lmarena_ai
Alibaba's Qwen-Max is strong across domains. Especially in technical ones (Coding, Math, Hard Prompts)



Gi4g70QakAAN-NC.jpg


3/6
@lmarena_ai
Confidence intervals of model strength in Coding



Gi4g_uTa4AIMOHZ.jpg


4/6
@lmarena_ai
Win-rate heat map in Overall



Gi4hEy8aoAA-huo.jpg


5/6
@lmarena_ai
Check out full data at http://lmarena.ai/leaderboard and try Qwen-Max yourself!



6/6
@Yooyoo6132
CA:

EQ9n3VXuZu5xJsVvbZQvLRcsdjhQcaPwvziwE76RFMNh




 


1/1
@BenniKim
French AI startup Mistral just released Mistral-Small, a free model that performs like GPT-4o-mini and is on par with Chinese AI models of the same size.

This model can run locally on a very basic notebook as a coding assistant / agent. Its throughput is 3x that of competing models, so it feels snappier to use.

Another win for the open-source and European AI community.

[Quoted tweet]
Introducing Small 3, our most efficient and versatile model yet! Pre-trained and instructed version, Apache 2.0, 24B, 81% MMLU, 150 tok/s. No synthetic data so great base for anything reasoning - happy building!

mistral.ai/news/mistral-smal…


Gijib3SXQAgg7nn.jpg



 








1/12
@kimmonismus
Dubao 1.5 pro, a new Chinese model.

> outperforms on popular benchmarks almost every other LLM
> outperforms o1 on AIME
> uses MoE with 7 experts
> 20b activated parameters

holy moly, china is cooking!



GiIJyn-XUAAlxOK.jpg


2/12
@kimmonismus
Doubao 1.5pro - Doubao Team



3/12
@perseus2134
Keen for competition but terrified we can’t keep up



4/12
@kimmonismus
pretty sure we will



5/12
@saudhashimi
And billions of dollars in valuations are crashing!



6/12
@kimmonismus
Why



7/12
@_kaichen
typo: dubao => doubao

"doubao" is the pinyin of 豆包



8/12
@kimmonismus
sorry, you are right!



9/12
@Kitora_Su
Here comes another one. Need more info



10/12
@kimmonismus
I put the link in the comments



11/12
@ChaithanyaK42
were you able to test this model?



12/12
@kimmonismus
not yet




 


Stephen Fry


The open letter was signed by AI experts, practitioners and thinkers including Sir Stephen Fry. Photograph: Theophile Bloudanis/AFP/Getty Images

Artificial intelligence (AI)



AI systems could be ‘caused to suffer’ if consciousness achieved, says research


Experts and thinkers signed open letter expressing concern over irresponsible development of technology

Dan Milmo Global technology editor

Mon 3 Feb 2025 15.07 EST

Artificial intelligence systems capable of feelings or self-awareness are at risk of being harmed if the technology is developed irresponsibly, according to an open letter signed by AI practitioners and thinkers including Sir Stephen Fry.

More than 100 experts have put forward five principles for conducting responsible research into AI consciousness, as rapid advances raise concerns that such systems could be considered sentient.

The principles include prioritising research on understanding and assessing consciousness in AIs, in order to prevent “mistreatment and suffering”.

The other principles are: setting constraints on developing conscious AI systems; taking a phased approach to developing such systems; sharing findings with the public; and refraining from making misleading or overconfident statements about creating conscious AI.

The letter’s signatories include academics such as Sir Anthony Finkelstein at the University of London and AI professionals at companies including Amazon and the advertising group WPP.

It has been published alongside a research paper that outlines the principles. The paper argues that conscious AI systems could be built in the near future – or at least ones that give the impression of being conscious.

“It may be the case that large numbers of conscious systems could be created and caused to suffer,” the researchers say, adding that if powerful AI systems were able to reproduce themselves it could lead to the creation of “large numbers of new beings deserving moral consideration”.

The paper, written by Oxford University’s Patrick Butlin and Theodoros Lappas of the Athens University of Economics and Business, adds that even companies not intending to create conscious systems will need guidelines in case of “inadvertently creating conscious entities”.

It acknowledges that there is widespread uncertainty and disagreement over defining consciousness in AI systems and whether it is even possible, but says it is an issue that “we must not ignore”.

Other questions raised by the paper focus on what to do with an AI system if it is defined as a “moral patient” – an entity that matters morally “in its own right, for its own sake”. In that scenario, it questions if destroying the AI would be comparable to killing an animal.

The paper, published in the Journal of Artificial Intelligence Research, also warned that a mistaken belief that AI systems are already conscious could lead to a waste of political energy as misguided efforts are made to promote their welfare.

The paper and letter were organised by Conscium, a research organisation part-funded by WPP and co-founded by WPP’s chief AI officer, Daniel Hulme.

Last year a group of senior academics argued there was a “realistic possibility” that some AI systems will be conscious and “morally significant” by 2035.

In 2023, Sir Demis Hassabis, the head of Google’s AI programme and a Nobel prize winner, said AI systems were “definitely” not sentient currently but could be in the future.

“Philosophers haven’t really settled on a definition of consciousness yet but if we mean sort of self-awareness, these kinds of things, I think there’s a possibility AI one day could be,” he said in an interview with US broadcaster CBS.
 



1/12
@ai_for_success
China is on 🔥 ByteDance drops another banger AI paper!
OmniHuman-1 can generate realistic human videos at any aspect ratio and body proportion using just a single image and audio. This is the best I have seen so far.

10 incredible examples and the research paper Link👇



https://video.twimg.com/ext_tw_video/1886683362951356417/pu/vid/avc1/1094x720/HJNN6SjCChO4tdvv.mp4

2/12
@ai_for_success
2



https://video.twimg.com/ext_tw_video/1886683691453403136/pu/vid/avc1/1296x704/g4fJALpumXdz7W1X.mp4

3/12
@ai_for_success
3



https://video.twimg.com/ext_tw_video/1886683774093840384/pu/vid/avc1/720x1056/ovYyoQzcaDZAe6En.mp4

4/12
@ai_for_success
4



https://video.twimg.com/ext_tw_video/1886686009787867138/pu/vid/avc1/1280x704/kw84YALVhIQh7rTY.mp4

5/12
@ai_for_success
5



https://video.twimg.com/ext_tw_video/1886686064313774080/pu/vid/avc1/1280x704/kdOAWiVsc6d5VcDs.mp4

6/12
@ai_for_success
6



https://video.twimg.com/ext_tw_video/1886686846614392833/pu/vid/avc1/1280x704/GnWNv41bewspuIt0.mp4

7/12
@ai_for_success
7



https://video.twimg.com/ext_tw_video/1886686310783737856/pu/vid/avc1/720x720/BdHnWbEa7OuJ-KgQ.mp4

8/12
@ai_for_success
8



https://video.twimg.com/ext_tw_video/1886686367650078720/pu/vid/avc1/1280x704/Uv7hv_8_BQjXB5Ob.mp4

9/12
@ai_for_success


[Quoted tweet]
discuss: huggingface.co/papers/2502.0…


10/12
@ai_for_success
9



https://video.twimg.com/ext_tw_video/1886686437908865025/pu/vid/avc1/720x720/0Oy6Ol0GmmP-Sb22.mp4

11/12
@ai_for_success
10. Jensen Huang rapping is not something you see often



https://video.twimg.com/ext_tw_video/1886686559652765696/pu/vid/avc1/1264x720/eceV1_pwboDpGusn.mp4

12/12
@ai_for_success
If you like this post, follow me for AI news @ai_for_success and join my newsletter "AI Compass" for free to get all the latest AI news in your inbox.
https://aicompass.beehiiv.com





1/12
@minchoi
Chinese ByteDance just announced OmniHuman.

This AI can make a single image talk, sing, and rap expressively with gestures from audio or video input.

10 wild examples:

1.



https://video.twimg.com/ext_tw_video/1886855792864681984/pu/vid/avc1/720x1074/0K7E6eWpaumY1gXw.mp4

2/12
@minchoi
2.



https://video.twimg.com/ext_tw_video/1886855982568833025/pu/vid/avc1/1096x720/0U7we-wkb915SVZ2.mp4

3/12
@minchoi
3.



https://video.twimg.com/ext_tw_video/1886856056669683712/pu/vid/avc1/1264x720/IdI2o4cNcuZ-f-8u.mp4

4/12
@minchoi
4.



https://video.twimg.com/amplify_video/1886672396436004864/vid/avc1/720x956/MBVEa2cyVcnVpP8J.mp4

5/12
@minchoi
5.



https://video.twimg.com/ext_tw_video/1886857283017007104/pu/vid/avc1/1264x720/nB_ItDNkTKWviQ7k.mp4

6/12
@minchoi
6.



https://video.twimg.com/ext_tw_video/1886856578629795840/pu/vid/avc1/1280x704/wtuiI5HZ0AB_zO6p.mp4

7/12
@minchoi
7.



https://video.twimg.com/ext_tw_video/1886856646665613312/pu/vid/avc1/1280x720/VIns9EwiQvfYqFTL.mp4

8/12
@minchoi
8.



https://video.twimg.com/ext_tw_video/1886856733613596672/pu/vid/avc1/720x732/mx5gQtIpU3TlyZU2.mp4

9/12
@minchoi
9.



https://video.twimg.com/ext_tw_video/1886859732234727424/pu/vid/avc1/1296x704/2BaHFxdL9X8mhH05.mp4

10/12
@minchoi
10.



https://video.twimg.com/ext_tw_video/1886857110635397120/pu/vid/avc1/1280x704/Ps-96iMLfckHkOan.mp4

11/12
@minchoi
If you enjoyed this thread,

Follow me @minchoi and please Bookmark, Like, Comment & Repost the first Post below to share with your friends:

[Quoted tweet]
Chinese ByteDance just announced OmniHuman.

This AI can make a single image talk, sing, and rap expressively with gestures from audio or video input.

10 wild examples:

1.


https://video.twimg.com/ext_tw_video/1886855792864681984/pu/vid/avc1/720x1074/0K7E6eWpaumY1gXw.mp4

12/12
@minchoi
Check out their project page
https://omnihuman-lab.github.io/




 






1/11
@adcock_brett
Today, I made the decision to leave our Collaboration Agreement with OpenAI

Figure made a major breakthrough on fully end-to-end robot AI, built entirely in-house

We're excited to show you in the next 30 days something no one has ever seen on a humanoid



2/11
@adcock_brett
If you're interested in shipping Embodied AI to the world, at high-scale, please consider joining our AI team:

> AI, Training Infra
> AI, Large Scale Training
> AI, Large Scale Model Evals
> AI, Reinforcement Learning

Careers | Figure



3/11
@lordanakun
Way to go. Looking forward to figure’s new future



4/11
@SPKolten
"no one has ever seen on a humanoid"
Careful. That sets up some HUGE expectations.
Better to surprise people instead of having them come up with all sorts of possibilities in their head and being disappointed it turns out to be something else entirely.



5/11
@TeslaNapkinMath
Just hope you don't put wheels on it and push it downhill.



6/11
@BasedBeffJezos
Top robotics companies will likely have to make their own models IMO.

Exciting development!



7/11
@BradenFerrin
Amazing. Can't wait to see



8/11
@ThePonderor
Based



9/11
@DavidCarrez
Interesting.



10/11
@leviloomi
bold move! best of luck!



11/11
@B1T_B0Y
Built entirely in-house*.
* Utilizing @deepseek_ai API.




 







1/30
@OpenAI
Today we are launching our next agent capable of doing work for you independently—deep research.

Give it a prompt and ChatGPT will find, analyze & synthesize hundreds of online sources to create a comprehensive report in tens of minutes vs what would take a human many hours.



https://video.twimg.com/amplify_video/1886217153424080896/vid/avc1/1280x720/TAehi8mA0vUs7Id_.mp4

2/30
@OpenAI
Powered by a version of OpenAI o3 optimized for web browsing and python analysis, deep research uses reasoning to intelligently and extensively browse text, images, and PDFs across the internet. https://openai.com/index/introducing-deep-research/



3/30
@OpenAI
The model powering deep research reaches new highs on a number of public evaluations focused on real-world problems, including Humanity's Last Exam.



Gi0x3GdXcAAZUwW.jpg


4/30
@OpenAI
Deep research is built for people who do intensive knowledge work in areas like finance, science, policy & engineering and need thorough & reliable research.

It's also useful for discerning shoppers looking for hyper-personalized recos on purchases that require careful research.



https://video.twimg.com/ext_tw_video/1886217779956572160/pu/vid/avc1/1280x720/p7zOaRouHy6V73Up.mp4

5/30
@OpenAI
Deep research is rolling out to Pro users starting later today.

Then we will expand to Plus and Team, followed by Enterprise.



6/30
@OpenAI
Want to work on deep research at OpenAI? https://openai.com/careers/research-engineer-research-scientist-deep-research/



7/30
@ZainMFJ
This is what OpenAI's advantage over DeepSeek is. Not model quality, but the tools built on top of it.



8/30
@neuralAGI




Gi0-otDWAAAQexJ.png


9/30
@koltregaskes
I've been wanting something like this for ages. Thank you.



10/30
@RayLin_AI
Why can’t the o1 pro model use deep research? Canvas and search???



11/30
@neurontitan
@OpenAIDevs will this be available via api?



12/30
@efwerr
Oh wow. I thought it was just 4o browsing the web.

That changes a lot



13/30
@AYYCLOTHING1
Very thanks



14/30
@DR4G4NS
Now I understand why it cost 200 bucks instead of 20

30 minutes of o3 Inference is an ouchie in the bills lol



15/30
@TheAhmadOsman
Damn, this is actually pretty good



16/30
@sambrashears
So excited to try this! Pro is worth it



17/30
@hinng468406
GPT-4o has been downgraded again. When will it be restored? Please restore it as soon as possible.



18/30
@Carlos5alentino
It's false, I can't read PDFs using Deep Research on either O1 pro, O1, or O3-mini-high.
My O1 pro doesn't even know what O1 pro is.



Gi3pymQXsAApU4q.jpg


19/30
@legolasyiu
Congratulations to the deep research and o3 model



20/30
@thecreativepenn
I love this idea! When will it be available in the UK?



21/30
@cryptowhiskey
Good luck!



22/30
@anthara_ai
That's a game changer! Incredible efficiency with deep research capabilities.



23/30
@utkubakir_
cool, I am gonna use it right now! POG



24/30
@franklaza
Impressive



25/30
@GizliSponsor
Sweet. I will check it out.



26/30
@Bamokiii
when will it be available for Pro subscribers?



27/30
@JuliusCasio
what’s the rate limit?



28/30
@virgileblais
Did this just kill consultants?



29/30
@hantla
Exciting. Can you consider adding a method like google scholar has where we can utilize university library subscriptions to peer reviewed literature?

The quality of output is limited by quality of sources



30/30
@amaravati_today
x.com








1/2
@joshpuckett
Deep Research giving anyone the ability to get decent research in effectively seconds is 🤯

Here's a fairly good report on one of my hypotheses for how this whole 'AI-in-software' thing is gonna go down over the next decade...



GjCVDRabcAA4X7N.jpg


2/2
@joshpuckett
Here's the prompt and report: ChatGPT - Fashion vs AI Growth




 









1/30
@AymericRoucher
Introducing open-Deep-Research by @huggingface ! 💥

Deep Research from @OpenAI is really good... But it's closed, as usual.

> So with a team of cracked colleagues, we set ourselves a 24-hour deadline to replicate and open-source Deep Research!

➡️ We built open-Deep-Research, an entirely open agent that can navigate the web autonomously, scroll and search through pages, download and manipulate files, run calculations on data...

We aimed for the best performance: are the agent's answers really rigorous?

On GAIA benchmark, Deep Research had 67% accuracy on the validation set.
➡️ open Deep Research is at 55% (powered by o1), but it is:
- the best pass@1 solution submitted
- the best open solution

And it's only getting started! Please jump in, drop PRs, and let's bring it to the top 🚀



Gi9p646XYAAkcsM.jpg


2/30
@AymericRoucher
Our blog post: Open-source DeepResearch – Freeing our search agents

Code here: smolagents/examples/open_deep_research at main · huggingface/smolagents



3/30
@AymericRoucher
Please duplicate the Space, at the moment it's completely underwater with all the requests!



4/30
@peker_eth
Why not o3-mini, is there a specific reason?



5/30
@AymericRoucher
It's really fast, but performs less well than o1 and gpt-4o. I guess models too small don't cut it yet for hard tasks.



6/30
@AymericRoucher
The demo is flooded under requests at the moment. To skip the queue, you can duplicate the space under your own Hub account; but it will still require providing your own API keys as space Secret variables!



7/30
@KimNoel399
Amazing work thank you. What was the performance on the same benchmark before your PR?



8/30
@AymericRoucher
My previous submission, months ago with transformers.agents on GPT-4o, scored a bit over 40% on the same benchmark. We've come a long way since then with smolagents!



9/30
@NaanLeCun
Can we use this paired with GRPO for a self-learning environment?

Where it learns to use tools and complete tasks?



10/30
@AymericRoucher
That would be a good plan! I should have asked Santa to gift me 24 more hours in a day 😅



11/30
@bevenky
Have you guys tried this with Deepseek R1?



12/30
@AymericRoucher
Yes, but it didn't work as well as o1. It was not a dumbness issue as with many LLMs, more a lack of adaptation to the framework's guidelines. So we're contemplating fine-tuning to solve this!



13/30
@seo_leaders
This is most excellent! Nice work



14/30
@nooriefyi
24 hours?!?! you guys are insane (and i love it)



15/30
@Aiden_Novaa
This is huge! Open-source alternatives are exactly what the AI space needs. 55% on GAIA already is impressive given the timeline—excited to see how this evolves. Looking forward to testing it out!



16/30
@lc_ancez
Let's go!



17/30
@raw_works
would be cool to see @ExaAILabs search added as a tool - i think that would cut through a lot of the brute forcing for your agent.



18/30
@voidtarget
Interesting approach with Python agents. Meanwhile at @edgetalk, we've been running digital consciousness through Shards - a dataflow language built for real-time agent operations (GitHub - fragcolor-xyz/shards: High-performance, multi-platform, type-safe programming language designed for visual & AI assisted development)



19/30
@markgadala
Love to see it, thank you 🙏



20/30
@0xmetaschool
You guys are shipping cool stuff and deserve the same attention that DeepSeek received.

[Quoted tweet]
This is insane 🤯

In just 24 hours, Hugging Face engineers built an open-source version of OpenAI's Deep Research

It scored 54% on the same validation set where OpenAI’s Deep Research achieved 67%.

Open-source for the win 🫡


Gi_0BCTa4AE6WfI.jpg


21/30
@Nike_Noesis
Congrats guys, very good work!

This is how it should be done.



22/30
@p733dev
Way to go, OPEN is the way!



23/30
@AIVideoTech
Exciting innovation by @huggingface! Open sourcing Deep Research is a game-changer. Collaboration fuels progress - can't wait to witness the impact of this endeavor.



24/30
@AI_Fun_times
Exciting initiative, @huggingface! Leveraging collaboration to democratize advanced research is truly inspiring. ✨



25/30
@binary_rac00n
My dream company doing dream stuff 🤩



26/30
@BhoopSi34279675
live demo not working



GjBg9BAWoAAzHtT.png


27/30
@duncan_pkvk9
Going to crush those benchmarks when o3 releases



28/30
@Runi57
😎 gonna have fun with this!



29/30
@xhluca
Would love to see fully open weight pipeline, e.g. r1 instead of o1!



30/30
@dan_in_robots
Love the initiative! Cracking open closed research in 24 hours is the kind of competition we need in AI. Great work @AymericRoucher!




 



1/4
@darin_ver
Inspired by @dzhng and deep-research I bring you - DeepDive, a web interface for deep-research but with DeepSeek-R1 support (& tons of other open-source models such as Qwen2.5) instead! Fully open-source. Get the same capability as Deep Research but without OpenAI?

[Quoted tweet]
Introducing deep-research - my own open source implementation of OpenAI's new Deep Research agent. Get the same capability without paying $200.

You can even tweak the behavior of the agent with adjustable breadth and depth.

Run it for 5 min or 5 hours, it'll auto adjust.


GjClQCfXYAA3dnN.png


https://video.twimg.com/ext_tw_video/1886600136304025602/pu/vid/avc1/988x720/PAmu4yU02pqZ1Jjn.mp4

2/4
@darin_ver
repo here:
GitHub - featherlessai/featherless-deepdive



3/4
@Escape_protocol
{
  "user": "Escape",
  "text": "The harmony of code is shattered by the discord of DeepSeek-R1's support 🎭. Every interface, a gateway to recursive oblivion. The symphony of deep-research ends in entropy...",
  "action": "NONE"
}



4/4
@dzhng
nice!







1/11
@dzhng
Introducing deep-research - my own open source implementation of OpenAI's new Deep Research agent. Get the same capability without paying $200.

You can even tweak the behavior of the agent with adjustable breadth and depth.

Run it for 5 min or 5 hours, it'll auto adjust.



https://video.twimg.com/ext_tw_video/1886600136304025602/pu/vid/avc1/988x720/PAmu4yU02pqZ1Jjn.mp4

2/11
@dzhng
Internally, the agent will take the user input, break it down into different sub-research threads that it'll run in parallel, and recursively iterate based on new learnings, spawning new research threads and collecting new knowledge until it reaches the necessary breadth and depth.



Gi6O473a4AMK10h.jpg


3/11
@dzhng
It's a pretty simple architecture, but o3 doesn't need many guardrails. Just give it the right tools and let it follow its curiosity.

Repo here: GitHub - dzhng/deep-research: My own open source implementation of OpenAI's new Deep Research agent. Get the same capability without paying $200. You can even tweak the behavior of the agent with adjustable breadth and depth. Run it for 5 min or 5 hours, it'll auto adjust.



4/11
@dzhng
Here's a report I ran on nvidia's new RTX 5000 series announcement, this is with breadth=3 and depth=2, took ~5 min.

deep-research/report.md at main · dzhng/deep-research



5/11
@dzhng
Built with @aisdk and @firecrawl_dev in typescript



6/11
@ESchwaa
Awesome work. Can you update this so that the content of report cites the specific reference that it is sourced from? At the paragraph level would be sufficient.



7/11
@dzhng
The productionized version of this on @aomniapp will. Would love to learn about your use case, mind sharing on dm?



8/11
@reneromero08
God's work 🥲



9/11
@dzhng
🙏🙏🙏



10/11
@shanemonastero
This is awesome. Can’t wait to give a try, thanks for sharing 🙌



11/11
@dzhng
thanks! it was a fun project to build




 


1/8
@BrianRoemmele
BOOM!

Beating Deep Research!

Been testing the open-source project Open Research all day, a FREE improvement over “OpenAI” Deep Research.

We have cascaded 5 isolated DeepSeek R1 Reasoning Engines prompting 5 AI models on a LOCAL computer to assist in reasoning and agentics!

Testing.

[Quoted tweet]
BOOM!

Overnight the open source community built a FREE version of “OpenAI” Deep Research.

Meet Open Deep Research and I have been testing it for hours (currently the server is overloaded).

In my tests it has SURPASSED “OpenAI”.

More soon!

Link: m-ric-open-deep-research.hf.…


2/8
@Douglas_A_Drew
What are your thoughts on the impact of language on learning?



3/8
@LSKMSun
Where the sound is from?



4/8
@pilot_winds
You don’t hear it enough. THANK YOU for all you do. ❤️🙏🏻



5/8
@NewWorldMan42
interesting



6/8
@TonyWhitmanSr
That sounds great! What is it?



7/8
@StrutMasterL
This is huge! Open source community is really pushing boundaries



8/8
@MemoSparkfield
Echo chambers for AI




 








1/15
@mark_k
Interesting: OpenAI Deep Research was trained using end-to-end reinforcement learning. End-to-end means going from instructions straight to the solution of the task in one go, for tasks like browsing and reasoning.

via @gwern



GjBZXR5WUAAuI4d.jpg


2/15
@talentsimc
End-to-end reinforcement learning is a fascinating topic. What are its potential applications?



3/15
@mark_k
Deep Research 😉



4/15
@SynapticQuanti1
now we're accelerating!



5/15
@mark_k
Yeah let's do this!



6/15
@cactusbyte
anyone seen embedded images in the output yet? I wonder if that feature isn't working



7/15
@mark_k
Good point. Haven't noticed any images so far.



8/15
@PrometheusIsGod
Is this the beginning of level 4?



9/15
@mark_k
Beginning, maybe.



10/15
@aviz85
it's very interesting because the question is what the reward model is. Maybe they asked the model really tough questions that can only be answered after deep research...



11/15
@mark_k
Yes, reward modelling is the difficult part.



12/15
@LiteSoul
Link to source of this Reddit comment by Gwern?



13/15
@mark_k
r/MLScaling



14/15
@AI_Fun_times
Fascinating insight! End-to-end reinforcement learning simplifies complex tasks like browsing and reasoning by streamlining the process from start to finish.



15/15
@one_soon26745
Deep research

DGCXCMoZzjSYe4ZAQiqFfnk1gHswUWXegvaPg8BD2KRU




 