bnew · Veteran
Joined: Nov 1, 2015 · Messages: 59,158 · Reputation: 8,772 · Daps: 163,783

1/1
@MunchBaby1337
🚀 **ByteDance Unveils Doubao-1.5-pro** 🚀 #db

- **Deep Thinking Mode**: Surpasses o1-preview and o1 on the AIME benchmark.

- **Benchmark Beast**: Outperforms deepseek-v3, gpt4o, and llama3.1-405B across multiple benchmarks.

- **MoE Magic**: Utilizes a Mixture of Experts architecture, with significantly fewer active parameters than competitors.

- **Performance Leverage**: Achieves dense model performance with just 1/7 of the parameters (20B active = 140B dense equivalent).

- **Tech Talk**: Employs a heterogeneous system design for prefill-decode and attention-FFN, optimizing throughput with low latency.








ByteDance AI Introduces Doubao-1.5-Pro Language Model with a ‘Deep Thinking’ Mode and Matches GPT-4o and Claude 3.5 Sonnet Benchmarks at 50x Cheaper


By Asif Razzaq - January 25, 2025

The artificial intelligence (AI) landscape is evolving rapidly, but this growth is accompanied by significant challenges. High costs of developing and deploying large-scale AI models and the difficulty of achieving reliable reasoning capabilities are central issues. Models like OpenAI’s GPT-4 and Anthropic’s Claude have pushed the boundaries of AI, but their resource-intensive architectures often make them inaccessible to many organizations. Additionally, addressing long-context understanding and balancing computational efficiency with accuracy remain unresolved challenges. These barriers highlight the need for solutions that are both cost-effective and accessible without sacrificing performance.

To address these challenges, ByteDance has introduced Doubao-1.5-pro, an AI model equipped with a “Deep Thinking” mode. The model demonstrates performance on par with established competitors like GPT-4o and Claude 3.5 Sonnet while being significantly more cost-effective. Its pricing stands out, with $0.022 per million cached input tokens, $0.11 per million input tokens, and $0.275 per million output tokens. Beyond affordability, Doubao-1.5-pro outperforms models such as deepseek-v3 and llama3.1-405B on key benchmarks, including the AIME test. This development is part of ByteDance’s broader efforts to make advanced AI capabilities more accessible, reflecting a growing emphasis on cost-effective innovation in the AI industry.
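
To make the pricing concrete, here is a quick back-of-the-envelope sketch in Python. The per-million-token rates are the ones quoted above; the request sizes are invented purely for illustration.

```python
# Cost sketch using Doubao-1.5-pro's quoted prices (USD per million tokens).
# The request sizes below are hypothetical, for illustration only.
PRICE_CACHED_IN = 0.022 / 1_000_000  # cached input tokens
PRICE_IN = 0.110 / 1_000_000         # regular input tokens
PRICE_OUT = 0.275 / 1_000_000        # output tokens

def request_cost(cached_in: int, fresh_in: int, out: int) -> float:
    """Dollar cost of one request given its token counts."""
    return cached_in * PRICE_CACHED_IN + fresh_in * PRICE_IN + out * PRICE_OUT

# e.g. a 10k-token cached prompt, 2k fresh input tokens, 1k output tokens:
print(f"${request_cost(10_000, 2_000, 1_000):.6f}")  # ≈ $0.000715
```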



Technical Highlights and Benefits


Doubao-1.5-pro’s strong performance is underpinned by its thoughtful design and architecture. The model employs a sparse Mixture-of-Experts (MoE) framework, which activates only a subset of its parameters during inference. This approach allows it to deliver the performance of a dense model with only a fraction of the computational load. For instance, 20 billion activated parameters in Doubao-1.5-pro equate to the performance of a 140-billion-parameter dense model. This efficiency reduces operational costs and enhances scalability.
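
For intuition, here is a minimal sketch of the generic top-k MoE routing pattern in Python; this is not ByteDance's actual code, and all dimensions are toy values. Only the selected experts' weights are touched per token, which is why active parameters can be a small fraction of total parameters.

```python
# Toy sketch of sparse Mixture-of-Experts routing (generic top-k gating,
# not Doubao's implementation). Only K_ACTIVE of E experts run per token.
import numpy as np

rng = np.random.default_rng(0)
D, E, K_ACTIVE = 32, 8, 2  # hidden width, number of experts, experts per token

gate = rng.standard_normal((D, E)) / np.sqrt(D)        # router weights
experts = rng.standard_normal((E, D, D)) / np.sqrt(D)  # one toy FFN per expert

def moe_forward(x):
    logits = x @ gate                     # score each expert for this token
    top = np.argsort(logits)[-K_ACTIVE:]  # keep the top-k experts
    probs = np.exp(logits[top])
    probs /= probs.sum()                  # softmax over the selected experts
    # only the selected experts' parameters are read: ~K_ACTIVE/E of the compute
    return sum(p * (x @ experts[e]) for p, e in zip(probs, top))

token = rng.standard_normal(D)
out = moe_forward(token)  # D-dimensional output, 2 of 8 experts active
```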

The model also integrates a heterogeneous system design for prefill-decode and attention-FFN tasks, optimizing throughput and minimizing latency. Additionally, its extended context windows of 32,000 to 256,000 tokens enable it to process long-form text more effectively, making it a valuable tool for applications like legal document analysis, academic research, and customer service.
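
As a rough illustration of why prefill and decode are handled as different workloads, the toy single-head attention loop below (not ByteDance's system) shows the split: prefill processes the whole prompt in one parallel, compute-bound pass that fills the KV cache, while decode generates one memory-bound token at a time against that cache.

```python
# Toy prefill/decode split with a KV cache (illustrative only).
import numpy as np

D = 64  # toy model width
rng = np.random.default_rng(0)
W_q, W_k, W_v = (rng.standard_normal((D, D)) / np.sqrt(D) for _ in range(3))

def attend(q, K, V):
    # single-head scaled dot-product attention over the cached keys/values
    scores = q @ K.T / np.sqrt(D)
    w = np.exp(scores - scores.max())
    w /= w.sum()
    return w @ V

def prefill(prompt):
    # compute-bound: one big matmul over all prompt positions at once
    return prompt @ W_k, prompt @ W_v  # the KV cache handed to decode

def decode_step(x, K, V):
    # memory-bound: append one token's key/value, then attend over the cache
    K = np.vstack([K, x @ W_k])
    V = np.vstack([V, x @ W_v])
    return attend(x @ W_q, K, V), K, V

K, V = prefill(rng.standard_normal((128, D)))  # 128-token prompt
x = rng.standard_normal(D)
for _ in range(8):                             # generate 8 tokens
    x, K, V = decode_step(x, K, V)
```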



Results and Insights


Performance data highlights Doubao-1.5-pro’s competitiveness in the AI landscape. It matches GPT-4o in reasoning tasks and surpasses models such as o1-preview and o1 on benchmarks like AIME. Its cost efficiency is another significant advantage, with operational expenses 5x lower than DeepSeek and over 200x lower than OpenAI’s o1 model. These factors underscore ByteDance’s ability to offer a model that combines strong performance with affordability.

Early users have noted the effectiveness of the “Deep Thinking” mode, which enhances reasoning capabilities and proves valuable for tasks requiring complex problem-solving. This combination of technical innovation and cost-conscious design positions Doubao-1.5-pro as a practical solution for a range of industries.



Conclusion


Doubao-1.5-pro exemplifies a balanced approach to addressing the challenges in AI development, offering a combination of performance, cost efficiency, and accessibility. Its sparse Mixture-of-Experts architecture and efficient system design provide a compelling alternative to more resource-intensive models like GPT-4 and Claude. By prioritizing affordability and usability, ByteDance’s latest model contributes to making advanced AI tools more widely available. This marks an important step forward in AI development, reflecting a broader shift towards creating solutions that meet the needs of diverse users and organizations.
 

bnew








1/12
@Saboo_Shubham_
Qwen2.5 Max is a new large-scale MoE model from China that outperforms DeepSeek v3, Claude Sonnet 3.5, GPT-4o and Llama-3 405B.

It is available as an OpenAI-like API at much lower cost.

Every day in AI is now about China. Let that sink in.





2/12
@Saboo_Shubham_
I will be adding more AI Agent apps using Qwen2.5 Max in the future.

You can find all the awesome LLM apps with AI Agents and RAG in the following GitHub repo.

P.S: Don't forget to star the repo to show your support 🌟

GitHub - Shubhamsaboo/awesome-llm-apps: Collection of awesome LLM apps with AI Agents and RAG using OpenAI, Anthropic, Gemini and opensource models.



3/12
@Saboo_Shubham_
50+ Step-by-step tutorials of LLM apps with AI Agents and RAG.

P.S: Don't forget to subscribe for FREE to access future tutorials.

https://theunwindai.com





4/12
@Saboo_Shubham_
If you find this useful, RT to share it with your friends.

Don't forget to follow me @Saboo_Shubham_ for more such LLM tips and AI Agent, RAG tutorials.

[Quoted tweet]
Qwen2.5 Max is a new large-scale MoE model from China that outperforms DeepSeek v3, Claude Sonnet 3.5, GPT-4o and Llama-3 405B.

It is available as an OpenAI-like API at much lower cost.

Every day in AI is now about China. Let that sink in.




5/12
@KairosDataLabs
Cray week in AI.



6/12
@Saboo_Shubham_
100% agree.



7/12
@Gargi__Gupta
Chinese New Year started with an AI festival



8/12
@Saboo_Shubham_
It's an AI revolution at this point lol



9/12
@AILeaksAndNews
China is accelerating



10/12
@Saboo_Shubham_
Totally at an exponential rate.



11/12
@xdrmsk
In a week, decades are happening!!!



12/12
@Saboo_Shubham_
Those are the right words.












1/31
@Alibaba_Qwen
The burst of DeepSeek V3 has attracted attention from the whole AI community to large-scale MoE models. Concurrently, we have been building Qwen2.5-Max, a large MoE LLM pretrained on massive data and post-trained with curated SFT and RLHF recipes. It achieves competitive performance against the top-tier models, and outcompetes DeepSeek V3 in benchmarks like Arena Hard, LiveBench, LiveCodeBench, and GPQA-Diamond.

📖 Blog: Qwen2.5-Max: Exploring the Intelligence of Large-scale MoE Model
💬 Qwen Chat: Qwen Chat (choose Qwen2.5-Max as the model)
⚙️ API: Make your first API call to Qwen - Alibaba Cloud Model Studio - Alibaba Cloud Documentation Center (check the code snippet in the blog)
💻 HF Demo: Qwen2.5 Max Demo - a Hugging Face Space by Qwen

In the future, we will not only continue scaling pretraining but also invest in scaling RL. We hope that Qwen will be able to explore the unknown in the near future! 🔥

💗 Thank you for your support during the past year. See you next year!





2/31
@Alibaba_Qwen
Results of base language models. We are confident in the quality of our base models and we expect the next version of Qwen will be much better with our improved post-training methods.





3/31
@Alibaba_Qwen
It is interesting to play with this new model. We hope you enjoy the experience in Qwen Chat:

Qwen Chat



https://video.twimg.com/ext_tw_video/1884260770374115329/pu/vid/avc1/1280x720/OU7GghDaR4_gJloI.mp4

4/31
@Alibaba_Qwen
Also, it is available as an HF demo, and it is on Any Chat as well!

Qwen2.5 Max Demo - a Hugging Face Space by Qwen



5/31
@Alibaba_Qwen
You are welcome to use the API through Alibaba Cloud's service. Using it is as easy as using any other OpenAI-compatible API.
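
For reference, a minimal sketch of such a call with the OpenAI Python client; the base_url and model name below are my assumptions based on Model Studio's compatible mode, so check the code snippet in the blog for the current values.

```python
# Hedged sketch: Qwen2.5-Max via an OpenAI-compatible endpoint.
# base_url and model name are assumptions; consult the official blog/docs.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_DASHSCOPE_API_KEY",  # placeholder credential
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)

resp = client.chat.completions.create(
    model="qwen-max-2025-01-25",  # assumed Qwen2.5-Max snapshot name
    messages=[{"role": "user", "content": "Hello, Qwen2.5-Max!"}],
)
print(resp.choices[0].message.content)
```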





6/31
@mkurman88
Looks good 😍



7/31
@securelabsai
V3 or R1?



8/31
@Yuchenj_UW
Happy new year Qwen!



9/31
@raphaelmansuy
Happy new Year of The Snake / From Hong Kong 🇨🇳 🇭🇰



10/31
@Urunthewizard
yoooooo thats cool! Is it open source like deepseek?



11/31
@SynquoteIntern
"Sir, another Chinese model has hit the timeline."





12/31
@koltregaskes
Happy New Year and thank you guys.



13/31
@iamfakhrealam
Ahaaa… Happy Lunar Year to you guys and specially to @sama





14/31
@hckinz
Lol, another one and this time they are not even comparing Claude 3.5 on coding 😂🙌



15/31
@octorom
Android app in the works? 🙂



16/31
@Cloudtheboi
Currently using qwen to search websites. It's great!



17/31
@luijait_
We claim a test time scaling GRPO RL over this base model



18/31
@yupiop12
based based based based based waow...



19/31
@AntDX316
Non-stop cooking. 👍



20/31
@marjan_milo
A takedown of everything OpenAI has shown so far.



21/31
@TepuKhan
恭喜发财 (wishing you wealth and prosperity)



22/31
@tom777cruise
butthole logo ✅



23/31
@LuminEthics
Tweet Storm Response: Qwen2.5-Max vs. DeepSeek V3—But Where’s the Accountability? 🚨
1/ Qwen2.5-Max steps into the spotlight!
With benchmarks outpacing DeepSeek V3, it’s clear the MoE (Mixture of Experts) race is heating up.
But as models compete on performance, we need to ask:

What ethical safeguards are in place?

Who ensures transparency and alignment?
#AI #Governance



24/31
@vedu023
The race just keeps getting more exciting…!!



25/31
@elder_plinius




26/31
@vedangvatsa
Read about Liang Wenfeng, the Chinese entrepreneur behind DeepSeek, the AI App challenging ChatGPT:

[Quoted tweet]
Liang Wenfeng - Founder of DeepSeek

Liang was born in 1985 in Guangdong, China, to a modest family.

His father was a school teacher, and his values of discipline and education greatly influenced Liang.

Liang pursued his studies at Zhejiang University, earning a master’s degree in engineering in 2010.

His research focused on low-cost camera tracking algorithms, showcasing his early interest in practical AI applications.

In 2015, he co-founded High-Flyer, a quantitative hedge fund powered by AI-driven algorithms.

The fund grew rapidly, managing over $100 billion, but he was not content with just the financial success.

He envisioned using AI to solve larger, more impactful problems beyond the finance industry.

In 2023, Liang founded DeepSeek to create cutting-edge AI models for broader use.

Unlike many tech firms, DeepSeek prioritized research and open-source innovation over commercial apps.

Liang hired top PhDs from universities like Peking and Tsinghua, focusing on talent with passion and vision.

To address US chip export restrictions, Liang preemptively secured 10,000 Nvidia GPUs.

This strategic move ensured DeepSeek could compete with global leaders like OpenAI.

DeepSeek's AI models achieved high performance at a fraction of the cost of competitors.

Liang turned down a $10 billion acquisition offer, stating that DeepSeek’s goal was to advance AI, not just profit.

He advocates for originality in China’s tech industry, emphasizing innovation over imitation.

He argued that closed-source technologies only temporarily delay competitors and emphasized the importance of open innovation.

Liang credits his father’s dedication to education for inspiring his persistence and values.

He believes AI should serve humanity broadly, not just the wealthy or elite industries.




27/31
@Mira_Network






28/31
@snats_xyz
any chances of a paper / release of weights or something similar at some point?



29/31
@LechMazur
18.6 on NYT Connections, up from 14.8 for Qwen 2.5 72B. I'll run my other benchmarks later.





30/31
@daribigboss
Absolutely love this project! Let’s connect , send me a DM now! 💎
https://twitter.com/messages/compos...+joining+forces?+💎&recipient_id=203762854



31/31
@shurensha
Man OpenAI can't catch a break




 

bnew




1/11
@RnaudBertrand
This is pretty hilarious in retrospect.

In India in 2023, Altman was asked whether a small, smart team with a budget of $10 million could build something substantial in AI.

His reply: "It’s totally hopeless to compete with us on training foundation models"

[Quoted tweet]
Sam Altman - founder of OpenAI and ChatGPT - is in India and VCs are asking some tough questions to him


https://video.twimg.com/amplify_video/1666438180323696642/vid/1920x1080/TiS26MpJ4GkCkxu6.mp4

2/11
@minotauronlucy
How many foundation models has India trained since then? Zero.

There is no point bashing Sam Altman. India has not even produced a byte's worth of weights to snipe at OpenAI.



3/11
@RnaudBertrand
Doesn't mean it wasn't possible, as Deepseek demonstrated...



4/11
@terrybythebay
😅🤣

[Quoted tweet]
DeepSeek was able to build their R1 model for only $6M because they bought all their GPUs directly from Temu.


5/11
@TheJesseMK
Do you think DeepSeek had a budget under $10M?



6/11
@1Paul_1
“The fool doth think he is wise, but the wise man knows himself to be a fool”



7/11
@tate_terminal
with a mere $10 million, one could readily purchase a mirror for altman to gaze upon his own hubris, a modern-day icarus flying too close to the silicon sun.



8/11
@AGItechgonewild
True LMAO!! 💀



9/11
@tacobelmin
You missed the part where he said “you should try anyway”!



10/11
@aledeniz
DeepSeek doesn’t have a $10 million budget though 😇
They spend more than that – likely a multiple – just in wages.



11/11
@junyongz
Things would make sense if he had added “within 2 years”




 

bnew



1/21
@RnaudBertrand
The denial is frankly unreal. They're still pushing for the chip export controls when they now couldn't have a better illustration of how self-defeating they are.

Again, continued decoupling by building walls and barriers means that it's the U.S. that's becoming a closed system. And in tech a closed system eventually loses momentum while an open one gains it.

The U.S. is very much facing its red/blue pill moment: it can either take the blue pill of comfort - hiding behind walls, bans and comforting anti-China propaganda, all the band-aids that don't address the key issue: the fact that China is increasingly better. Or it can swallow the red pill and try to understand and adapt to the world it now lives in. And just like in The Matrix, the longer it waits, the more shocking the eventual awakening becomes.

[Quoted tweet]
Anthropic CEO Dario Amodei says while DeepSeek may be able to smuggle 50,000 H100s, it would be very difficult to smuggle the hundreds of thousands or millions of chips required to continue to compete with American companies in AI


https://video.twimg.com/ext_tw_video/1883974939470094339/pu/vid/avc1/720x720/6ovb9kwRGqirIQVp.mp4

2/21
@RnaudBertrand
And on top of that he's wrong since Deepseek is using Huawei chips for inference 👇 (the development of those chips by Huawei being another direct effect of the export controls and sanctions)

[Quoted tweet]
I feel this should be a much bigger story: DeepSeek has trained on Nvidia H800 but is running inference on the new home Chinese chips made by Huawei, the 910C.




3/21
@st_aubrun
😮

[Quoted tweet]
If DeepSeek were a US company it would now have a valuation of about 3 trillion


4/21
@RnaudBertrand
Probably correct



5/21
@deed_deeds
Dario is talking like a ning nong



6/21
@RSA_Observer
Probably too late anyway:

"China's new AI chip outperforms NVIDIA's most powerful GPU A team of researchers from Beijing, led by Professors Fang Lu and Dai Qionghai of Tsinghua University, has unveiled the world's first fully optical artificial intelligence (AI) chip. Named Taichi-II, this groundbreaking innovation marks a significant milestone in the field of optical computing. The chip has outperformed NVIDIA's H100 GPU in terms of energy efficiency and performance."



7/21
@MuhumuzaMaurice
Playing Go (after playing Chess) gives you a sense of how two things can be great and yet different in approach and consequence.

One may argue that the Americans are assessing the Chinese Chess Board and marking themselves right. Meanwhile the Chinese continue to extend their understanding of a superior game which their only potential opponent is even refusing to acknowledge is better in outcome prediction. Simply because it appears to have pedestrian rules of engagement.



8/21
@michaeltanyk
Anthropic is on the first row of firing squad. This guy is shaking. Spectrum people lie badly.



9/21
@awaken_tom
Crazy that Anthropic is pushing "AI safety" and a chip blockade of China, while they themselves are conducting "gain of function" research with malicious AIs and teaching their models to lie and cover up "uncomfortable" truths. What could go wrong?





10/21
@chickadeedee3
"Parity" 🤣🤣🤣



11/21
@johann_theron
Pushing realization to the next generation is normal these days, because Americans treat dogs better than children. Note that fewer children correlates with more 🐕.



12/21
@Bob72838565
He is a typical clueless American CEO 🤣🤣🤣



13/21
@arscrypta
Singapore just buys more.



14/21
@carismachet
Narcissism is a hell of a drug



15/21
@DottorPav
🙋‍♂️ from 🇮🇹





16/21
@Steve90315595
Ultimately the unipolar west will isolate themselves from the Multipolar/ BRICS nations completely given the west's 'my way or the highway' stance on global economics.

If the Anglosphere cannot win the game they WILL flip the game board which makes them an existential threat.



17/21
@Aishalifett
The term ‘free market’ is often used by the West as a facade to mask the manipulative practices it employs. In truth, the West’s so-called free market is far from free. #DeepSeekR1



18/21
@amarinica
I think this is normal human behaviour. Difficult to see anyone react in a manner that admits defeat or any personal fault. The play is to get fired and get the comp package, not admit incompetence and resign.



19/21
@hx_dks
He is not even a real scientist or an engineer



20/21
@ethicalzac
Exactly, so the only lesson learned from DeepSeek is to buy more Nvidia chips and blow more hot air into our markets



21/21
@LaniRefiti
What the DeepSeek episode has demonstrated is the old adage that "necessity is the mother of invention."
Denied advanced chips, the DeepSeek team came up with a really innovative and efficient way to train LLMs at a fraction of the cost.

Plus they made the thing open source!

I'm skeptical of the whole 50,000 H100s claim given it's open source. Any lab worth its salt should be able to replicate or disprove what DeepSeek did on general-purpose GPUs. Let's see some actual data.




 

bnew





1/21
@rowancheung
NEWS: DeepSeek just dropped ANOTHER open-source AI model, Janus-Pro-7B.

It's multimodal (can generate images) and beats OpenAI's DALL-E 3 and Stable Diffusion across GenEval and DPG-Bench benchmarks.

This comes on top of all the R1 hype. The 🐋 is cookin'
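
Since the weights are openly hosted, fetching the checkpoint is one call; this is only a download sketch, as actual inference additionally needs DeepSeek's janus package from their GitHub repo.

```python
# Download-only sketch: pull the open Janus-Pro-7B checkpoint from the Hub.
# Running inference requires DeepSeek's `janus` package (see their GitHub).
from huggingface_hub import snapshot_download

local_dir = snapshot_download("deepseek-ai/Janus-Pro-7B")
print("checkpoint downloaded to", local_dir)
```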





2/21
@rowancheung
Link: deepseek-ai/Janus-Pro-7B · Hugging Face



3/21
@rowancheung
For those wondering, here's my quick take on what's happening right now with R1 and Janus:

1. GPU demand will not go down
2. OpenAI is not done for, but Open source and China are showing they're far closer than anticipated
3. There's way too much misinfo being spread by mainstream media right now (almost seems on purpose?)
4. DeepSeek open-sourcing R1 is still a huge gift to developers and overall AI progress

I haven't seen this much confusion and uncertainty on my TL for ages...



4/21
@rowancheung
That said, I'm shocked we haven't heard any response from @nvidia or @OpenAI yet



5/21
@rowancheung
Also a reminder that DeepSeek R1 dropped 6 days ago, but the market only reacted today

Wall Street, along with 99% of the world, still has trouble keeping up on AI

The easiest way to stay ahead in just 5-min per day (and get news like DeepSeek live): The Rundown AI



6/21
@ObiUzi
AAAAA



7/21
@Martinoleary


[Quoted tweet]
Live footage of Sam walking to work today.


https://video.twimg.com/ext_tw_video/1883914184699604992/pu/vid/avc1/320x172/_YAdbhQ9OkcBneBD.mp4

8/21
@mhdfaran
DeepSeek coming in hot with Janus-Pro-7B like, "Beat this, OpenAI!"



9/21
@HighlyRetired
🔥🔥💪💪



10/21
@ObiUzi
I don’t feel good doc 😭





11/21
@BeginnersinAI
This is great for competition. These last two models are going to push the established players to up their game.



12/21
@AIRoboticsInt
Seems like @elonmusk has a point

[Quoted tweet]
Elon Musk on DeepSeek:

He says, DeepSeek “obviously” has ~50,000 Nvidia H100 chips that they can’t talk about due to US export controls.

Interesting.




13/21
@WealthArchives
Deepseek dropped another model



https://video.twimg.com/ext_tw_video/1883920660377808896/pu/vid/avc1/720x1280/-3NsWWOTMa5QyJWC.mp4

14/21
@RealStarTrump
Supposedly, if you ask DeepSeek to identify itself, it calls itself ChatGPT, which would indicate illicit training data.

Something for devout autists to confirm or deny.



15/21
@CastelMaker
OpenAI after investing 500B



16/21
@laplacesdust
Its over



17/21
@dula2006
Wait until you see GROK 3! @grok 💪



18/21
@czverse
Janus-Pro-7B is turning up the heat! Multimodal dominance + open-source = game changer. DeepSeek’s 🐋 isn’t just cookin’, it’s serving a feast



19/21
@space_ace84
Can it generate this image?





20/21
@0xAdin






21/21
@SecretNetwork
For those worried about privacy concerns and data harvesting:

Integrate confidential computing and get the benefits without the concerns.

[Quoted tweet]
x.com/i/article/188246991007…



 