bnew
1/42
@reach_vb
"DeepSeek-R1-Distill-Qwen-1.5B outperforms GPT-4o and Claude-3.5-Sonnet on math benchmarks with 28.9% on AIME and 83.9% on MATH."

1.5B did WHAT?



[image: benchmark table]


2/42
@reach_vb
repo:

deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B · Hugging Face



3/42
@thegenioo
but is it better at coding compared to sonnet new?



4/42
@reach_vb
No, the R1 series models aren’t better than V3 at coding; that’s likely what DeepSeek will work on next

They are still pretty powerful tho



5/42
@iatharvkulkarni
How do you run these models? I see that you post a lot of hugging face models, how do you personally run them? Any tool that would help me get going as quickly as possible locally?



6/42
@reach_vb
llama.cpp or transformers, let me make a quick notebook actually
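
A minimal transformers sketch for that route (model id from the repo in post 2/42; the sampling settings are illustrative, not the official notebook):

```python
# Minimal sketch: load the 1.5B distill with transformers and generate.
# Assumptions: FP16 weights fit your GPU; temperature 0.6 follows DeepSeek's
# suggested sampling range for the R1 distills.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

messages = [{"role": "user", "content": "How many primes are there between 10 and 30?"}]
inputs = tok.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
out = model.generate(inputs, max_new_tokens=512, do_sample=True, temperature=0.6)
print(tok.decode(out[0][inputs.shape[1]:], skip_special_tokens=True))
```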



7/42
@leo_grundstrom
How fast is it?



8/42
@reach_vb
Reasonably fast on an old T4, try it out yourself:

[Quoted tweet]
Try out R1 Distill Qwen 1.5B in a FREE Google Colab! 🔥

The vibes are looking gooood!


https://video.twimg.com/ext_tw_video/1881377422135676928/pu/vid/avc1/1670x1080/wUafm6PII4xXizvR.mp4

9/42
@antonio_spie
Can SOMEONE PLEASE TELL ME HOW MANY GBS TO INSTALL LOCALLY??



10/42
@reach_vb
You can try the 1.5B directly here on a free Google colab

[Quoted tweet]
Try out R1 Distill Qwen 1.5B in a FREE Google Colab! 🔥 (video in post 8/42)
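
For anyone asking about disk space, rough sizing math (not from the thread; back-of-envelope assumptions: download ≈ raw weight size, ~4.5 bits/weight for a Q4_K_M-style GGUF quant):

```python
# Approximate download size of a 1.5B-parameter model at common precisions.
params = 1.5e9
for name, bits_per_weight in [("FP16 safetensors", 16), ("Q8_0 GGUF", 8), ("Q4_K_M GGUF", 4.5)]:
    gb = params * bits_per_weight / 8 / 1e9
    print(f"{name}: ~{gb:.1f} GB")
# FP16 ~3.0 GB, Q8 ~1.5 GB, Q4 ~0.8 GB (plus a little overhead)
```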

11/42
@ArpinGarre66002
Dubious



12/42
@reach_vb
Ha, the vibes are strong, I don’t care much about the benchmark, but for a 1.5B it’s pretty strong, try it out yourself:

[Quoted tweet]
Try out R1 Distill Qwen 1.5B in a FREE Google Colab! 🔥 (video in post 8/42)

13/42
@gordic_aleksa
overfit is the word



14/42
@reach_vb
Vibe checks look pretty good tho, been playing with it on Colab - not sonnet or 4o like but deffo pretty strong for a 1.5B

[Quoted tweet]
Try out R1 Distill Qwen 1.5B in a FREE Google Colab! 🔥 (video in post 8/42)

15/42
@CEOofFuggy
I doubt it's good in general tho, but I'll definitely have to try 14B version etc.



16/42
@reach_vb
think 32B or 70B would be golden



17/42
@snowclipsed
@vikhyatk what if you used this as the text model



18/42
@reach_vb
that would be fire - but I doubt you'd get as much benefit from this, model is a yapper



19/42
@victor_explore


[Quoted tweet]
DeepSeek today 😎


https://video.twimg.com/ext_tw_video/1881340096995364864/pu/vid/avc1/720x720/yN45olzqLvBZt_f_.mp4

20/42
@reach_vb
hahaha, how do you create these, it's amazing!



21/42
@AILeaksAndNews
We are accelerating quickly

3.5 sonnet that can run locally



22/42
@reach_vb
maybe the 32B/70B is at that level, I doubt 1.5B will be at that level haha



23/42
@edwardcfrazer
How fast is it 👀



24/42
@reach_vb
On their API it’s pretty fast! 🏎️



25/42
@anushkmittal
1.5b? more like 1.5 based



26/42
@reach_vb
hahahaha!



27/42
@Yuchenj_UW
Unbelievable.

We will have super smart 1B models in the future, running locally on our phone.



28/42
@reach_vb
I call it Baby AGI 😂

[Quoted tweet]
Try out R1 Distill Qwen 1.5B in a FREE Google Colab! 🔥 (video in post 8/42)

29/42
@dhruv2038
Yeah this is sus.



30/42
@reach_vb
gotta vibe check it ofc



31/42
@ftmoose
lol



[image: chat screenshot]


32/42
@reach_vb
I love the way it thinks 😂



33/42
@_ggLAB
yet cannot answer historical events.



[image: chat screenshot]


34/42
@reach_vb
literally doesn't matter, as long as it works for your own use-cases.



35/42
@AntDX316
👍



36/42
@nooriefyi
this is huge. parameter count is so last year.



37/42
@baileygspell
"few shot prompts degrade peformance" def overfit



38/42
@seo_leaders
Going to have to give that model a run up for sure



39/42
@priontific
I'm so psyched, holy moly. I think we can squeeze out so much with speculative drafting using these tiny models. Results in a few hours once I've finished downloading -- stay tuned!!
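
For context, speculative decoding (what "speculative drafting" refers to) has a small draft model propose several tokens that a larger target model then verifies in a single forward pass. A minimal greedy sketch, assuming the 1.5B distill drafts for its larger 32B sibling (model ids and k are illustrative; production implementations handle sampling and KV caching properly):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

draft_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"   # small, fast drafter
target_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-32B"   # larger verifier (assumed)
tok = AutoTokenizer.from_pretrained(draft_id)
draft = AutoModelForCausalLM.from_pretrained(draft_id, torch_dtype=torch.float16, device_map="auto")
target = AutoModelForCausalLM.from_pretrained(target_id, torch_dtype=torch.float16, device_map="auto")

@torch.no_grad()
def speculative_generate(prompt: str, max_new_tokens: int = 128, k: int = 4) -> str:
    ids = tok(prompt, return_tensors="pt").input_ids.to(draft.device)
    start = ids.shape[1]
    while ids.shape[1] - start < max_new_tokens:
        # 1) Draft model proposes up to k tokens greedily.
        proposal = draft.generate(ids, max_new_tokens=k, do_sample=False)
        drafted = proposal[:, ids.shape[1]:]
        # 2) Target scores prompt + draft in ONE forward pass; the logits at
        #    position i predict token i+1, so these are the target's greedy
        #    choices for each drafted position.
        logits = target(proposal).logits
        preds = logits[:, ids.shape[1] - 1:-1, :].argmax(-1)
        # 3) Accept the longest prefix where the target agrees, then append the
        #    target's own token at the first disagreement (or a bonus token if
        #    the whole draft was accepted).
        n_ok = int((preds == drafted).cumprod(-1).sum())
        bonus = preds[:, n_ok:n_ok + 1] if n_ok < drafted.shape[1] else logits[:, -1:, :].argmax(-1)
        ids = torch.cat([ids, drafted[:, :n_ok], bonus], dim=-1)
    return tok.decode(ids[0, start:], skip_special_tokens=True)
```

The target runs one batched forward pass per k drafted tokens, so you get a speedup whenever the small model's guesses are usually accepted.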



40/42
@ionet
at first it was humans being outperformed like that by AI; now AI is doing the same to other AI 😂😂



41/42
@joysectomy
Western cultures are bad at math; Eastern cultures teach math in a bottom-up way. I wonder how much of a factor that is in the consistent performance gap between these models



42/42
@ChaithanyaK42
This is awesome 👌





bnew



Someone bought the domain ‘OGOpenAI’ and redirected it to a Chinese AI lab


Maxwell Zeff

8:56 PM PST · January 22, 2025



A software engineer has bought the website “OGOpenAI.com” and redirected it to DeepSeek, a Chinese AI lab that’s been making waves in the open source AI world lately.

Software engineer Ananay Arora tells TechCrunch that he bought the domain name for “less than a Chipotle meal,” and that he plans to sell it for more.

The move was an apparent nod to how DeepSeek releases cutting-edge open AI models, just as OpenAI did in its early years. DeepSeek’s models can be used offline and for free by any developer with the necessary hardware, similar to older OpenAI models like Point-E and Jukebox.

DeepSeek caught the attention of AI enthusiasts last week when it released an open version of its DeepSeek-R1 model, which the company claims performs better than OpenAI’s o1 on certain benchmarks. Outside of models such as Whisper, OpenAI rarely releases its flagship AI in an “open” format these days, drawing criticism from some in the AI industry. In fact, OpenAI’s reticence to release its most powerful models is cited in a lawsuit from Elon Musk, who claims that the startup isn’t staying true to its original nonprofit mission.

just found the actual openai https://t.co/wEF0kRNfLA

— Ananay (@ananayarora)
January 22, 2025

Arora says he was inspired by a now-deleted post on X from Perplexity’s CEO, Aravind Srinivas, comparing DeepSeek to OpenAI in its more “open” days. “I thought, hey, it would be cool to have [the] domain go to DeepSeek for fun,” Arora told TechCrunch via DM.

DeepSeek joins Alibaba’s Qwen in the list of Chinese AI labs releasing open alternatives to OpenAI’s models.

The American government has tried to curb China’s AI labs for years with chip export restrictions, but it may need to do more if the latest AI models coming out of the country are any indication.

 

bnew
Hugging Face claims its new AI models are the smallest of their kind


A team at AI dev platform Hugging Face has released what they’re claiming are the smallest AI models that can analyze images, short videos, and text.

The models, SmolVLM-256M and SmolVLM-500M, are designed to work well on “constrained devices” like laptops with less than around 1GB of RAM. The team says that they’re also ideal for developers trying to process large amounts of data very cheaply.

SmolVLM-256M and SmolVLM-500M are just 256 million parameters and 500 million parameters in size, respectively. (Parameters roughly correspond to a model’s problem-solving abilities, such as its performance on math tests.) Both models can perform tasks like describing images or video clips and answering questions about PDFs and the elements within them, including scanned text and charts.

To train SmolVLM-256M and SmolVLM-500M, the Hugging Face team used The Cauldron, a collection of 50 “high-quality” image and text datasets, and Docmatix, a set of file scans paired with detailed captions. Both were created by Hugging Face’s M4 team, which develops multimodal AI technologies.

[Image: benchmarks comparing the new SmolVLM models to other multimodal models. Image Credits: SmolVLM]

The team claims that both SmolVLM-256M and SmolVLM-500M outperform a much larger model, Idefics 80B, on benchmarks including AI2D, which tests the ability of models to analyze grade-school-level science diagrams. SmolVLM-256M and SmolVLM-500M are available on the web as well as for download from Hugging Face under an Apache 2.0 license, meaning they can be used without restrictions.
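
A minimal sketch of running the smaller model with transformers, following the usage pattern on its Hugging Face model card (the repo id HuggingFaceTB/SmolVLM-256M-Instruct is an assumption based on the names above):

```python
# Describe a local image with SmolVLM-256M. Small enough to run on CPU;
# the image path and prompt are placeholders.
from PIL import Image
from transformers import AutoProcessor, AutoModelForVision2Seq

model_id = "HuggingFaceTB/SmolVLM-256M-Instruct"  # assumed repo id
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForVision2Seq.from_pretrained(model_id)

image = Image.open("chart.png")
messages = [{"role": "user", "content": [
    {"type": "image"},
    {"type": "text", "text": "Describe this chart."},
]}]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=prompt, images=[image], return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=128)
print(processor.batch_decode(out, skip_special_tokens=True)[0])
```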

Small models like SmolVLM-256M and SmolVLM-500M may be inexpensive and versatile, but they can also contain flaws that aren’t as pronounced in larger models. A recent study from Google DeepMind, Microsoft Research, and the Mila research institute in Quebec found that many small models perform worse than expected on complex reasoning tasks. The researchers speculated that this could be because smaller models recognize surface-level patterns in data but struggle to apply that knowledge in new contexts.

Kyle Wiggers, Senior Reporter, Enterprise

 

bnew
Perplexity launches Sonar, an API for AI search



Perplexity on Tuesday launched an API service called Sonar, allowing enterprises and developers to build the startup’s generative AI search tools into their own applications.

“While most generative AI features today have answers informed only by training data, this limits their capabilities,” Perplexity wrote in a blog post. “To optimize for factuality and authority, APIs require a real-time connection to the Internet, with answers informed by trusted sources.”

To start, Perplexity is offering two tiers that developers can choose from: a base version that’s cheaper and faster, Sonar, and a pricier version that’s better for tough questions, Sonar Pro. Perplexity says the Sonar API also gives enterprises and developers the ability to customize the sources its AI search engine pulls from.
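
For a sense of what that looks like in practice, a hedged sketch of a Sonar call, assuming the OpenAI-compatible chat-completions convention Perplexity documents (base URL and the model names "sonar" / "sonar-pro" as published at launch):

```python
# Query Perplexity's Sonar API with the standard OpenAI client.
# Assumptions: an OpenAI-compatible endpoint at api.perplexity.ai and a
# Perplexity API key stored in the PPLX_API_KEY environment variable.
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["PPLX_API_KEY"], base_url="https://api.perplexity.ai")
resp = client.chat.completions.create(
    model="sonar",  # or "sonar-pro" for tougher, multi-search questions
    messages=[{"role": "user", "content": "Summarize this week's biggest AI news."}],
)
print(resp.choices[0].message.content)  # answer grounded in live web sources
```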

Introducing Sonar: Perplexity’s API. Sonar is the most affordable search API product on the market. Use it to build generative search, powered by real-time information and citations, into your apps. We’re also offering a Pro version with deeper functionality. pic.twitter.com/CWpVUUKYtW

— Perplexity (@perplexity_ai)
January 21, 2025

With the launch of its API, Perplexity is making its AI search engine available in more places than just its app and website. Perplexity says that Zoom, among other companies, is already using Sonar to power an AI assistant for its video conferencing platform. Sonar is allowing Zoom’s AI chatbot to give real-time answers, informed by web searches with citations, without requiring users to leave the video chat window.

Sonar could also give Perplexity another source of revenue, which could be particularly important to the startup’s investors. Perplexity only offers a subscription service for unlimited access to its AI search engine and some additional features. However, the tech industry has slashed prices to access AI tools via APIs in the last year, and Perplexity claims to be offering the cheapest AI search API on the market via Sonar.

The base version of Sonar offers a cheaper and quicker version of the company’s AI search tools. Sonar’s base version has flat pricing and uses a lightweight model. It costs $5 for every 1,000 searches, plus $1 for every 750,000 words you type into the AI model (roughly 1 million input tokens), and another $1 for every 750,000 words the model spits out (roughly 1 million output tokens).

The pricier Sonar Pro gives more-detailed answers and is capable of handling more-complex questions. This version will run multiple searches on top of a user prompt, meaning the pricing could be more unpredictable. Perplexity also says this version offers twice as many citations as the base version of Sonar. Sonar Pro costs $5 for every 1,000 searches, plus $3 for every 750,000 words you type into the AI model (roughly 1 million input tokens), and $15 for every 750,000 words the model spits out (roughly 1 million output tokens).
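
Putting the published prices into a small cost helper makes the tiers easy to compare (assumption: costs scale linearly with usage; Sonar Pro's multiple searches per prompt would show up as extra search fees on top of this):

```python
# Estimate Sonar API cost from the prices quoted above.
def sonar_cost(searches: int, input_tokens: int, output_tokens: int, pro: bool = False) -> float:
    per_1k_searches = 5.00                                      # $5 per 1,000 searches (both tiers)
    in_rate, out_rate = (3.00, 15.00) if pro else (1.00, 1.00)  # $ per ~1M tokens
    return (searches / 1_000 * per_1k_searches
            + input_tokens / 1_000_000 * in_rate
            + output_tokens / 1_000_000 * out_rate)

# Example: 10,000 searches, 5M input tokens, 2M output tokens
print(sonar_cost(10_000, 5_000_000, 2_000_000))             # base Sonar: $57.00
print(sonar_cost(10_000, 5_000_000, 2_000_000, pro=True))   # Sonar Pro: $95.00
```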

Perplexity claims Sonar Pro outperformed leading models from Google, OpenAI, and Anthropic on a benchmark that measures factual correctness in AI chatbot answers, SimpleQA.

In December, Perplexity raised a $500 million funding round led by Institutional Venture Partners, valuing the company at $9 billion.

Correction: A previous version of this story included outdated figures on Perplexity’s recent funding and annual recurring revenue.

Maxwell Zeff, Senior Reporter, Consumer
 

bnew
Meta’s Yann LeCun predicts ‘new paradigm of AI architectures’ within 5 years and ‘decade of robotics’

Meta’s chief AI scientist Yann LeCun delivers a speech at the World Economic Forum (WEF) annual meeting in Davos on January 23, 2025.
Image Credits:FABRICE COFFRINI / AFP / Getty Images



Paul Sawers

7:28 AM PST · January 23, 2025



Meta’s chief AI scientist, Yann LeCun, says that a “new paradigm of AI architectures” will emerge in the next three to five years, going far beyond the capabilities of existing AI systems.

LeCun also predicted that the coming years could be the “decade of robotics,” where advances in AI and robotics combine to unlock a new class of intelligent applications.

Speaking in a session dubbed “Debating Technology” at Davos on Thursday, LeCun said that the “flavor of AI” that we have at the moment — that is, generative AI and large language models (LLMs) — isn’t really up to all that much. It’s useful, sure, but falls short on many fronts.

“I think the shelf life of the current [LLM] paradigm is fairly short, probably three to five years,” LeCun said. “I think within five years, nobody in their right mind would use them anymore, at least not as the central component of an AI system. I think […] we’re going to see the emergence of a new paradigm for AI architectures, which may not have the limitations of current AI systems.”

These “limitations” inhibit truly intelligent behavior in machines, LeCun says. This is down to four key reasons: a lack of understanding of the physical world; a lack of persistent memory; a lack of reasoning; and a lack of complex planning capabilities.

“LLMs really are not capable of any of this,” LeCun said. “So there’s going to be another revolution of AI over the next few years. We may have to change the name of it, because it’s probably not going to be generative in the sense that we understand it today.”



“World models”​


This echoes sentiments that LeCun has espoused in the past. At the heart of this is what are coming to be known as “world models” that promise to help machines understand the dynamics of the real world. This includes having a memory, common sense, intuition, reasoning capabilities — traits far beyond those of current systems, which are mostly about pattern recognition.

Previously, LeCun has said this could still be some 10 years away, but today’s estimate brings things closer on the horizon, though how far it will get in that time frame isn’t exactly clear.

“LLMs are good at manipulating language, but not at thinking,” LeCun said. “So that’s what we’re working on — having systems build mental models of the world. If the plan that we’re working on succeeds, with the timetable that we hope, within three to five years we’ll have systems that are a completely different paradigm. They may have some level of common sense. They may be able to learn how the world works from observing the world and maybe interacting with it.”



“The decade of robotics”​


As impressive as generative AI is, capable of passing the bar exam or unearthing new drugs, LeCun reckons that robotics could be a central component of the next wave of AI applications in such real-world scenarios.

Meta itself is doing some research work in the robotics realm, but so is the AI darling of the moment, ChatGPT-creator OpenAI. Earlier this month, new job listings emerged detailing a new OpenAI robotics team focused on “general-purpose,” “adaptive,” and “versatile” robots capable of human-like intelligence in real-world settings.

“We don’t have robots that can do what a cat can do — understanding the physical world of a cat is way superior to everything we can do with AI,” he said. “Maybe the coming decade will be the decade of robotics, maybe we’ll have AI systems that are sufficiently smart to understand how the real world works.”





 