bnew



1/3
@HaHoang411
🚀 Nexa AI team created OmniVision-968M, the world's smallest vision-language model that packs quite a punch for its size.

Key highlights:
- Built on solid foundations: Qwen2.5-0.5B + SigLIP-400M
- Practical edge deployment: Runs on ~1GB RAM

#SmolLM #Multimodal



https://video.twimg.com/ext_tw_video/1857355906670800896/pu/vid/avc1/1280x720/1ZEF3mCgMRo6v8BU.mp4
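For readers wondering what "built on Qwen2.5-0.5B + SigLIP-400M" means in practice, here is a minimal sketch of how such a vision-language model is typically composed: a SigLIP vision encoder embeds the image, a small projection maps those embeddings into the language model's space, and the Qwen decoder generates text conditioned on them. This is a generic illustration of the pattern only; the checkpoint names are the public SigLIP/Qwen ones and the projector is a hypothetical untrained layer, not Nexa's actual OmniVision code.

```python
import torch
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          SiglipImageProcessor, SiglipVisionModel)

# Generic SigLIP-encoder + Qwen-decoder composition (illustrative sketch only).
vision = SiglipVisionModel.from_pretrained("google/siglip-so400m-patch14-384")
processor = SiglipImageProcessor.from_pretrained("google/siglip-so400m-patch14-384")
lm = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct")
tok = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct")

# Hypothetical, untrained projector from the vision width to the LM hidden size.
projector = torch.nn.Linear(vision.config.hidden_size, lm.config.hidden_size)

def image_text_embeddings(pil_image, prompt):
    pixels = processor(images=pil_image, return_tensors="pt").pixel_values
    patches = vision(pixel_values=pixels).last_hidden_state   # (1, P, vis_dim)
    img_embeds = projector(patches)                           # (1, P, lm_dim)
    txt_ids = tok(prompt, return_tensors="pt").input_ids
    txt_embeds = lm.get_input_embeddings()(txt_ids)           # (1, T, lm_dim)
    # The concatenated sequence would be fed to the LM via inputs_embeds.
    return torch.cat([img_embeds, txt_embeds], dim=1)
```

The RAM figure above is roughly consistent with the parameter count: ~968M parameters at 4-bit quantization is on the order of 0.5GB of weights, leaving headroom for activations within ~1GB.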

2/3
@HaHoang411
Official announcement: OmniVision-968M: World's Smallest Vision Language Model



3/3
@JulienBlanchon
Pretty interesting




To post tweets in this format, more info here: https://www.thecoli.com/threads/tips-and-tricks-for-posting-the-coli-megathread.984734/post-52211196
 

bnew




1/8
@rohanpaul_ai
Open-source alternative to OpenAI's voice.

Extends text-based LLM to have native "listening" ability. Think of it as an open data, open weight, on device Siri.

What it offers:

📌 Implements native "listening" abilities in LLMs, similar to Siri but runs locally on-device

📌 Latest v0.4 achieves 64.63 MMLU score with enhanced context handling and noise management

📌 Provides complete training pipeline combining speech noise and multi-turn conversation data

📌 Includes ready-to-use demos via WebUI, Gradio interface, and Colab notebooks



GcMaSJnbEAAeeAr.png


2/8
@rohanpaul_ai
📚 GitHub - homebrewltd/ichigo: Local realtime voice AI



3/8
@antona23
Thanks for sharing 👌



4/8
@rohanpaul_ai
👍👊



5/8
@antonpictures
Does it run on an M1 mac with 8gb ? lol I need better hardware. This mixed with llama 11b vision and some RAG, that's a robot



6/8
@gpt_biz
This sounds fantastic! It's great to see an open-source voice assistant that can run locally—definitely worth a try for those who value privacy and customization.



7/8
@techfusionjb
Native listening ability on-device? That's huge!



8/8
@sky20086
save thread




To post tweets in this format, more info here: https://www.thecoli.com/threads/tips-and-tricks-for-posting-the-coli-megathread.984734/post-52211196
 

bnew





1/4
@HaHoang411
Awesome work by the @deepseek_ai on JanusFlow. They've managed to elegantly combine image understanding and generation in a single model using rectified flow with autoregressive language models.
#AI #OpenSource #Any2Any



https://video.twimg.com/ext_tw_video/1856952968982974464/pu/vid/avc1/1082x720/dRnQFduGCqCMmQdS.mp4

2/4
@HaHoang411
For OCR tasks it's pretty good but still has problems.



https://video.twimg.com/ext_tw_video/1856961861570199552/pu/vid/avc1/1082x720/QSkGSO5eJt-aZzyO.mp4

3/4
@HaHoang411
They also have some good examples of generating images from text. But to be honest, the image gen needs a lot of improvements.



https://video.twimg.com/ext_tw_video/1856962863597813762/pu/vid/avc1/1082x720/_cTXoWRetE0IbV71.mp4

4/4
@HaHoang411
The model: deepseek-ai/JanusFlow-1.3B · Hugging Face
Try here: JanusFlow 1.3B - a Hugging Face Space by deepseek-ai




To post tweets in this format, more info here: https://www.thecoli.com/threads/tips-and-tricks-for-posting-the-coli-megathread.984734/post-52211196
 

bnew







1/11
@sunomusic
v4 is coming soon 🎧



https://video.twimg.com/ext_tw_video/1854955202001907712/pu/vid/avc1/720x1280/JFFDhLm1PGV4YWjU.mp4

2/11
@ThenAMug
I hate how corny most of the Hip-Hop/Rap generations are. Hopefully, this update brings some justice to the genre. @sunomusic



3/11
@sunomusic
we posted a hip hop song made with v4 this morning!



4/11
@Jesseclinton_
People who don’t understand creativity will knock this. True creators don’t just use a prompt; they see AI/Suno as part of a process. Bach with a strings VST and a chord generator—he’d innovate, not dismiss. Creativity evolves with tools, not despite them.



5/11
@sunomusic
said beautifully. thanks!



6/11
@StevenDarlow
@nickfloats wasn’t lying!

Wow, that is a game changer!! (again)



7/11
@sunomusic
thanks steve! excited for you to try it



8/11
@NahPlaya1212
Take everything I own



9/11
@sunomusic
😂❤️



10/11
@cooljellyx
Add in painting, and the game is f*cking over. ❤️‍🔥



11/11
@sunomusic
we have inpainting!




To post tweets in this format, more info here: https://www.thecoli.com/threads/tips-and-tricks-for-posting-the-coli-megathread.984734/post-52211196







1/11
@sunomusic
Burkinabe funk, but make it v4 🎧
Thank you to all our alpha testers for your feedback on v4 this past week. With your help, we’re now working hard at adding some finishing touches. Thank you everyone for your excitement and patience as we 🧑‍🍳



https://video.twimg.com/ext_tw_video/1857478323317706752/pu/vid/avc1/720x1280/mOnbevzYhNIAIjDQ.mp4

2/11
@AIandDesign
It's amazing!



3/11
@sunomusic
thanks Marco! really glad to hear that. and appreciate your feedback the last few days!



4/11
@JoeProAI
I did a song in anticipation but felt cliche to post. I do have to say whomever is running this update coming soon a campaign is doing a great job. I'm super stoked for this.



5/11
@sunomusic
thank you 🙏 ❤️



6/11
@SwisherYard
Will it have the option to enhance existing songs?



7/11
@sunomusic
Yes - we will have remastering as an option on your existing songs! 🔥



8/11
@lxe
This is absolutely bonkers. I've been addicted to suno.



9/11
@sunomusic
appreciate your support and glad you’ve been able to make your own music with suno!



10/11
@WorldEverett
Can't wait, it sounds great🔥



11/11
@RichSilverX
I want it so bad!!




To post tweets in this format, more info here: https://www.thecoli.com/threads/tips-and-tricks-for-posting-the-coli-megathread.984734/post-52211196



1/11
@sunomusic
v4 is coming soon 🎧



https://video.twimg.com/ext_tw_video/1856106054909632512/pu/vid/avc1/720x1280/tEGw361SAgwBxdzT.mp4

2/11
@Mitchnavarro7
I'm so pumped for this!



3/11
@gopro_audio
I got 500 songs to re do

lets go



4/11
@jasonjdxb
Can’t wait for SUNO V4! Can we get an upgrade where we can upload our original songs and SUNO AI creates vocals on the entire song using time stamps/prompts for guidance?



5/11
@Emily_Escapor
I hope soon means next week, and I hope you guys start the initial preparation for V5 and start training before the end of the year on the B200 cluster; we need everyone to make music.
I hope we also significantly improve how to control the instruments and advanced control over vocals.



6/11
@mckaywrigley
Can’t come soon enough



7/11
@blizaine
Wow. Tomorrow then? 🤞😬



8/11
@ibrvndy
How soon we talkin? 🤔 🙌



9/11
@Faedriel
hypeee



10/11
@david_vipernz
It's already best value for anything ever. And it keeps getting better!



11/11
@Liinad_De_Varge
Already forgot you existed 😅
But going to try out the new version for sure!




To post tweets in this format, more info here: https://www.thecoli.com/threads/tips-and-tricks-for-posting-the-coli-megathread.984734/post-52211196



1/11
@nickfloats
something comin' @sunomusic



https://video.twimg.com/ext_tw_video/1854964192417038336/pu/vid/avc1/720x1278/9RQop-MC3yfc638K.mp4

2/11
@dustinhollywood
🔥



3/11
@DMctendo
Put it on hi fi speakers in a club and see why it’s not gonna be doing much of anything… yet



4/11
@dkardonsky_
bullish



5/11
@jaredeasley
Fantastic



6/11
@gpt_biz
Excited to see what’s coming next from @sunomusic looking forward to it



7/11
@edh_wow
Honestly, I love Suno. Has the rapping gotten better?



8/11
@ikamanu
Love it. Can you share the prompt for this one?



9/11
@NoodleNakamoto
I'm NGL, this is an absolute banger. Please @nickfloats - if you can extend this, I'd love to hear more.



10/11
@opensaysmani
V4 on its way - can’t wait



11/11
@be_high
API? Word level timestamps ?




To post tweets in this format, more info here: https://www.thecoli.com/threads/tips-and-tricks-for-posting-the-coli-megathread.984734/post-52211196







1/11
@sunomusic
Make a song from any image or video with Suno Scenes. 🎥 : Disco Sculptures by ClaraCo



https://video.twimg.com/ext_tw_video/1852821409162207232/pu/vid/avc1/720x1402/WY2XWQmD_XMSALDE.mp4

2/11
@frownsOfficial




https://video.twimg.com/ext_tw_video/1852822140372062210/pu/vid/avc1/720x720/2ysRvDKZDqQ1MC4A.mp4

3/11
@sunomusic
🤔



4/11
@UriGil3
Youre too focused on mainstream music. Wish there was more creativity



5/11
@sunomusic
happy to take style suggestions 🙏



6/11
@luckycreative_o
And for Android users?



7/11
@sunomusic
Good news - Suno on Android is now open for pre-registration at Suno - AI Music - Apps on Google Play! Have you signed up?



8/11
@BromfieldDuane
when is this coming for android?



9/11
@sunomusic
Good news - Suno on Android is now open for pre-registration at Suno - AI Music - Apps on Google Play! Have you signed up?



10/11
@Mitchnavarro7
I have an android 🥲



11/11
@jessyseonoob




https://video.twimg.com/ext_tw_video/1852943354407100417/pu/vid/avc1/720x948/0kx3YGDa890zieDc.mp4


To post tweets in this format, more info here: https://www.thecoli.com/threads/tips-and-tricks-for-posting-the-coli-megathread.984734/post-52211196
 

bnew


1/1
@HaHoang411
🌟 Mind-blowing work by the team at @FLAIR_Ox! They've created Kinetix, a framework for training general-purpose RL agents that can tackle physics-based challenges.
The coolest part? Their agents can solve complex physical reasoning tasks zero-shot!
🥳Congrats @mitrma and team.

[Quoted tweet]
We are very excited to announce Kinetix: an open-ended universe of physics-based tasks for RL!

We use Kinetix to train a general agent on millions of randomly generated physics problems and show that this agent generalises to unseen handmade environments.
1/🧵


https://video.twimg.com/ext_tw_video/1856003600159256576/pu/vid/avc1/1280x720/zJNdBD1Yq0uFl9Nf.mp4


To post tweets in this format, more info here: https://www.thecoli.com/threads/tips-and-tricks-for-posting-the-coli-megathread.984734/post-52211196














1/12
@mitrma
We are very excited to announce Kinetix: an open-ended universe of physics-based tasks for RL!

We use Kinetix to train a general agent on millions of randomly generated physics problems and show that this agent generalises to unseen handmade environments.
1/🧵



https://video.twimg.com/ext_tw_video/1856003600159256576/pu/vid/avc1/1280x720/zJNdBD1Yq0uFl9Nf.mp4

2/12
@mitrma
👾 Kinetix can represent problems ranging from robotic locomotion and grasping, to classic RL environments and video games, all within a unified framework. This opens the door to training a single generalist agent for all these tasks!
2/



https://video.twimg.com/ext_tw_video/1856003839851220992/pu/vid/avc1/640x640/J_w1M8wm8ibiGCAn.mp4

3/12
@mitrma
🎲 By procedurally generating random environments, we train an RL agent that can zero-shot solve unseen handmade problems. This includes some where RL from scratch fails!
3/



https://video.twimg.com/ext_tw_video/1856003979878051840/pu/vid/avc1/720x720/JAcE26Hprn1NXPvU.mp4

4/12
@mitrma
🟩 🟦 🟥 Each environment has the same goal: make 🟩 touch 🟦 while preventing 🟩 touching 🟥. The agent controls all motors and thrusters.

In this task the car has to first be flipped with thrusters. The general agent solves it zero-shot, having never seen it before.
4/



https://video.twimg.com/ext_tw_video/1856004286943002624/pu/vid/avc1/720x720/hjhITONkJiDY9tD2.mp4

5/12
@mitrma
🚗 Our general agent shows emergent physical reasoning capabilities, for instance being able to zero-shot control unseen morphologies by moving them underneath a goal (🔵).
5/



https://video.twimg.com/ext_tw_video/1856004409559306241/pu/vid/avc1/994x540/AA6c6MHpWRkFt3OJ.mp4

6/12
@mitrma
🚀 We also show that finetuning this general model on target tasks is more sample efficient than training from scratch, providing a step towards a foundation model for RL.

In some cases, training from scratch completely fails, while our finetuned general model succeeds 👇
6/



https://video.twimg.com/ext_tw_video/1856004545525972993/pu/vid/avc1/1280x720/jMqgYcCwx-q4tSpm.mp4

7/12
@mitrma
📈 One big takeaway from this work is the importance of autocurricula. In particular, we found significantly improved results by dynamically prioritising levels with high 'learnability'.
7/



GcHacg4WUAAmHTp.jpg
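For context, "learnability" in the autocurriculum literature is commonly approximated as p(1 - p), where p is the agent's current solve rate on a level: levels that are always or never solved carry no learning signal, while levels solved about half the time carry the most. The sketch below samples levels by that score; it is an illustration of the general idea under that assumption, not the actual Kinetix training code.

```python
import numpy as np

def sample_levels_by_learnability(success_rates, num_to_sample, rng=np.random.default_rng()):
    """Prioritise levels whose solve rate is near 0.5, where learning signal is highest.

    success_rates: recent solve rate p for each candidate level (assumed tracked elsewhere).
    """
    p = np.asarray(success_rates, dtype=np.float64)
    learnability = p * (1.0 - p)               # 0 for always-solved or never-solved levels
    if learnability.sum() == 0:
        probs = np.full_like(p, 1.0 / len(p))  # fall back to uniform sampling
    else:
        probs = learnability / learnability.sum()
    return rng.choice(len(p), size=num_to_sample, replace=False, p=probs)

# Example: levels solved 0%, 50%, 90%, and 100% of the time.
print(sample_levels_by_learnability([0.0, 0.5, 0.9, 1.0], num_to_sample=2))
```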


8/12
@mitrma
🍎 The core of Kinetix is our new 2D rigid body physics engine: Jax2D. This is a minimal rewrite of the classic Box2D engine made by @erin_catto. Jax2D allows us to run thousands of heterogeneous parallel environments on a single GPU (yes, you can vmap over different tasks!)
8/
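A rough sketch of what "vmap over different tasks" buys you: if every level is padded to a fixed maximum number of entities, each environment state is just a stack of same-shaped arrays, and one vmapped step function advances thousands of different levels in parallel on a single GPU. This is an illustrative toy, not the Jax2D API.

```python
import jax
import jax.numpy as jnp

def step(state, action):
    # Toy stand-in for a physics step: state is a dict of fixed-shape arrays
    # (padded to a max entity count), so heterogeneous levels share one shape.
    pos = state["pos"] + state["vel"] * 0.01
    vel = state["vel"] + action * 0.01
    return {"pos": pos, "vel": vel}

# One jitted, vmapped step advances a whole batch of different levels at once.
batched_step = jax.jit(jax.vmap(step))

batch = {"pos": jnp.zeros((4096, 16, 2)), "vel": jnp.ones((4096, 16, 2))}
actions = jnp.zeros((4096, 16, 2))
next_batch = batched_step(batch, actions)
print(next_batch["pos"].shape)  # (4096, 16, 2)
```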



9/12
@mitrma
🔧 Don't take our word for it, try it out for yourself!
Create your own levels in your browser with Kinetix.js and see how different pretrained agents perform: Redirecting...
9/



https://video.twimg.com/ext_tw_video/1856004915501350912/pu/vid/avc1/1422x720/7wj1y_BcHHUnNtwx.mp4

10/12
@mitrma
This work was co-led with @mcbeukman and done at @FLAIR_Ox with @_chris_lu_ and @j_foerst.
Blog: https://kinetix-env.github.io/
GitHub: GitHub - FLAIROx/Kinetix: Reinforcement learning on general 2D physics environments in JAX
arXiv: [2410.23208] Kinetix: Investigating the Training of General Agents through Open-Ended Physics-Based Control Tasks
end/



11/12
@_k_sridhar
Very cool paper! FYI, we recently pretrained a generalist agent that can generalize to unseen atari/metaworld/mujoco/procgen environments simply via retrieval-augmentation and in-context learning. Our work uses an imitation learning approach. REGENT: A Retrieval-Augmented Generalist Agent That Can Act In-Context In New Environments.



12/12
@mitrma
This is really cool! Let's meet up and chat at ICLR if we both end up going?




To post tweets in this format, more info here: https://www.thecoli.com/threads/tips-and-tricks-for-posting-the-coli-megathread.984734/post-52211196
 

bnew



Cerebras Now The Fastest LLM Inference Processor; It's Not Even Close.

Karl Freund

Contributor

Founder and Principal Analyst, Cambrian-AI Research LLC

Nov 18, 2024, 02:00pm EST

The company tackled inferencing the Llama-3.1 405B foundation model and just crushed it. And for the crowds at SC24 this week in Atlanta, the company also announced it is 700 times faster than Frontier, the world's fastest supercomputer, on a molecular dynamics simulation.

The Cerebras Wafer Scale Engine, effectively the world's largest and fastest chip.



Cerebras Systems, Inc.

“There is no supercomputer on earth, regardless of size, that can achieve this performance,” said Andrew Feldman, Co-Founder and CEO of the AI startup. As a result, scientists can now accomplish in a single day what previously took two years of GPU-based supercomputer simulations to achieve.


Cerebras Inferencing Llama-3.1 405B​


When Cerebras announced its record-breaking performance on the 70 billion parameter Llama 3.1, it was quite a surprise; Cerebras had previously focused on using its Wafer Scale Engine (WSE) for the more difficult training part of the AI workflow. The memory on a CS3 is fast on-chip SRAM instead of the larger (and 10x slower) High Bandwidth Memory used in data center GPUs. Consequently, the Cerebras CS3 provides 7,000x more memory bandwidth than the Nvidia H100, addressing Generative AI's fundamental technical challenge: memory bandwidth.
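To see why memory bandwidth is the binding constraint, here is a rough back-of-envelope sketch (my own illustration, not from the article): at batch size 1, generating each token requires streaming roughly the full set of model weights from memory, so tokens per second is bounded by bandwidth divided by model bytes. The H100 bandwidth figure below is an assumed approximate spec.

```python
# Back-of-envelope: batch-1 decoding streams ~all weights per token, so
# single-device throughput is roughly memory bandwidth / weight bytes.
model_params = 405e9            # Llama-3.1 405B
bytes_per_param = 2             # assuming FP16/BF16 weights
weight_bytes = model_params * bytes_per_param

hbm_bandwidth = 3.35e12         # ~3.35 TB/s, approximate H100 SXM HBM3 figure
print(hbm_bandwidth / weight_bytes)   # ~4 tokens/s upper bound for one GPU at batch 1
```

Real GPU services shard the model across many devices and batch requests, which is how they reach tens of tokens per second rather than single digits; the point is simply that the ceiling scales with memory bandwidth.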

The leap in performance achieved by Cerebras is dramatic.



Cerebras Systems

And the latest result is just stupendous. Look at the chart above for performance over time, and below to compare the competitive landscape for Llama 3.1-405B. The entire industry occupies the upper left quadrant of the chart, showing output speeds below the 100 tokens-per-second range for the Meta Llama 3.1-405B model. Cerebras produced some 970 tokens per second, at roughly the same price as GPU and custom ASIC services such as SambaNova: $6 per million input tokens and $12 per million output tokens.
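For a sense of scale, a quick arithmetic sketch using only the figures quoted above (970 tokens per second, $12 per million output tokens, and the ~100 tokens-per-second GPU ceiling); my own illustration, nothing here beyond those numbers:

```python
# Arithmetic from the figures quoted above (illustrative only).
output_tokens = 1_000_000
cerebras_tok_per_sec = 970
gpu_tok_per_sec = 100           # the ~100 tok/s ceiling cited for GPU services
price_per_m_output = 12.0       # USD per million output tokens

print(output_tokens / cerebras_tok_per_sec / 60)   # ~17 minutes on Cerebras
print(output_tokens / gpu_tok_per_sec / 60)        # ~167 minutes at 100 tok/s
print(price_per_m_output)                          # ~$12 for the output either way
```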




Cerebras massively outperformed all other systems for Llama-3.1 405B.



Cerebras


Compared to the competition, using 1,000 input tokens, Cerebras embarrassed GPUs, which all produced less than 100 tokens per second. Only SambaNova even came “close,” at 164 tokens per second. Now, as you know, there is no free lunch; a single CS3 is estimated to cost between $2M and $3M, though the exact price is not publicly disclosed by the company. But the performance, latency, and throughput amortize that cost over a massive number of users.


Cerebras throughput results are 12 times faster than the fastest GPU, and 6 times faster than competitor SambaNova.



Cerebras

To put it into perspective, Cerebras ran the 405B model nearly twice as fast as the fastest GPU cloud ran the 1B model. Twice the speed on a model that is two orders of magnitude more complex.

Ok, this is just insanely fast.



Cerebras

As one should expect, the Cerebras CS3 also delivered excellent latency (time to first token), at barely over half the time of the Google Vertex service, and one-sixth the time required by competitors SambaNova and Amazon.

Cerebras also wins in response time, as measured in seconds to first token.



Cerebras

Cerebras is quick to note this is just the first step. They have increased the performance of Llama 70B from 400 tokens per second to 2,200 t/s in just a little over three months. And while Blackwell will increase inference performance fourfold over Hopper, it will not come close to the performance of Cerebras.

And Cerebras is just getting started on models like Llama 405B



Cerebras


But who needs this level of performance?​


OK, so nobody can read anywhere close to 1,000 tokens per second, which translates into about 500 words. But computers certainly can and do. And inference is undergoing a transformation, from serving simple queries to becoming a component in agentic AI and multi-query AI pipelines that provide better results.

“By running the largest models at instant speed, Cerebras enables real-time responses from the world’s leading open frontier model,” noted Mr. Feldman. “This opens up powerful new use cases, including reasoning and multi-agent collaboration, across the AI landscape.” OpenAI's o1 may demand as much as 10 times the compute of GPT-4o, and agentic AI coupled with chain of thought requires over 100 times the performance available on today’s fastest GPUs.

Chain of thought, as seen with OpenAI's o1 service, and agentic AI are two of the examples requiring ...

Cerebras


Cerebras and Molecular Dynamics Simulation​


Since this week is SuperComputing ‘24, Cerebras also announced an amazing scientific accomplishment. The CS3 was able to deliver 1.2 million simulation steps per second, a new world record. That's 700 times faster than Frontier, the world's fastest supercomputer. This means that scientists can now perform two years' worth of GPU-based simulations in a single day on a single Cerebras system. And this benchmark is based on the older CS-2 WSE!
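The two claims in that paragraph are consistent with each other, as a quick check shows (my arithmetic, not from Cerebras):

```python
# A 700x speedup compresses ~700 days of baseline simulation into one day.
speedup = 700
print(speedup / 365)   # ~1.9 years of GPU-based simulation per Cerebras day
```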

Cerebras CS-2 blows the Frontier SuperComputer out of the water for molecular dynamics simulation



Cerebras


Conclusions​


Instead of scaling AI training to produce more accurate answers, chain-of-thought reasoning explores different avenues and provides better answers. This "think before answering" approach provides dramatically better performance on demanding tasks like math, science, and code generation, fundamentally boosting the intelligence of AI models without requiring additional training. By running over 70x faster than other solutions, Cerebras Inference allows AI models to "think" far longer and return more accurate results. As agentic AI becomes available and eventually widespread, the demands on inference hardware will increase by another 10-fold.

Nothing even comes close to Cerebras in these emerging advancements in AI.
 

bnew




November 12, 2024

Building a Large Geospatial Model to Achieve Spatial Intelligence
Eric Brachmann and Victor Adrian Prisacariu



At Niantic, we are pioneering the concept of a Large Geospatial Model that will use large-scale machine learning to understand a scene and connect it to millions of other scenes globally.

When you look at a familiar type of structure – whether it’s a church, a statue, or a town square – it’s fairly easy to imagine what it might look like from other angles, even if you haven’t seen it from all sides. As humans, we have “spatial understanding” that means we can fill in these details based on countless similar scenes we’ve encountered before. But for machines, this task is extraordinarily difficult. Even the most advanced AI models today struggle to visualize and infer missing parts of a scene, or to imagine a place from a new angle. This is about to change: Spatial intelligence is the next frontier of AI models.

As part of Niantic’s Visual Positioning System (VPS), we have trained more than 50 million neural networks, with more than 150 trillion parameters, enabling operation in over a million locations. In our vision for a Large Geospatial Model (LGM), each of these local networks would contribute to a global large model, implementing a shared understanding of geographic locations, and comprehending places yet to be fully scanned.

The LGM will enable computers not only to perceive and understand physical spaces, but also to interact with them in new ways, forming a critical component of AR glasses and fields beyond, including robotics, content creation and autonomous systems. As we move from phones to wearable technology linked to the real world, spatial intelligence will become the world’s future operating system.

What is a Large Geospatial Model?

Large Language Models (LLMs) are having an undeniable impact on our everyday lives and across multiple industries. Trained on internet-scale collections of text, LLMs can understand and generate written language in a way that challenges our understanding of “intelligence”.

Large Geospatial Models will help computers perceive, comprehend, and navigate the physical world in a way that will seem equally advanced. Analogous to LLMs, geospatial models are built using vast amounts of raw data: billions of images of the world, all anchored to precise locations on the globe, are distilled into a large model that enables a location-based understanding of space, structures, and physical interactions.

The shift from text-based models to those based on 3D data mirrors the broader trajectory of AI’s growth in recent years: from understanding and generating language, to interpreting and creating static and moving images (2D vision models), and, with current research efforts increasing, towards modeling the 3D appearance of objects (3D vision models).



Geospatial models are a step beyond even 3D vision models in that they capture 3D entities that are rooted in specific geographic locations and have a metric quality to them. Unlike typical 3D generative models, which produce unscaled assets, a Large Geospatial Model is bound to metric space, ensuring precise estimates in scale-metric units. These entities therefore represent next-generation maps, rather than arbitrary 3D assets. While a 3D vision model may be able to create and understand a 3D scene, a geospatial model understands how that scene relates to millions of other scenes, geographically, around the world. A geospatial model implements a form of geospatial intelligence, where the model learns from its previous observations and is able to transfer knowledge to new locations, even if those are observed only partially.

While AR glasses with 3D graphics are still several years away from the mass market, there are opportunities for geospatial models to be integrated with audio-only or 2D display glasses. These models could guide users through the world, answer questions, provide personalized recommendations, help with navigation, and enhance real-world interactions. Large language models could be integrated so understanding and space come together, giving people the opportunity to be more informed and engaged with their surroundings and neighborhoods. Geospatial intelligence, as emerging from a large geospatial model, could also enable generation, completion or manipulation of 3D representations of the world to help build the next generation of AR experiences. Beyond gaming, Large Geospatial Models will have widespread applications, ranging from spatial planning and design, logistics, audience engagement, and remote collaboration.

Our work so far

Over the past five years, Niantic has focused on building our Visual Positioning System (VPS), which uses a single image from a phone to determine its position and orientation using a 3D map built from people scanning interesting locations in our games and Scaniverse.

With VPS, users can position themselves in the world with centimeter-level accuracy. That means they can see digital content placed against the physical environment precisely and realistically. This content is persistent in that it stays in a location after you’ve left, and it’s then shareable with others. For example, we recently started rolling out an experimental feature in Pokémon GO, called Pokémon Playgrounds, where the user can place Pokémon at a specific location, and they will remain there for others to see and interact with.

Niantic’s VPS is built from user scans, taken from different perspectives, at various times of day, and many times over the years, with positioning information attached, creating a highly detailed understanding of the world. This data is unique because it is taken from a pedestrian perspective and includes places inaccessible to cars.



Today we have 10 million scanned locations around the world, and over 1 million of those are activated and available for use with our VPS service. We receive about 1 million fresh scans each week, each containing hundreds of discrete images.

As part of the VPS, we build classical 3D vision maps using structure-from-motion techniques, but also a new type of neural map for each place. These neural models, based on our research papers ACE (2023) and ACE Zero (2024), no longer represent locations using classical 3D data structures, but encode them implicitly in the learnable parameters of a neural network. These networks can swiftly compress thousands of mapping images into a lean neural representation. Given a new query image, they offer precise positioning for that location with centimeter-level accuracy.
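For readers curious what "encoding a location in the weights of a network" looks like at query time, here is a minimal sketch of the general idea behind scene-coordinate-regression relocalizers such as ACE: a small network maps image features to 3D scene coordinates, and a standard PnP + RANSAC solve turns those 2D-3D matches into a camera pose. This is a simplified illustration with hypothetical names (`scene_coordinate_net`, `relocalize`), not Niantic's code.

```python
import numpy as np
import cv2  # OpenCV, used here only for the PnP/RANSAC pose solve

def relocalize(image_features, pixel_coords, scene_coordinate_net, K):
    """Sketch of an ACE-style query: features -> 3D scene coords -> camera pose.

    image_features: (N, D) descriptors for N image patches (from some feature extractor)
    pixel_coords:   (N, 2) pixel locations of those patches
    scene_coordinate_net: callable mapping features to (N, 3) scene coordinates; this is
                          the small per-location network whose weights implicitly store the map
    K: (3, 3) camera intrinsics
    """
    scene_points = scene_coordinate_net(image_features)       # (N, 3) predicted 3D points
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        scene_points.astype(np.float32),
        pixel_coords.astype(np.float32),
        K.astype(np.float32),
        None,                                                  # no lens distortion assumed
    )
    return ok, rvec, tvec, inliers  # camera pose in the scene's metric frame
```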



Niantic has trained more than 50 million neural nets to date, where multiple networks can contribute to a single location. All these networks combined comprise over 150 trillion parameters optimized using machine learning.
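Taken together, those two headline numbers imply quite small individual networks, which is consistent with the "lean" neural representation described above (quick check, my arithmetic):

```python
total_params = 150e12    # >150 trillion parameters combined
num_networks = 50e6      # >50 million neural nets
print(total_params / num_networks)   # ~3 million parameters per local network, on average
```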

From Local Systems to Shared Understanding

Our current neural map is a viable geospatial model, active and usable right now as part of Niantic’s VPS. It is also most certainly “large”. However, our vision of a “Large Geospatial Model” goes beyond the current system of independent local maps.

Entirely local models might lack complete coverage of their respective locations. No matter how much data we have available on a global scale, locally it will often be sparse. The main failure mode of a local model is its inability to extrapolate beyond what it has already seen and from where the model has seen it. Therefore, local models can only position camera views similar to those they have already been trained on.

Imagine yourself standing behind a church. Let us assume the closest local model has seen only the front entrance of that church, and thus, it will not be able to tell you where you are. The model has never seen the back of that building. But on a global scale, we have seen a lot of churches, thousands of them, all captured by their respective local models at other places worldwide. No church is the same, but many share common characteristics. An LGM is a way to access that distributed knowledge.

An LGM distills common information in a global large-scale model that enables communication and data sharing across local models. An LGM would be able to internalize the concept of a church, and, furthermore, how these buildings are commonly structured. Even if, for a specific location, we have only mapped the entrance of a church, an LGM would be able to make an intelligent guess about what the back of the building looks like, based on thousands of churches it has seen before. Therefore, the LGM allows for unprecedented robustness in positioning, even from viewpoints and angles that the VPS has never seen.

The global model implements a centralized understanding of the world, entirely derived from geospatial and visual data. The LGM extrapolates locally by interpolating globally.

Human-Like Understanding

The process described above is similar to how humans perceive and imagine the world. As humans, we naturally recognize something we’ve seen before, even from a different angle. For example, it takes us relatively little effort to back-track our way through the winding streets of a European old town. We identify all the right junctions although we had only seen them once and from the opposing direction. This takes a level of understanding of the physical world, and cultural spaces, that is natural to us, but extremely difficult to achieve with classical machine vision technology. It requires knowledge of some basic laws of nature: the world is composed of objects which consist of solid matter and therefore have a front and a back. Appearance changes based on time of day and season. It also requires a considerable amount of cultural knowledge: the shape of many man-made objects follow specific rules of symmetry or other generic types of layouts – often dependent on the geographic region.

While early computer vision research tried to decipher some of these rules in order to hard-code them into hand-crafted systems, it is now consensus that such a high degree of understanding as we aspire to can realistically only be achieved via large-scale machine learning. This is what we aim for with our LGM. We have seen a first glimpse of impressive camera positioning capabilities emerging from our data in our recent research paper MicKey (2024). MicKey is a neural network able to position two camera views relative to each other, even under drastic viewpoint changes.



MicKey can handle even opposing shots that would take a human some effort to figure out. MicKey was trained on a tiny fraction of our data – data that we released to the academic community to encourage this type of research. MicKey is limited to two-view inputs and was trained on comparatively little data, but it still represents a proof of concept regarding the potential of an LGM. Evidently, to accomplish geospatial intelligence as outlined in this text, an immense influx of geospatial data is needed – a kind of data not many organizations have access to. Therefore, Niantic is in a unique position to lead the way in making a Large Geospatial Model a reality, supported by more than a million user-contributed scans of real-world places we receive per week.

Towards Complementary Foundation Models​

An LGM will be useful for more than mere positioning. In order to solve positioning well, the LGM has to encode rich geometrical, appearance and cultural information into scene-level features. These features will enable new ways of scene representation, manipulation and creation. Versatile large AI models like the LGM, which are useful for a multitude of downstream applications, are commonly referred to as “foundation models”.

Different types of foundation models will complement each other. LLMs will interact with multimodal models, which will, in turn, communicate with LGMs. These systems, working together, will make sense of the world in ways that no single model can achieve on its own. This interconnection is the future of spatial computing – intelligent systems that perceive, understand, and act upon the physical world.

As we move toward more scalable models, Niantic’s goal remains to lead in the development of a large geospatial model that operates wherever we can deliver novel, fun, enriching experiences to our users. And, as noted, beyond gaming Large Geospatial Models will have widespread applications, including spatial planning and design, logistics, audience engagement, and remote collaboration.

The path from LLMs to LGMs is another step in AI’s evolution. As wearable devices like AR glasses become more prevalent, the world’s future operating system will depend on the blending of physical and digital realities to create a system for spatial computing that will put people at the center.
 

bnew




1/67
@deepseek_ai
🚀 DeepSeek-R1-Lite-Preview is now live: unleashing supercharged reasoning power!

🔍 o1-preview-level performance on AIME & MATH benchmarks.
💡 Transparent thought process in real-time.
🛠️ Open-source models & API coming soon!

🌐 Try it now at DeepSeek
#DeepSeek



Gc0zgl8bkAAMTtC.jpg


2/67
@deepseek_ai
🌟 Impressive Results of DeepSeek-R1-Lite-Preview Across Benchmarks!



Gc0zl7WboAAnCTS.jpg


3/67
@deepseek_ai
🌟 Inference Scaling Laws of DeepSeek-R1-Lite-Preview
Longer Reasoning, Better Performance. DeepSeek-R1-Lite-Preview shows steady score improvements on AIME as thought length increases.



Gc0zpWDbAAA6T-I.jpg


4/67
@paul_cal
Tic tac toe will be cracked one day. Not today



Gc089xTXMAACjDa.jpg

Gc08-ujXQAAgw-h.jpg

Gc09G3bX0AAInTf.jpg


5/67
@paul_cal
Very impressive! Esp transparent CoT and imminent open source release

I get it's hard to compare w unreleased o1's test time scaling without an X axis, but worth noting o1 full supposedly pushes higher on AIME (~75%)

What's with the inconsistent blue lines though?



Gc04oxYW4AAG4QQ.jpg

Gc04vKAXQAAFSTd.png


6/67
@Yuchenj_UW
Congrats! looking forward to the open-source models and hosting them on Hyperbolic!



7/67
@itaybachman
thank you for forcing OpenAI to release o1 full 🥰



8/67
@koltregaskes
Amazing work. Bettering o1-preview is a huge achievement!



9/67
@GorkaMolero
How is this free?



10/67
@_Holistech_
Deep Think is awesome, congratulations! Reading the inner thoughts removes the impression of a black box like OpenAI o1 and is fun to read. I am very impressed.



11/67
@shytttted




Gc1bHeHXQAAej19.jpg


12/67
@MarkusOdenthal
Nice I looking forward to the API.



13/67
@MaximeRivest
The problem with all those cool new models when you're a builder is that every new model coming out needs a thorough review and evaluation.



14/67
@XENOWHITEx
Yeah boi we accelerating. Lfg 🗣️🗣️🗣️



15/67
@Presidentlin
Impressive very nice, just over 2 months. Still would like VL2




16/67
@eoft_ai
Sheeeeesh, great work! The gap closes day by day



17/67
@AtaeiMe
Open source soon that later pls! Is the white paper coming as well?



18/67
@stochasticchasm
Very great work!



19/67
@JasonBotterill3
the chinese are here



Gc1YgU0akAAq2pt.jpg


20/67
@sauravpanda24
Open source o1 alternative would be a great model for fine tuning LLMs for other tasks. Super excited to try them out!



21/67
@DustinBeachy
Always great to have another brain to work with. Great job!!



22/67
@metsikpalk
It got this right with deep think mode enabled. But it took 65 seconds and wrote me an entire book. But okay fair, it’s trying super hard to find the answer. Perhaps increase the models own answering confidence? 😀



Gc1UjnIWoAAHVh3.jpg

Gc1UjnCWgAE5zpb.jpg


23/67
@victor_explore
when everyone's releasing "lite" models, you know ai's gone full tech fashion season



24/67
@Emily_Escapor
I hope these numbers are accurate, not just for hype.



25/67
@mrsiipa
is this due to the nature of training data or something else? this model is not able to answer these problems correctly

[Quoted tweet]
i tried this bash coding example from @OpenAI 's o1 blogpost with @deepseek_ai 's model, it thinks for 12 seconds but the final code does not work.

The output should be:
[1,3,5],[2,4,6]


Gc1Eu0DbUAAypQ2.jpg

Gc1FisGbsAA0V-s.jpg


26/67
@SystemSculpt
The whale surfaces again for a spectacular show.



27/67
@johnkhfung
Been quiet for a while and cooking something great!
It is really good. I am waiting the API access



28/67
@NotBrain4brain
They did this to undercut OpenAI full o1 releases today



29/67
@IncKingdomX
This is impressive!



30/67
@bbbbbwu
man it's so cool



31/67
@mintisan
very nice work,bro...



32/67
@NaanLeCun
Chinese AI labs>US AI labs



33/67
@etemiz
Mr. altman, tear, down, this wall!



34/67
@BobTB12
It fails this test. O1-preview knows strawberry is still on the bed, as it falls out.



Gc1OCz-WcAANqs7.png


35/67
@WilliamLamkin




36/67
@leo_agi
will you release a tech report?



37/67
@RobIsTheName2
It is "Lite" but competes with o1-preview? Curious to see non "Lite" r1



38/67
@vlad3ciobanu
I'm blown away. The visible chain of thought is a major breakthrough for open AI research. Congratulations!



39/67
@AnAcctOfAllTime
LLMs when they need to make a countdown from 100 to 1 and replace all multiples of 7 with a specific word:



Gc1mASHXgAACrHF.png


40/67
@AI_GPT42
Can I get API access to DeepSeek-R1-Lite-Preview?



41/67
@allformodel
this is insane. when API?



42/67
@HrishbhDalal
will you open source it as well? would be amazing!!



43/67
@jimmycarter173
Looking forward to the paper! Yours are always a great read!

Will this be a closed-source model?



44/67
@judd2324
Awesome! Hope to see a open weight release.



45/67
@DavidAIinchina
It was impressive to see the results.



Gc1lsXMbQAAaE6Q.jpg


46/67
@99_frederick
A wave of distilled long-form reasoning data applications is coming to specific domains soon. #NLP #LLM



47/67
@lehai0609
You are GOAT. Take my money!!!



48/67
@zeroth_e
I tried it and it still seems to be worse than o1-preview at coding in certain tasks, but I think its math capabilities are better. It's around even, and I really want OpenAI to release o1-full now



49/67
@Cesarfalante
Thats incredible!
50 messages a day is honestly more than enough for the average person if you only use the reasoning model for reasoning tasks (and traditional LLMs for other stuff).
Great job, cheers from Brazil!



50/67
@Ashton_Waystone
newbee



51/67
@khaldounbouzai1
most important thing the dataset code and the paper transparency for all open source community to cooperate better



52/67
@btc4me2
thank you for showing the full reasoning traces! @OpenAI your move



53/67
@jermiah_64
Absolutely wild! Props to you guys



54/67
@Cyclops02448225
Every time I am about to take gpt/claude subscription Deepseek or Qwen drops a banger.



55/67
@PatrikarSanket
If a mobile app isn't in the plans, could you consider a pwa? Would love to have that on the phone to use.



56/67
@RThermo56
There is no moat.

Nvidia is the only winner.



57/67
@_CorvenDallas_
I just tried it and I'm impressed 😁



58/67
@abtb168
congrats on the release! 🥳



59/67
@marvijo99
Link to the paper please



60/67
@zaeppang316902
🔥🔥🔥🔥 @OpenAI ☄️



61/67
@JimMcM4
Thank you! I'll test this out. One tip though, I was able to bypass agreeing to your ToS and Privacy Policy when signing up through Google Auth. DM me if you need details.



62/67
@revmarsonal
Great work!



63/67
@bowtiedjconnor
@CerebrasSystems, run this model and the world is yours.



64/67
@0xEbadi
Deep think enable vs disable



Gc1mwfDXwAAn_IY.jpg


65/67
@vasej79628
wow you are fast



https://video.twimg.com/ext_tw_video/1859240500533833730/pu/vid/avc1/720x720/LARBE1WPCfev9ctA.mp4

66/67
@noxlonga
smart enough to realize that -5 can be the answer, but hallucinated about positive integer as requirement



Gc1BYrTWgAAX8fc.png


67/67
@modelsarereal
you should avoid anti-consciousness training

[Quoted tweet]
here is the answer of the new chinese model "DeepThink" of deepseek

Seems to be a trained anti-consciousness answer to avoid the AI to appear having any kind of conscious behavior.


Gc1CvzIXYAASn6Z.jpg



To post tweets in this format, more info here: https://www.thecoli.com/threads/tips-and-tricks-for-posting-the-coli-megathread.984734/post-52211196
 

bnew


1/1
@mrsiipa
i tried this bash coding example from @OpenAI 's o1 blogpost with @deepseek_ai 's model, it thinks for 12 seconds but the final code does not work.

The output should be:
[1,3,5],[2,4,6]

[Quoted tweet]
i tried this cipher example from @OpenAI 's o1 blogpost on @deepseek_ai 's new model but it was not able to figure it out after thinking for ~2 minutes.


Gc1Eu0DbUAAypQ2.jpg

Gc1FisGbsAA0V-s.jpg

Gc1B_0kaQAAyncn.jpg

Gc1DlJzbIAAh_ox.png



To post tweets in this format, more info here: https://www.thecoli.com/threads/tips-and-tricks-for-posting-the-coli-megathread.984734/post-52211196










1/11
@AndrewCurran_
Two months after the o1-preview announcement, and its Chain-of-Thought reasoning has been replicated. The Whale can now reason. DeepSeek says that the official version of DeepSeek-R1 will be completely open source.

[Quoted tweet]
🚀 DeepSeek-R1-Lite-Preview is now live: unleashing supercharged reasoning power!

🔍 o1-preview-level performance on AIME & MATH benchmarks.
💡 Transparent thought process in real-time.
🛠️ Open-source models & API coming soon!

🌐 Try it now at chat.deepseek.com
#DeepSeek


Gc1Um0PWIAAIcbs.jpg

Gc0zgl8bkAAMTtC.jpg


2/11
@AndrewCurran_
This is interesting, and promising.

'DeepSeek-R1-Lite also uses a smaller base model, which cannot fully unleash the potential of the long thinking chain.'

I wonder if there is a similar size difference in the OpenAI preview and full release versions.



Gc1Z139WMAErrLC.png


3/11
@AndrewCurran_


[Quoted tweet]
its so cute


Gc03ze3XwAA-2zQ.png


4/11
@AndrewCurran_
Unfiltered chain of thought.

[Quoted tweet]
Thats looking really promising, unfiltered CoT check this:
"But I'm a bit confused because 9.11 has more decimal places. Does that make it larger? I think not necessarily. The number of decimal places doesn't determine the size; it's the value of each digit that matters."


Gc1qjwqWIAAuWyF.png

Gc09la6XgAAuuRS.png


5/11
@AndrewCurran_
Good stuff, it's great watching the model think things out.

[Quoted tweet]
was biting nails on the edge of my seat here, fr. 10/10 would prompt again. Defeat, turnaround, final fight – and glorious victory.
DeepSeek-r1-lite revolutionizes LLM inference by turning it into a dramatic show with open reasoning chains. No SORA needed. Take notes OpenAI.


Gc1p-cqWUAAwcau.jpg

Gc1p-c3XoAAiFKl.jpg

Gc1p-o3WsAAWaw2.jpg

Gc1p-c3XgAA8Gkl.jpg


6/11
@dikksonPau
Where does it say the full version will be completely open source?



7/11
@AndrewCurran_
Chinese WeChat.



Gc1amwgWgAAZYUL.png


8/11
@thedealdirector
The more important question is does this increase the performance in agent workflows that embeds this.



9/11
@ivanfioravanti
It’s incredible how OpenAI leads the first release of an innovation and everyone else replicate it in few months. I’m wondering if there is a real unique elements to an AI company or they’ll be all the same in the long run. Becoming utilities like energy, gas, etc 🤔



10/11
@soundsbyjo
When will we have web search via API? GPT doesn’t even have it



11/11
@DirkBruere
"Reasoning" is meaningless without specifying depth of reasoning




To post tweets in this format, more info here: https://www.thecoli.com/threads/tips-and-tricks-for-posting-the-coli-megathread.984734/post-52211196





1/1
@DataDeLaurier
The ball is in everyone's court now. @deepseek_ai thought deeply on the ways to show closed source companies the truth...

...Closed source models are a net negative to humanity.

[Quoted tweet]
Soon to be open R1 model from @deepseek_ai casually "thought" for 100+ seconds and over 7500 coherent generated tokens! 👀


https://video.twimg.com/ext_tw_video/1859204165181878272/pu/vid/avc1/720x780/9yqIWAry4H0Zz39D.mp4


To post tweets in this format, more info here: https://www.thecoli.com/threads/tips-and-tricks-for-posting-the-coli-megathread.984734/post-52211196




1/8
@_philschmid
Soon to be open R1 model from @deepseek_ai casually "thought" for 100+ seconds and over 7500 coherent generated tokens! 👀



https://video.twimg.com/ext_tw_video/1859204165181878272/pu/vid/avc1/720x780/9yqIWAry4H0Zz39D.mp4

2/8
@Zapidroid
Best of all. We can see the thought process



3/8
@TheJohnEgan
damn



4/8
@JulienSLauret
Deepseek has been consistently impressive, but beating o1... damn



5/8
@masfiq018
Mine thought for 221 seconds



6/8
@jermiah_64
We are for a crazy ride in 2025....deepseek is known to make powerful models very fast... Expect improvements soon



7/8
@CohorteAI
100+ seconds of coherent thought is a glimpse into long-form reasoning. Do you see this capability extending to applications like autonomous agents working on long-term projects or simulations?



8/8
@HrishbhDalal
are you sure about open? did they confirm it?




To post tweets in this format, more info here: https://www.thecoli.com/threads/tips-and-tricks-for-posting-the-coli-megathread.984734/post-52211196





1/21
@_philschmid
Mindblowing! 🤯 New reasoning model preview from @deepseek_ai that matches @OpenAI o1! 🐳 DeepSeek-R1-Lite-Preview is now live to test in deepseek chat designed for long Reasoning! 🧠

> o1-preview-level performance on AIME & MATH benchmarks.
> Access to CoT and transparent thought process in real-time.
> Open-source models & API coming soon!

My test prompt:
Can you crack the code?
9 2 8 5 (One number is correct but in the wrong position)
1 9 3 7 (Two numbers are correct but in the wrong positions)
5 2 0 1 (one number is correct and in the right position) 6 5 0 7 (nothing is correct)
8 5 24 (two numbers are correct but in the wrong

Correct answer is 3841
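The stated answer checks out against the clues. Below is a small brute-force verifier (my own illustration); since the fifth clue is truncated above, the script assumes it reads "8 5 2 4 (two numbers are correct but in the wrong positions)".

```python
from itertools import product

# Each clue is (guess, digits_present_anywhere, digits_in_the_right_slot).
clues = [
    ("9285", 1, 0),
    ("1937", 2, 0),
    ("5201", 1, 1),
    ("6507", 0, 0),
    ("8524", 2, 0),  # assumed reading of the truncated fifth clue
]

def consistent(code, guess, present, placed):
    n_present = sum(d in code for d in guess)            # guess digits that appear anywhere
    n_placed = sum(c == g for c, g in zip(code, guess))  # guess digits in the right slot
    return n_present == present and n_placed == placed

solutions = [
    "".join(digits)
    for digits in product("0123456789", repeat=4)
    if all(consistent("".join(digits), g, n, p) for g, n, p in clues)
]
print(solutions)  # ['3841']
```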



Gc03jzpWoAEAmWX.jpg


2/21
@mandeepabagga
Ok where's the link super excited to get hands on this



3/21
@_philschmid
DeepSeek



4/21
@gordic_aleksa
what do you mean by "matches OpenAI o1"?



5/21
@_philschmid
that



Gc1qWN0WUAAH_Fj.jpg


6/21
@BennettBuhner
Bro o1-mini got it 0-shot ._.



Gc1xKgnWUAAgvKz.png


7/21
@Gopinath876
When it's coming to huggingface?



8/21
@adridder
The rise of advanced AI models is fascinating and promising. What are your thoughts on how we can harness their potential responsibly?



9/21
@KullAxel
I did the exact same test but never got good answers from DeepSeek!

OpenAI and cloude are hard to beat!



10/21
@deter3
still can not compare with o1 on math .



11/21
@victor_explore
oh snap



12/21
@evil_malloc
did they explain anywhere how they managed to do this ?



13/21
@sivajisahoo
I did try deepseek.

It indeed is mind blowing.

For a problem statement, it explained the thought process, reasoning so well. I felt on par to o1 preview level performance.



14/21
@chenleij81
Chipper Nodes: The vital cog in the DIN network, driving data validation, vectorization, and reward distribution. 🔍⚙️💰 @din_lol_ GODIN



15/21
@darin_gordon
I tried deep-thinking mode and cannot confirm the same experience. It did not take instructions well nor did it have a grasp of subject material as well as Sonnet3.6 nor Gemma 1114



16/21
@john_whickins
What's also great with @deepseek_ai is the unmatched speed of their LLM models locally, and these 'lite' versions are simply fantastic.



17/21
@adugbovictory
"Solving complex logic puzzles like this showcases next-level reasoning, but here’s the question: can these models handle real-world ambiguity as effectively as structured logic?



18/21
@stbaasch
👀



19/21
@jermiah_64
Absolutely wild that they will cook such a powerful Model over such a short period of time



20/21
@masfiq018
It didn't get it right.



21/21
@abuchanlife
that sounds wild! ai is leveling up so fast.




To post tweets in this format, more info here: https://www.thecoli.com/threads/tips-and-tricks-for-posting-the-coli-megathread.984734/post-52211196
 

Heimdall

hmm, I think this fits here better than the AI megathread. I read this a while ago and thought it was interesting (perhaps thematically linked to this thread):


snips:
Obviously, as a cognitive scientist who has the expertise and understanding of human language, it's disheartening to see a lot of these claims made without proper evidence to back them up. But they also have downstream impacts in various domains. If you start treating these massive complex engineering systems as language understanding machines, it has implications in how policymakers and regulators think about them."

But there are a couple of implicit assumptions in this approach.

"The first is what we call the assumption of language completeness – that there exists a 'thing' called a 'language' that is complete, stable, quantifiable, and available for extraction from traces in the environment," the paper says. "The engineering problem then becomes how that 'thing' can be reproduced artificially. The second assumption is the assumption of data completeness – that all of the essential characteristics can be represented in the datasets that are used to initialize and 'train' the model in question. In other words, all of the essential characteristics of language use are assumed to be present within the relationships between tokens, which presumably would allow LLMs to effectively and comprehensively reproduce the 'thing' that is being modeled."

The problem is that one of the more modern branches of cognitive science sees language as a behavior rather than a big pile of text. In other words, language is something we do, and have done for hundreds of thousands of years.

The approach taken by Birhane and her colleagues is to understand human thought in terms that are "embodied" and "enacted."

"The idea is that cognition doesn't end at the brain and the person doesn't end at the the skin. Rather, cognition is extended. Personhood is messy, ambiguous, intertwined with the existence of others, and so on," she said.

Tone of voice, gesture, eye contact, emotional context, facial expressions, touch, location, and setting are among the factors that influence what is said or written.

Language behavior "cannot, in its entirety, be captured in representations appropriate for automation and computational processing. Written language constitutes only part of human linguistic activity," the paper says.

In other words, the stronger claims of AI builders fall down on the assumption that language itself is ever complete. The researchers argue the second assumption – that language is captured by a corpus of text – is also false by the same means.

It's true that both humans and LLMs learn from examples of text, but by looking at how humans use language in their lives, there's a great deal missing. As well as human language being embodied, it is something in which people participate.

"Training data therefore is not only necessarily incomplete but also lacks to capture the motivational, participatory, and vitally social aspects that ground meaning making by people," the paper says.


...

But claims asserting the usefulness of LLMs as a tool alone have also been exaggerated.

"There is no clear evidence that that shows LLMs are useful because they are extremely unreliable," Birhane said. "Various scholars have been doing domain specific audits … in legal space … and in medical space. The findings across all these domains is that LLMs are not actually that useful because they give you so much unreliable information."

Birhane argues that there are risks in releasing these models into the wild that would be unacceptable in other industries.


And yet, they are already being used widely, and apparently successfully. :jbhmm:
 

bnew

hmm, I think this fits here better than the AI megathread. I read this a while ago and thought it was interesting (perhaps thematically linked to this thread):


snips:



And yet, they are already being used widely, and apparently successfully. :jbhmm:

I think there are some posts there about LeCun talking about current LLMs and how there are going to need to be a few more breakthroughs to get to AGI, due to how they're modeled.


"There is no clear evidence that that shows LLMs are useful because they are extremely unreliable," Birhane said. "Various scholars have been doing domain specific audits … in legal space … and in medical space. The findings across all these domains is that LLMs are not actually that useful because they give you so much unreliable information."

she's hallucinating. for me personally it's been incredibly useful for brainstorming, finding quick solutions, getting and parsing instructions from a large corpus of text or a collection of text snippets, and generating code for projects that you can't afford, or that wouldn't be worth it, to pay someone to do.

she could visit any place online to see how people are finding it useful if used correctly to reduce hallucinations.
 

bnew


Large Models of What? Mistaking Engineering Achievements for Human Linguistic Agency​


11 Jul 2024 · Abeba Birhane, Marek McGann ·

In this paper we argue that key, often sensational and misleading, claims regarding linguistic capabilities of Large Language Models (LLMs) are based on at least two unfounded assumptions; the assumption of language completeness and the assumption of data completeness. Language completeness assumes that a distinct and complete thing such as `a natural language' exists, the essential characteristics of which can be effectively and comprehensively modelled by an LLM. The assumption of data completeness relies on the belief that a language can be quantified and wholly captured by data. Work within the enactive approach to cognitive science makes clear that, rather than a distinct and complete thing, language is a means or way of acting. Languaging is not the kind of thing that can admit of a complete or comprehensive modelling. From an enactive perspective we identify three key characteristics of enacted language; embodiment, participation, and precariousness, that are absent in LLMs, and likely incompatible in principle with current architectures. We argue that these absences imply that LLMs are not now and cannot in their present form be linguistic agents the way humans are. We illustrate the point in particular through the phenomenon of `algospeak', a recently described pattern of high stakes human language activity in heavily controlled online environments. On the basis of these points, we conclude that sensational and misleading claims about LLM agency and capabilities emerge from a deep misconception of both what human language is and what LLMs are.


 

bnew

1/1

@Sung Kim
DeepSeek-R1-Lite-Preview

🔍 o1-preview-level performance on AIME & MATH benchmarks.
💡 Transparent thought process in real-time.
🛠️ Open-source models & API coming soon!

Try it at chat.deepseek.com

https://chat.deepseek.com

bafkreiafhjcdlgoczk5rgbppzptzfltik7j4irzeki2w64outicxvmfr4e@jpeg



To post in this format, more info here: https://www.example.com/format-info


bafkreidfbuwiytz2culh4iy7org4oizhmrozr4bwm3i6bimx2g4vsqiz5i@jpeg










 

bnew




1/10
@teortaxesTex
was biting nails on the edge of my seat here, fr. 10/10 would prompt again. Defeat, turnaround, final fight – and glorious victory.
DeepSeek-r1-lite revolutionizes LLM inference by turning it into a dramatic show with open reasoning chains. No SORA needed. Take notes OpenAI.



Gc1p-cqWUAAwcau.jpg

Gc1p-c3XoAAiFKl.jpg

Gc1p-o3WsAAWaw2.jpg

Gc1p-c3XgAA8Gkl.jpg


2/10
@teortaxesTex
And if I recall correctly, either @huybery or @JustinLin610 has said that the Qwen team is also working on a reasoner (maybe it was even called r1 too).

So hopefully we'll see competition by the time the Whale delivers their completed masterwork.



3/10
@teortaxesTex
r1-lite's chains are startlingly similar to @_xjdr's ideas on training pivot tokens for Entropix, by the way.
I was skeptical at the time because, well, maybe the look of OpenAI's chains is accidental. Did DeepSeek arrive at the same idea as Shrek?

[Quoted tweet]
This one log is more valuable than 10 papers on MCTS for mathematical reasoning, and completely different from my speculation

meditate on it as long as it takes


Gc1uXogXQAARkcI.jpg

GX2hPqpXYAAVlc9.png

GX2hTUhXIAgHGrD.png

GX2hWZ9XcAAIWvB.png

GX2hhELXIAI4_Kw.jpg


4/10
@nanulled
wait, actually, wait



Gc1rse4W4AAJyg1.png


5/10
@Grad62304977
I'm so betting on that it looks like this but obv not the same

[Quoted tweet]
Cont'd:
- LaTRO has good performance: we improve zero-shot accuracy by an average of 12.5% over 3 different base models: Phi-3.5-mini, Mistral-7B, and Llama-3.1-8B.
- LaTRO is reward model-free: Surprisingly but reasonable, the log probabilities of producing the correct answer after the reasoning trajectory serves as a natural reward function, which we call "Self-rewarding"
- LaTRO shifts the inference-time scaling back to training time, by self-generating multiple reasoning trajectories during each training update.
- Free benefit: one can compress the length of reasoning trajectories via LaTRO - on GSM8K, a model with 200 reasoning tokens achieves 78% performance of a model with 500 reasoning tokens.


6/10
@gzlin
Self reflection and correction built in by default.



7/10
@torchcompiled
I kinda relate to it



8/10
@gfodor
omg



9/10
@Z7xxxZ7
Impressed they didn't hide the thinking process, and it really reads as more human, whereas o1's thinking process is like an organized PPT.



10/10
@AlexPolygonal
The guy is very cool. Refreshingly honest compared to nuanced RLHF-overfit yes-spammers.
> I'm frustrated.
> i'm really stuck here.
> not helpful.
> I'll have to conclude that finding such an example is beyond my current understanding.
he's literally me for real




To post tweets in this format, more info here: https://www.thecoli.com/threads/tips-and-tricks-for-posting-the-coli-megathread.984734/post-52211196









1/11
@reach_vb
OH WOW! The Whale aka @deepseek_ai is BACK!! New model, with complete reasoning outputs and a gracious FREE TIER too! 🔥

Here's a quick snippet of it searching the web for the right documentation, creating the JS files plus the necessary HTML all whilst handling Auth too ⚡

I really hope they Open release the model checkpoints too!



https://video.twimg.com/ext_tw_video/1859198084497973248/pu/vid/avc1/1152x720/BJbtfJOz_9Nawfyo.mp4

2/11
@reach_vb
DeepSeek really said - I think therefore I am.



3/11
@reach_vb
available on DeepSeek



4/11
@TslShahir
It shows the thought process also. Very interesting



5/11
@reach_vb
💯



6/11
@AI_Homelab
Did they already write something about whether it will be open weights and what size in B params it has?



7/11
@reach_vb
Not sure, the only information I saw was from @TheXeophon here:

[Quoted tweet]
👀


Gc0wXOSXMAAe6Bd.jpg


8/11
@DaniloV50630577
The limit of 50 messages is per day?



9/11
@reach_vb
Yes! But I'm on the free tier, so I'm sure you get more quota if you're paid / using the API



10/11
@Em
The confetti 🎊



11/11
@ScienceGeekAI
@ChomatekLukasz




To post tweets in this format, more info here: https://www.thecoli.com/threads/tips-and-tricks-for-posting-the-coli-megathread.984734/post-52211196






1/11
@deedydas
Time to take open-source models seriously.

DeepSeek has just changed the game with its new model: R1-lite.

By scaling test-time compute like o1 but "thinking" even longer (~5mins when I tried), it gets SOTA results on the MATH benchmark with 91.6%!

Go try 50 free queries!



Gc190E0aAAENHjp.jpg


2/11
@deedydas
Playground (turn on DeepThink): DeepSeek

Source:

[Quoted tweet]
🚀 DeepSeek-R1-Lite-Preview is now live: unleashing supercharged reasoning power!

🔍 o1-preview-level performance on AIME & MATH benchmarks.
💡 Transparent thought process in real-time.
🛠️ Open-source models & API coming soon!

🌐 Try it now at chat.deepseek.com
#DeepSeek


Gc0zgl8bkAAMTtC.jpg


3/11
@tech_with_jk
The right time to take OS models seriously was yesterday, deedy.
It all started anyway in 2018 and then with 'open'AI :smile:



4/11
@deedydas
None of the open source models of yesterday were truly state of the art. I believe this is the first one that is



5/11
@_akhaliq
nice, also try out DeepSeek-V2.5 here: Anychat - a Hugging Face Space by akhaliq; will add R1-lite-preview once it's available



6/11
@s10boyal
Answer is 2 right?



Gc2R4-qWUAAdwlC.jpg


7/11
@ai_for_success
Bullish on open source. They'll release the model soon, and an API is also coming



8/11
@AhmedRezaT
Open source has been serious bruh 😎



9/11
@almeida_dril
It solved it here.



Gc2nibOWwAAzotk.jpg

Gc2nibKX0AA0CRd.jpg

Gc2nibWXoAAPP5L.jpg

Gc2nibHWcAAzRn-.jpg


10/11
@HulsmanZacchary
But is it open source?



11/11
@ayuSHridhar
impressive, but it failed at Yann’s test. was expecting an o1-like chain to pass this. Prompt: “7 axles are equally spaced around a circle. A gear is placed on each axle, such that each gear is engaged with a gear to its left and a gear to its right. The gears are numbered from 1 to 7 around the circle. If gear 3 is rotated clockwise, in which direction would gear 7 rotate?”
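For context, that gear prompt is a parity trap: adjacent meshed gears must spin in opposite directions, so a closed ring of 7 gears cannot turn at all, and the expected answer is that gear 7 does not rotate. A tiny check of that argument (illustrative only):

```python
# Adjacent meshed gears rotate in opposite directions, so walking once around a
# closed ring multiplies the direction by -1 per gear. The ring is physically
# consistent only if the sign comes back unchanged, which is impossible for 7 gears.
def ring_can_turn(n_gears: int) -> bool:
    direction = +1              # assume gear 1 turns clockwise
    for _ in range(n_gears):    # traverse the full ring back to gear 1
        direction = -direction
    return direction == +1

print(ring_can_turn(7))  # False -> the 7-gear ring is jammed; gear 7 cannot rotate
print(ring_can_turn(8))  # True  -> an even ring can turn freely
```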




To post tweets in this format, more info here: https://www.thecoli.com/threads/tips-and-tricks-for-posting-the-coli-megathread.984734/post-52211196





1/9
@nrehiew_
Rumor is that DeepSeek R1-Lite is a 16B MOE with 2.4B active params

if true, their MATH scores went from 17.1 -> 91.6

[Quoted tweet]
From their wechat announcement:


Gc1wG1VWoAAipbv.jpg

Gc1t_wGXgAAmxzT.jpg


2/9
@nrehiew_
@zhs05232838 @zebgou @deepseek_ai can you guys confirm?



3/9
@jacobi_torsten
Small models with longer and better thinking will bring us back on track of accelerating performance.



4/9
@Orwelian84
holy shyt - thats nuts - that would run easily on my local hardware



5/9
@gfodor




6/9
@InfusingFit
seems about right, behaves like a small model



7/9
@scaling01
I highly doubt it. Output speed on DeepSeek Chat is way too low for only 2.4B active params - unless they run the model on CPU lol



8/9
@NidarMMV2
Holy shyt



9/9
@k7agar
abundant intelligence is upon us




To post tweets in this format, more info here: https://www.thecoli.com/threads/tips-and-tricks-for-posting-the-coli-megathread.984734/post-52211196
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
57,601
Reputation
8,519
Daps
160,557


Current AI scaling laws are showing diminishing returns, forcing AI labs to change course​

Maxwell Zeff

6:00 AM PST · November 20, 2024

AI labs traveling the road to super-intelligent systems are realizing they might have to take a detour.

“AI scaling laws,” the methods and expectations that labs have used to increase the capabilities of their models for the last five years, are now showing signs of diminishing returns, according to several AI investors, founders, and CEOs who spoke with TechCrunch. Their sentiments echo recent reports that indicate models inside leading AI labs are improving more slowly than they used to.

Everyone now seems to be admitting you can’t just use more compute and more data while pretraining large language models and expect them to turn into some sort of all-knowing digital god. Maybe that sounds obvious, but these scaling laws were a key factor in developing ChatGPT, making it better, and likely influencing many CEOs to make bold predictions about AGI arriving in just a few years.

OpenAI and Safe Superintelligence co-founder Ilya Sutskever told Reuters last week that “everyone is looking for the next thing” to scale their AI models. Earlier this month, a16z co-founder Marc Andreessen said in a podcast that AI models currently seem to be converging at the same ceiling on capabilities.

But now, almost immediately after these concerning trends started to emerge, AI CEOs, researchers, and investors are already declaring we’re in a new era of scaling laws. “Test-time compute,” which gives AI models more time and compute to “think” before answering a question, is an especially promising contender to be the next big thing.

“We are seeing the emergence of a new scaling law,” said Microsoft CEO Satya Nadella onstage at Microsoft Ignite on Tuesday, referring to the test-time compute research underpinning OpenAI’s o1 model.

He’s not the only one now pointing to o1 as the future.

“We’re now in the second era of scaling laws, which is test-time scaling,” said Andreessen Horowitz partner Anjney Midha, who also sits on the board of Mistral and was an angel investor in Anthropic, in a recent interview with TechCrunch.

If the unexpected success — and now, the sudden slowing — of the previous AI scaling laws tell us anything, it’s that it is very hard to predict how and when AI models will improve.

Regardless, there seems to be a paradigm shift underway: The ways AI labs try to advance their models for the next five years likely won’t resemble the last five.


What are AI scaling laws?​


The rapid AI model improvements that OpenAI, Google, Meta, and Anthropic have achieved since 2020 can largely be attributed to one key insight: use more compute and more data during an AI model’s pretraining phase.

When researchers give machine learning systems abundant resources during this phase — in which AI identifies and stores patterns in large datasets — models have tended to perform better at predicting the next word or phrase.
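As background, the pretraining scaling laws the article keeps referring to are usually fit as a power law in model size and data. A commonly cited Chinchilla-style form, shown here only for orientation (E, A, B, alpha, and beta are empirically fitted constants), is:

```latex
L(N, D) \approx E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}
```

Here N is the parameter count, D the number of training tokens, and L the pretraining loss; the "diminishing returns" in the piece are simply these power-law terms flattening out as N and D get very large.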

This first generation of AI scaling laws pushed the envelope of what computers could do, as engineers increased the number of GPUs used and the quantity of data they were fed. Even if this particular method has run its course, it has already redrawn the map. Every Big Tech company has basically gone all in on AI, while Nvidia, which supplies the GPUs all these companies train their models on, is now the most valuable publicly traded company in the world.

But these investments were also made with the expectation that scaling would continue as expected.

It’s important to note that scaling laws are not laws of nature, physics, math, or government. They’re not guaranteed by anything, or anyone, to continue at the same pace. Even Moore’s Law, another famous scaling law, eventually petered out — though it certainly had a longer run.

“If you just put in more compute, you put in more data, you make the model bigger — there are diminishing returns,” said Anyscale co-founder and former CEO Robert Nishihara in an interview with TechCrunch. “In order to keep the scaling laws going, in order to keep the rate of progress increasing, we also need new ideas.”

Nishihara is quite familiar with AI scaling laws. Anyscale reached a billion-dollar valuation by developing software that helps OpenAI and other AI model developers scale their AI training workloads to tens of thousands of GPUs. Anyscale has been one of the biggest beneficiaries of pretraining scaling laws around compute, but even its co-founder recognizes that the season is changing.

“When you’ve read a million reviews on Yelp, maybe the next reviews on Yelp don’t give you that much,” said Nishihara, referring to the limitations of scaling data. “But that’s pretraining. The methodology around post-training, I would say, is quite immature and has a lot of room left to improve.”

To be clear, AI model developers will likely continue chasing after larger compute clusters and bigger datasets for pretraining, and there’s probably more improvement to eke out of those methods. Elon Musk recently finished building a supercomputer with 100,000 GPUs, dubbed Colossus, to train xAI’s next models. There will be more, and larger, clusters to come.

But trends suggest exponential growth is not possible by simply using more GPUs with existing strategies, so new methods are suddenly getting more attention.


Test-time compute: The AI industry’s next big bet​


When OpenAI released a preview of its o1 model, the startup announced it was part of a new series of models separate from GPT.

OpenAI improved its GPT models largely through traditional scaling laws: more data, more power during pretraining. But now that method reportedly isn’t gaining them much. The o1 framework of models relies on a new concept, test-time compute, so called because the computing resources are used after a prompt, not before. The technique hasn’t been explored much yet in the context of neural networks, but is already showing promise.

Some are already pointing to test-time compute as the next method to scale AI systems.

“A number of experiments are showing that even though pretraining scaling laws may be slowing, the test-time scaling laws — where you give the model more compute at inference — can give increasing gains in performance,” said a16z’s Midha.

“OpenAI’s new ‘o’ series pushes [chain-of-thought] further, and requires far more computing resources, and therefore energy, to do so,” said famed AI researcher Yoshua Bengio in an op-ed on Tuesday. “We thus see a new form of computational scaling appear. Not just more training data and larger models but more time spent ‘thinking’ about answers.”

Over a period of 10 to 30 seconds, OpenAI’s o1 model re-prompts itself several times, breaking down a large problem into a series of smaller ones. Despite ChatGPT saying it is “thinking,” it isn’t doing what humans do — although our internal problem-solving methods, which benefit from clear restatement of a problem and stepwise solutions, were key inspirations for the method.
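Mechanically, that "re-prompts itself several times" pattern can be illustrated with a generic decompose-then-solve loop like the one below. This is only a sketch of the idea, not OpenAI's actual o1 implementation; llm() stands in for any text-completion call you have available:

```python
def solve_with_test_time_compute(llm, problem: str, max_steps: int = 5) -> str:
    """Spend extra inference-time compute by planning, solving sub-steps, and
    consolidating; every additional call to llm() is more 'thinking' time."""
    # 1) Ask the model to break the problem into smaller sub-problems.
    plan = llm(f"Break this problem into at most {max_steps} short steps:\n{problem}")
    steps = [s.strip() for s in plan.splitlines() if s.strip()][:max_steps]

    # 2) Re-prompt once per step, feeding earlier partial results back in.
    scratchpad = ""
    for i, step in enumerate(steps, 1):
        scratchpad += f"\nStep {i}: {step}\n"
        scratchpad += llm(
            f"Problem: {problem}\nWork so far:{scratchpad}\nSolve step {i} only."
        )

    # 3) A final pass consolidates the scratchpad into an answer.
    return llm(f"Problem: {problem}\nReasoning:{scratchpad}\nGive the final answer.")
```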

A decade or so back, Noam Brown, who now leads OpenAI’s work on o1, was trying to build AI systems that could beat humans at poker. During a recent talk, Brown says he noticed at the time how human poker players took time to consider different scenarios before playing a hand. In 2017, he introduced a method to let a model “think” for 30 seconds before playing. In that time, the AI was playing different subgames, figuring out how different scenarios would play out to determine the best move.

Ultimately, the AI performed seven times better than his past attempts.

Granted, Brown’s research in 2017 did not use neural networks, which weren’t as popular at the time. However, MIT researchers released a paper last week showing that test-time compute significantly improves an AI model’s performance on reasoning tasks.

It’s not immediately clear how test-time compute would scale. It could mean that AI systems need a really long time to think about hard questions; maybe hours or even days. Another approach could be letting an AI model “think” through a question on lots of chips simultaneously.
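The "lots of chips simultaneously" option is usually some flavor of parallel sampling plus selection, for example self-consistency voting. A minimal sketch, again with llm() as a placeholder completion function rather than any specific API:

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def best_of_n(llm, problem: str, n: int = 16) -> str:
    """Sample n independent answers in parallel and keep the most common one."""
    prompt = f"{problem}\nThink step by step, then put only the final answer on the last line."
    with ThreadPoolExecutor(max_workers=n) as pool:
        completions = list(pool.map(lambda _: llm(prompt), range(n)))
    finals = [c.strip().splitlines()[-1].strip() for c in completions if c.strip()]
    return Counter(finals).most_common(1)[0][0]
```

Whether compute is spent serially (longer chains) or in parallel (more samples), the bet is the same: more inference-time work per question buys better answers.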

If test-time compute does take off as the next place to scale AI systems, Midha says the demand for AI chips that specialize in high-speed inference could go up dramatically. This could be good news for startups such as Groq or Cerebras, which specialize in fast AI inference chips. If finding the answer is just as compute-heavy as training the model, the “pick and shovel” providers in AI win again.


The AI world is not yet panicking​


Most of the AI world doesn’t seem to be losing their cool about these old scaling laws slowing down. Even if test-time compute does not prove to be the next wave of scaling, some feel we’re only scratching the surface of applications for current AI models.

New popular products could buy AI model developers some time to figure out new ways to improve the underlying models.

“I’m completely convinced we’re going to see at least 10 to 20x gains in model performance just through pure application-level work, just allowing the models to shine through intelligent prompting, UX decisions, and passing context at the right time into the models,” said Midha.

For example, ChatGPT’s Advanced Voice Mode is one of the more impressive applications from current AI models. However, that was largely an innovation in user experience, not necessarily the underlying tech. You can see how further UX innovations, such as giving that feature access to the web or applications on your phone, would make the product that much better.

Kian Katanforoosh, the CEO of AI startup Workera and a Stanford adjunct lecturer on deep learning, tells TechCrunch that companies building AI applications, like his, don’t necessarily need exponentially smarter models to build better products. He also says the products around current models have a lot of room to get better.

“Let’s say you build AI applications and your AI hallucinates on a specific task,” said Katanforoosh. “There are two ways that you can avoid that. Either the LLM has to get better and it will stop hallucinating, or the tooling around it has to get better and you’ll have opportunities to fix the issue.”

Whatever the case is for the frontier of AI research, users probably won’t feel the effects of these shifts for some time. That said, AI labs will do whatever is necessary to continue shipping bigger, smarter, and faster models at the same rapid pace. That means several leading tech companies could now pivot how they’re pushing the boundaries of AI.
 