bnew

Veteran
Joined
Nov 1, 2015
Messages
61,990
Reputation
9,378
Daps
170,276



Google Researchers Can Create an AI That Thinks a Lot Like You After Just a Two-Hour Interview​


After an average of 6,000 words, Stanford and Google researchers can spin up a generative agent that will act a lot like you do.

By Matthew Gault | Published January 8, 2025

The interface researchers used to make generative AI agents. © Stanford University image.

Stanford University researchers paid 1,052 people $60 to read the first two lines of The Great Gatsby to an app. That done, an AI that looked like a 2D sprite from an SNES-era Final Fantasy game asked the participants to tell the story of their lives. The scientists took those interviews and crafted them into an AI they say replicates the participants’ behavior with 85% accuracy.

The study, titled Generative Agent Simulations of 1,000 People, is a joint venture between Stanford and scientists working for Google’s DeepMind AI research lab. The pitch is that creating AI agents based on random people could help policymakers and business people better understand the public. Why use focus groups or poll the public when you can talk to them once, spin up an LLM based on that conversation, and then have their thoughts and opinions forever? Or, at least, as close an approximation of those thoughts and feelings as an LLM is able to recreate.

“This work provides a foundation for new tools that can help investigate individual and collective behavior,” the paper’s abstract said.

“How might, for instance, a diverse set of individuals respond to new public health policies and messages, react to product launches, or respond to major shocks?” the paper continued. “When simulated individuals are combined into collectives, these simulations could help pilot interventions, develop complex theories capturing nuanced causal and contextual interactions, and expand our understanding of structures like institutions and networks across domains such as economics, sociology, organizations, and political science.”

All those possibilities based on a two-hour interview fed into an LLM that answered questions mostly like their real-life counterparts.

Much of the process was automated. The researchers contracted Bovitz, a market research firm, to gather participants. The goal was to get a wide sample of the U.S. population, as wide as possible when constrained to 1,000 people. To complete the study, users signed up for an account in a purpose-made interface, made a 2D sprite avatar, and began to talk to an AI interviewer.

The questions and interview style are a modified version of that used by the American Voices Project, a joint Stanford and Princeton University project that’s interviewing people across the country.

Each interview began with the participants reading the first two lines of The Great Gatsby (“In my younger and more vulnerable years my father gave me some advice that I’ve been turning over in my mind ever since. ‘Whenever you feel like criticizing any one,’ he told me, ‘just remember that all the people in this world haven’t had the advantages that you’ve had.’”) as a way to calibrate the audio.

According to the paper, “The interview interface displayed the 2-D sprite avatar representing the interviewer agent at the center, with the participant’s avatar shown at the bottom, walking towards a goal post to indicate progress. When the AI interviewer agent was speaking, it was signaled by a pulsing animation of the center circle with the interviewer avatar.”

The two-hour interviews produced transcripts that averaged 6,491 words. The AI interviewer asked questions about race, gender, politics, income, social media use, job stress, and the makeup of participants’ families. The researchers published the interview script and the questions the AI asked.

Those transcripts, each less than 10,000 words, were then fed into another LLM that the researchers used to spin up generative agents meant to replicate the participants. The researchers then put both the participants and their AI clones through more questions and economic games to see how they’d compare. “When an agent is queried, the entire interview transcript is injected into the model prompt, instructing the model to imitate the person based on their interview data,” the paper said.
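
For readers who want the mechanics, here is a minimal sketch of that prompting pattern, assuming an OpenAI-style chat API. The model name, prompt wording, and the GSS-style question are illustrative stand-ins, not the study's actual setup.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def ask_agent(transcript: str, question: str) -> str:
    """Answer a survey question in character as the interviewed participant."""
    system = (
        "You are role-playing a real survey participant. Below is the full "
        "transcript of a two-hour interview with them. Answer every question "
        "the way this person would, staying consistent with the interview.\n\n"
        f"INTERVIEW TRANSCRIPT:\n{transcript}"
    )
    resp = client.chat.completions.create(
        model="gpt-4o",  # stand-in; not necessarily the model used in the study
        messages=[
            {"role": "system", "content": system},
            {"role": "user", "content": question},
        ],
    )
    return resp.choices[0].message.content

# e.g. ask_agent(transcript, "Do you favor or oppose the death penalty for murder?")
```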

This part of the process was as close to controlled as possible. Researchers used the General Social Survey (GSS) and the Big Five Personality Inventory (BFI) to test how well the LLMs matched their inspiration. They then ran the participants and the LLMs through five economic games to see how they’d compare.

Results were mixed. The AI agents answered about 85% of the questions the same way as the real-world participants on the GSS. They hit 80% on the BFI. The numbers plummeted when the agents started playing economic games, however. The researchers offered the real-life participants cash prizes to play games like the Prisoner’s Dilemma and the Dictator Game.

In the Prisoner’s Dilemma, participants can choose to cooperate so that both succeed, or betray their partner for a chance at a bigger payoff. In the Dictator Game, participants choose how to allocate resources among other participants. The real-life subjects could earn money on top of the original $60 by playing these games.

Faced with these economic games, the AI clones of the humans didn’t replicate their real-world counterparts as well. “On average, the generative agents achieved a normalized correlation of 0.66,” or about 66%.
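
A note on what “normalized” means here: as the paper describes it, raw agent-versus-human agreement is divided by each participant’s own consistency when retaking the same instrument about two weeks later, so a score of 1.0 means the agent predicts the person as well as the person predicts themselves. A toy calculation, with made-up numbers:

```python
def normalized_score(agent_vs_human: float, human_vs_self: float) -> float:
    """Agent-human agreement, normalized by the human's own retest consistency."""
    return agent_vs_human / human_vs_self

# Illustrative numbers only: 0.71 raw agreement / 0.83 self-consistency ≈ 0.86
print(round(normalized_score(0.71, 0.83), 2))
```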

The entire document is worth reading if you’re interested in how academics are thinking about AI agents and the public. It did not take long for researchers to boil down a human being’s personality into an LLM that behaved similarly. Given time and energy, they can probably bring the two closer together.

This is worrying to me. Not because I don’t want to see the ineffable human spirit reduced to a spreadsheet, but because I know this kind of tech will be used for ill. We’ve already seen stupider LLMs trained on public recordings tricking grandmothers into giving away bank information to an AI relative after a quick phone call. What happens when those machines have a script? What happens when they have access to purpose-built personalities based on social media activity and other publicly available information?

What happens when a corporation or a politician decides the public wants and needs something based not on their spoken will, but on an approximation of it?
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
61,990
Reputation
9,378
Daps
170,276

1/1
@nexusfusion_io
$450 open-source reasoning model Sky-T1-32B-Preview from UC Berkeley's NovaSky team rivals OpenAI's o1. With 19 hours of training, this marks a major milestone in cost-effective AI development! 🌟 Read more: Sky-T1: Train your own O1 preview model within $450




1/2
@uavster
You can now train a model that beats OpenAI's o1 in math and coding for less than $450.

Meet UC Berkeley’s Sky-T1-32B-Preview. Link in 🧵



2/2
@uavster
Code and data are open.
GitHub - NovaSky-AI/SkyThought: Sky-T1: Train your own O1 preview model within $450










1/11
@AIBuzzNews
Meet the model trained for just $450.

And it rivals OpenAI's best.

Here’s the story behind Sky-T1-32B-Preview:



GhJsdhrXAAAjvCr.jpg


2/11
@AIBuzzNews
Meet Sky-T1-32B-Preview from @NovaSkyAI, an open-source reasoning model that stands toe-to-toe with o1-preview on leading reasoning and coding benchmarks.

The best part? This model was trained for just $450.



GhJsdgeXIAAMMgM.jpg


3/11
@AIBuzzNews
Sky-T1-32B-Preview outperforms on key benchmarks:

Math500: 82.4% (vs. 81.4% by o1-preview)
AIME24: 43.3% (vs. 40.0%)
LiveCodeBench-Hard: 17.9% (vs. 16.3%)



GhJsdg0XUAAJys6.jpg


4/11
@AIBuzzNews
Here’s how it was trained:

- Base model: Qwen2.5-32B-Instruct
- Data: Sourced from QwQ-32B, enhanced with GPT-4o-mini and rejection sampling for precise math and coding traces
- Compute: 8 H100 GPUs, 19 hours, and $450 in cost
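
For anyone unsure what the rejection-sampling step above amounts to: a teacher-generated trace is kept only if its final answer matches the known reference. A hedged sketch; the helper callables and data layout are illustrative assumptions, not NovaSky's actual code.

```python
def rejection_sample(problems, teacher_generate, extract_final_answer, n=8):
    """Keep one teacher reasoning trace per problem, only if its answer checks out."""
    kept = []
    for prob in problems:                       # prob: {"question", "answer"}
        for _ in range(n):                      # up to n attempts per problem
            trace = teacher_generate(prob["question"])  # e.g. a QwQ-32B sample
            if extract_final_answer(trace) == prob["answer"]:
                kept.append({"question": prob["question"], "trace": trace})
                break                           # verified trace found; move on
    return kept
```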



5/11
@AIBuzzNews
Sky-T1-32B-Preview is just the start. They're working on:

- More efficient reasoning models
- Advanced methods for scaling during inference
Stay tuned!



6/11
@mushfiq_sajib
Impressive efficiency in training Sky-T1-32B-Preview. Innovations like this redefine AI possibilities within tight budgets.



7/11
@AIBuzzNews
This opens up many doors for businesses to train their own models.



8/11
@shushant_l
Wow, you've explained in detail



9/11
@AIBuzzNews
Trying to make it understandable for everyone.



10/11
@Whizz_ai
Do you think that these new Rivals will have potential to compete Open AI?



11/11
@AIBuzzNews
Well, it does according to the benchmarks.




1/11
@victormustar
Reasoning traces are the new gold, and the open-source community is going to nail this. Check Sky-T1-32B-Preview release (reportedly rivals o1-preview for coding). The team has fully disclosed all technical details, code, dataset, and weights. 🔥

NovaSky-AI/Sky-T1-32B-Preview · Hugging Face



2/11
@victormustar
Sky-T1: Train your own O1 preview model within $450



3/11
@ivanfioravanti
what??? This rivals o1-preview? TOP TOP TOP!



4/11
@PascalBauerDE
Model Description
This is a 32B reasoning model trained from Qwen2.5-32B-Instruct with 17K data. The performance is on par with o1-preview model on both math and coding.

Omg. What? That is like next level. 17K data only.



5/11
@TheAIVeteran
I know how to generate reasoning traces from scratch. See my pinned thread for some details.



6/11
@Teknium1
The datasets listed don't seem to be the subsets used to train this, fyi



7/11
@sinanisler
it is just matter of time we have o1 level opensource model and maybe even under 32b



8/11
@anushkmittal
reasoning traces are the new moat



9/11
@carsenklock
Amazing!! GG



10/11
@AbelIonadi
Will try this. Sounds interesting



11/11
@9Knowled9e
🔥





1/11
@reach_vb
Sky-T1-32B-Preview, an open source o1-like model trained for < $450, achieves competitive reasoning and coding performance (e.g., 82.4 on Math500, 86.3 on LiveCodeBench-Easy) compared to QwQ (85.4, 90.7) and o1-preview (81.4, 92.9) 🔥

Fully open-source with 17K training data, 32B model weights, and outperforming Qwen-2.5-32B-Instruct across benchmarks 💥



GhB38aVWIAAjngm.jpg


2/11
@reach_vb
Model checkpoints:

NovaSky-AI/Sky-T1-32B-Preview · Hugging Face



3/11
@InfSoftwareH
Has it been trained only on the benchmarks data?😂



4/11
@reach_vb
The best part is that anyone can quite easily test this with less than 450 USD :smile:



5/11
@ichrvk
Would love to see how these benchmarks hold up in real-world scenarios. The training cost is fascinating though - we're truly entering the era of bedroom LLMs.



6/11
@StephenEdginton
It’s a finetune, they should really say that. Still impressive.



7/11
@steve_ike_
How is this possible 🤯.



8/11
@rogue_node
it's a finetuned model .



9/11
@dzamsgaglo
Does somebody compare it to Phi-4 ?



10/11
@prithiv_003
This is awesome in every aspects, less than 450$, less than 50k TD Just Nice Work 🤩



11/11
@SynthSquid
I wanna see this tested on Aider's new benchmark





1/3
@iamluokai
The NovaSky team fine-tuned the open-source Qwen2.5-32B-Instruct model. The training lasted for 19 hours using 8 H100 GPUs, costing about $450 (priced according to Lambda Cloud). The resulting Sky-T1-32B-Preview model performs comparably to o1-preview in reasoning and coding benchmarks, demonstrating the possibility of efficiently replicating high-level reasoning capabilities at a low cost. 🧵1/3



GhGM_E9bEAAd__X.jpg


2/3
@iamluokai
🧵2/3

The NovaSky team has open-sourced all the details of the model (including data, code, model weights, etc.), making it easy for community members to replicate and improve the results.

Project: Sky-T1: Train your own O1 preview model within $450



3/3
@iamluokai
🧵3/3

Github: GitHub - NovaSky-AI/SkyThought: Sky-T1: Train your own O1 preview model within $450





1/7
@abacaj
This is just standard SFT and outperforms o1-preview? Questionable…

[Quoted tweet]
1/6 🚀
Introducing Sky-T1-32B-Preview, our fully open-source reasoning model that matches o1-preview on popular reasoning and coding benchmarks — trained under $450!

📊Blog: novasky-ai.github.io/posts/s…
🏋️‍♀️Model weights: huggingface.co/NovaSky-AI/Sk…
[media=twitter]1877793041957933347[/media]

Gg9Azj5a0AAElZU.jpg


2/7
@willccbb
QwQ is already an open source 32B model which outperforms o1-preview in many benchmarks and was finetuned from Qwen2.5-32B

they just kinda did QwQ again but mostly worse, using QwQ data



3/7
@abacaj
Yea I feel like it’s not that interesting but maybe I’m missing something



4/7
@snellingio
i think the thing you’re both “missing” is that the data is available and reproducible (hopefully)

having that dataset available is great imo



5/7
@willccbb
totally fair, missed that bit

will be cool to see how it translates for smaller models

i suspect that you should be able to get a really good code/math reasoner at like 7b with these kinds of tricks



6/7
@starkov100
Sky-T1: Train your own O1 preview model within $450



GhEwLj4WkAAvrOj.jpg


7/7
@snellingio
yeah but they just used vanilla SFT from what I can tell.

am not convinced that SFT only will be successful in small models with this kind of data (it obviously wasn't in this case)








1/21
@NovaSkyAI
1/6 🚀
Introducing Sky-T1-32B-Preview, our fully open-source reasoning model that matches o1-preview on popular reasoning and coding benchmarks — trained under $450!

📊Blog: Sky-T1: Train your own O1 preview model within $450
🏋️‍♀️Model weights: NovaSky-AI/Sky-T1-32B-Preview · Hugging Face



Gg9Azj5a0AAElZU.jpg


2/21
@NovaSkyAI
2/6📂
Data curation, train, eval code, 17K training data: GitHub - NovaSky-AI/SkyThought: Sky-T1: Train your own O1 preview model within $450

Collaborate, replicate, and innovate! 💡



3/21
@NovaSkyAI
3/6📈
Sky-T1-32B-Preview excels in both math & coding:
- Math500: 82.4% (o1-preview: 81.4%)
- AIME24: 43.3% (o1-preview: 40.0%)
- LiveCodeBench-Hard: 17.9% (o1-preview: 16.3%)



4/21
@NovaSkyAI
4/6 ⚙️
The training recipe:
- Base: Qwen2.5-32B-Instruct
- Data: Curated from QwQ-32B, enhanced with GPT-4o-mini, rejection sampling for high-quality math & coding reasoning traces.
- Cost: 8 H100 GPUs, 19 hours, $450.



Gg9B6sXboAAUFJN.jpg


5/21
@NovaSkyAI
5/6🌟
Sky-T1-32B-Preview is just the beginning! Next steps:
- Efficient models with strong reasoning
- Explore advanced techniques for test-time scaling



6/21
@NovaSkyAI
6/6 Acknowledgements:

Built with support from: @LambdaAPI @anyscalecompute for compute
Academic Insights from STILL-2 & Qwen Teams

💻 Built at Berkeley’s Sky Computing Lab @BerkeleySky with the amazing NovaSky team:
Contact: novasky.berkeley@gmail.com!



7/21
@ruansgon
@UnrollHelper



8/21
@nooriefyi
the future of ai is collaborative



9/21
@chillzaza_
long live open source



10/21
@Kitora_Su
Congratulations on this amazing feat to the team.



11/21
@DmitriyAnderson
Can I run it on RTX 4090?



12/21
@therealmrcrypto
@Bobtoshi69



13/21
@Cyril_Engineer
Can this be run locally and how much VRAM does it require?



14/21
@Gopinath876
@MaziyarPanahi any thoughts on this model?

I tested it locally doing really.



15/21
@steve_ike_
Matches o-1 preview and trained under $450 don’t make sense together! 😂



16/21
@iamRezaSayar
This is very cool!🔥but I'm a bit confused on why you chose to fine-tune Qwen2.5 instead of QwQ, given that both are the same size, and even as awesome a jump in performance that we see here, they still seem to fall short of QwQ. So, was there a reason you didn't go with QwQ? 👀



17/21
@altryne
What is this madness :smile:

Will mention this in the next @thursdai_pod 👏

Welcome to come tell us about it!



18/21
@Yuchenj_UW
Huge if it’s not trained on the test set



19/21
@TechMemeKing
Insane



20/21
@jasonkneen
LFG!!



21/21
@nisten
at first i was like.. meh just a QwQ finetune but then... i realized you trained this off of Q32 Instruct 👀
holy cow ok, gonna try this out
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
61,990
Reputation
9,378
Daps
170,276
















1/15
@TheAITimeline
🚨This week's top AI/ML research papers:

- Towards System 2 Reasoning in LLMs
- Agent Laboratory
- rStar-Math
- From System-1 Thinking to System-2 Thinking
- The GAN is dead; long live the GAN! A Modern GAN Baseline
- Search-o1
- REINFORCE++
- Enhancing Human-Like Responses in Large Language Models
- An Empirical Study of Autoregressive Pre-training from Videos
- Scaling Laws for Floating Point Quantization Training
- Cosmos World Foundation Model Platform for Physical AI
- Personalized Graph-Based Retrieval for Large Language Models
- Entropy-Guided Attention for Private LLMs
- Adjoint Matching
- Grokking at the Edge of Numerical Stability
- Key-value memory in the brain
- mFabric
- Titans: Learning to Memorize at Test Time
- Tensor-GaLore
- BoostStep

overview for each + authors' explanations
read this in thread mode for the best experience



GhIj8NGWIAAFgAn.jpg


2/15
@TheAITimeline
Towards System 2 Reasoning in LLMs: Learning How to Think With Meta Chain-of-Thought

Author's Explanation:


Overview:
Meta Chain-of-Thought (Meta-CoT) introduces a framework that enhances traditional reasoning in LLMs by explicitly modeling the reasoning processes, using process supervision, synthetic data, and search algorithms to produce Meta-CoTs, supported by instruction tuning and reinforcement learning.

Paper:
[2501.04682] Towards System 2 Reasoning in LLMs: Learning How to Think With Meta Chain-of-Thought

[Quoted tweet]
We have a new position paper on "inference time compute" and what we have been working on in the last few months! We present some theory on why it is necessary, how does it work, why we need it and what does it mean for "super" intelligence.
[media=twitter]1877446475271037314[/media]

GhIkAPWXsAAZK1J.jpg

Gg3rVxaaMAIJB_b.jpg


3/15
@TheAITimeline
Agent Laboratory: Using LLM Agents as Research Assistants

Author's Explanation:


Overview:
Agent Laboratory introduces a framework harnessing LLMs to automate the research process, achieving state-of-the-art results and reducing costs by 84% while allowing human feedback to enhance research quality throughout literature review, experimentation, and report writing stages.

Paper:
[2501.04227] Agent Laboratory: Using LLM Agents as Research Assistants

[Quoted tweet]
🚀🔬 Introducing Agent Laboratory: an assistant for automating machine learning research

Agent Laboratory takes your research ideas and outputs a research paper and code repository, allowing you to allocate more effort toward ideation rather than low-level coding and writing 🧵
[media=twitter]1877164749668102233[/media]

GhIkBwPWcAAoUqz.jpg

Ggz0W8yWMAACKtr.jpg


4/15
@TheAITimeline
rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking

Overview:
rStar-Math demonstrates that small language models can achieve or exceed the math reasoning skills of OpenAI o1 by implementing Monte Carlo Tree Search to conduct "deep thinking" during test-time search, and introduces a new method for training policy SLMs and process reward models without relying on distillation.

Through innovative data synthesis and self-evolution processes, rStar-Math achieves state-of-the-art performance, improving math reasoning on benchmarks like MATH, boosting Qwen2.5-Math-7B to 90.0% and achieving high performance on USA Math Olympiad problems, ranking in the top 20% of high school math contestants.

Paper:
[2501.04519] rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking



GhIkDqAX0AAfRe9.jpg
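
For intuition, a toy sketch of the MCTS-style "deep thinking" loop the overview describes: nodes are partial solutions, a policy model proposes next steps, and a reward model scores leaves. `propose_steps` and `score` are stand-ins for the paper's policy SLM and process reward model, not its implementation.

```python
import math
import random

class Node:
    def __init__(self, state, parent=None):
        self.state, self.parent = state, parent           # state: list of steps so far
        self.children, self.visits, self.value = [], 0, 0.0

def uct(node, c=1.4):
    """Upper-confidence score balancing exploitation and exploration."""
    return node.value / (node.visits + 1e-9) + c * math.sqrt(
        math.log(node.parent.visits + 1) / (node.visits + 1e-9))

def mcts(root, propose_steps, score, iters=100):
    for _ in range(iters):
        node = root
        while node.children:                              # 1. selection
            node = max(node.children, key=uct)
        for step in propose_steps(node.state):            # 2. expansion (policy SLM)
            node.children.append(Node(node.state + [step], parent=node))
        leaf = random.choice(node.children) if node.children else node
        r = score(leaf.state)                             # 3. evaluation (reward model)
        while leaf is not None:                           # 4. backpropagation
            leaf.visits += 1; leaf.value += r; leaf = leaf.parent
    return max(root.children, key=lambda n: n.visits)     # most-visited next step
```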


5/15
@TheAITimeline
Test-time Computing: from System-1 Thinking to System-2 Thinking

Overview:
This paper investigates the evolution of test-time computing from System-1 to System-2 thinking, emphasizing its role in enhancing reasoning capabilities through methods like parameter updating and tree search, tracing how it addresses distribution shifts in System-1 models and bolsters complex reasoning in System-2 models.

Paper:
[2501.02497] Test-time Computing: from System-1 Thinking to System-2 Thinking



GhIkFY3XYAAEy-K.jpg


6/15
@TheAITimeline
The GAN is dead; long live the GAN! A Modern GAN Baseline

Overview:
This paper challenges the notion that GANs are inherently difficult to train by introducing a regularized relativistic GAN loss, which circumvents mode dropping and non-convergence without relying on empirical tricks, leading to a new minimalist baseline called R3GAN.

By replacing outdated architectures and eliminating ad-hoc tricks, R3GAN outperforms StyleGAN2 on datasets like FFHQ, ImageNet, CIFAR, and Stacked MNIST, and competes effectively against the latest GANs and diffusion models.

Paper:
[2501.05441] The GAN is dead; long live the GAN! A Modern GAN Baseline



GhIkHi4XQAAE3Dg.jpg


7/15
@TheAITimeline
Search-o1: Agentic Search-Enhanced Large Reasoning Models

Overview:
Search-o1 introduces an agentic retrieval-augmented generation mechanism and a Reason-in-Documents module to enhance large reasoning models (LRMs) by dynamically accessing and refining external knowledge during reasoning tasks.

This integration reduces knowledge insufficiency and errors in LRMs by allowing targeted retrieval and deep analysis of external information.

The framework's effectiveness is validated through extensive testing on complex reasoning tasks and several open-domain QA benchmarks, significantly improving the trustworthiness and versatility of LRMs.

Paper:
[2501.05366] Search-o1: Agentic Search-Enhanced Large Reasoning Models



GhIkLf5XkAASTpB.jpg


8/15
@TheAITimeline
REINFORCE++: A Simple and Efficient Approach for Aligning Large Language Models

Overview:
REINFORCE++ enhances the classical REINFORCE algorithm by integrating optimization techniques from PPO to improve training stability and computational efficiency without requiring a critic network, and it surpasses GRPO in stability while being more efficient than PPO with similar performance levels.

Paper:
[2501.03262] REINFORCE++: A Simple and Efficient Approach for Aligning Large Language Models



GhIkNcBWAAE5gqJ.jpg
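
A rough sketch of the idea as summarized above: critic-free REINFORCE stabilized with two PPO-style tricks, a token-level KL penalty against a frozen reference model and clipped importance ratios. Exact penalty form and normalization scope vary by implementation; this is illustrative, not the paper's code.

```python
import torch

def reinforce_pp_loss(logp, logp_old, logp_ref, reward,
                      clip_eps=0.2, kl_coef=0.01):
    # logp / logp_old / logp_ref: (T,) per-token log-probs under the current,
    # behavior, and frozen reference policies; reward: scalar sequence reward.
    kl = logp_old - logp_ref                        # token-level KL estimate
    adv = reward - kl_coef * kl                     # shaped, critic-free advantage
    adv = (adv - adv.mean()) / (adv.std() + 1e-8)   # batch-style normalization
    adv = adv.detach()                              # treated as fixed, like a reward
    ratio = (logp - logp_old).exp()                 # PPO-style importance ratio
    return -torch.min(ratio * adv,
                      ratio.clamp(1 - clip_eps, 1 + clip_eps) * adv).mean()
```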


9/15
@TheAITimeline
Enhancing Human-Like Responses in Large Language Models

Author's Explanation:


Overview:
Enhancements in LLMs, such as fine-tuning with diverse datasets and integrating psychological principles, significantly improve natural language understanding, conversational coherence, and emotional intelligence, which refines user interactions.

Paper:
[2501.05032] Enhancing Human-Like Responses in Large Language Models

[Quoted tweet]
🚀 I am excited to unveil our new paper, "Enhancing Human-Like Responses in Large Language Models"!

🤖✨ We've developed synthetic datasets and models that significantly improve conversational AI, making interactions more human-like.

Check it out here: huggingface.co/papers/2501.0…
[media=twitter]1877763008257986846[/media]

GhIkRAwXQAAp0MU.jpg


10/15
@TheAITimeline
An Empirical Study of Autoregressive Pre-training from Videos

Author's Explanation:


Overview:
This paper investigates autoregressive pre-training from videos using a series of models, called Toto, which treat videos as sequences of visual tokens for predictive tasks.

By training on over 1 trillion visual tokens, the study evaluates the performance on tasks like image recognition and video classification, revealing that autoregressive methods with minimal biases achieve competitive results.

Scaling these video models shows similar patterns to LLMs, although at a different rate.

Paper:
[2501.05453] An Empirical Study of Autoregressive Pre-training from Videos

[Quoted tweet]
An Empirical Study of Autoregressive Pre-training from Videos.

paper: arxiv.org/pdf/2501.05453
website: brjathu.github.io/toto

We empirically study autoregressive pre-training from videos. Our models are pre-trained on a diverse dataset of videos and images comprising over 1 trillion visual tokens. We evaluate the learned visual representations on a range of downstream tasks including image recognition, video classification, forecasting, tracking, and robotics.
[media=twitter]1877551853506003297[/media]

GhIkUtsWcAAuC71.jpg

Gg5jH-baMAU_pu8.jpg


11/15
@TheAITimeline
Scaling Laws for Floating Point Quantization Training

Author's Explanation:


Overview:
This paper investigates floating-point quantization in low-precision LLM training, revealing that exponent bits slightly outweigh mantissa bits for performance, with an optimal exponent-mantissa bit ratio identified for various bit settings.

It introduces a unified scaling law, notes the existence of a critical data size where excess data degrades performance, and suggests that optimal quantization precision, beneficial for cost-performance, ranges from 4-8 bits depending on computational power.

Paper:
[2501.02423] Scaling Laws for Floating Point Quantization Training

[Quoted tweet]
Our latest research revolutionizes the efficiency of Large Language Models (LLMs) with a new "Scaling Laws for Floating-Point Quantization Training" study. Delved deep into the impact of exponent bits, mantissa bits, and block-size on model training loss. arxiv.org/abs/2501.02423
[media=twitter]1876929995425079410[/media]

GhIkW2jXcAA0Xju.jpg


12/15
@TheAITimeline
Cosmos World Foundation Model Platform for Physical AI

Author's Explanation:


Overview:
The Cosmos World Foundation Model Platform enables the creation of customized world models for Physical AI by providing a general-purpose foundation model that can be fine-tuned for specific applications through its video curation pipeline, pre-trained models, post-training examples, and video tokenizers.

Paper:
[2501.03575] Cosmos World Foundation Model Platform for Physical AI

[Quoted tweet]
We are releasing Cosmos, a world foundation model platform for building and advancing Physical AI. It offers diffusion and autoregressive pretrained models ranging from 4B to 14B.

🔧Try it, finetune it, & explore: github.com/NVIDIA/Cosmos.

🔍Dive into our 75-page tech report for all the details! research.nvidia.com/publicat…
[media=twitter]1876695479419076816[/media]

GhIkYqIWMAA0jsP.jpg


13/15
@TheAITimeline
Personalized Graph-Based Retrieval for Large Language Models

Overview:
Personalized Graph-based Retrieval-Augmented Generation (PGraphRAG) leverages user-centric knowledge graphs to enhance LLMs' personalized response quality by enriching prompts with relevant context, outperforming existing methods in cold-start scenarios; it is evaluated through a benchmark tailored for sparse user data environments.

Paper:
[2501.02157] Personalized Graph-Based Retrieval for Large Language Models



GhIkaUVXMAAgyUq.jpg


14/15
@TheAITimeline
Entropy-Guided Attention for Private LLMs

Overview:
Entropy-Guided Attention addresses privacy concerns in proprietary LLMs by utilizing a framework that examines nonlinearities in decoder-only models through Shannon's entropy.
By proposing an attention mechanism and entropy regularization to maintain model stability and attention head diversity, it offers PI-friendly layer normalization alternatives to enhance training efficiency amidst reduced nonlinearity.

Paper:
[2501.03489] Entropy-Guided Attention for Private LLMs



GhIkcftXUAEj3Lx.jpg


15/15
@TheAITimeline
Adjoint Matching: Fine-tuning Flow and Diffusion Generative Models with Memoryless Stochastic Optimal Control

Author's Explanation:


Overview:
Adjoint Matching introduces a memoryless noise schedule in fine-tuning flow and diffusion generative models using stochastic optimal control (SOC), transforming SOC challenges into regression tasks.

This technique notably surpasses prior methods by enhancing consistency, realism, and the adaptability of models to new human preference rewards, without sacrificing sample diversity.

Paper:
[2409.08861] Adjoint Matching: Fine-tuning Flow and Diffusion Generative Models with Memoryless Stochastic Optimal Control

[Quoted tweet]
New paper! We cast reward fine-tuning as stochastic control.

1. We prove that a specific noise schedule *must* be used for fine-tuning.

2. We propose a novel algorithm that is significantly better than the adjoint method*.

(*this is an insane claim)

arxiv.org/abs/2409.08861
[media=twitter]1836065903533785588[/media]

GhIkfmCXMAAn6Sn.jpg

GXsB-RKXwAAj-2D.jpg
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
61,990
Reputation
9,378
Daps
170,276
[threads continued]








1/7
@TheAITimeline
Grokking at the Edge of Numerical Stability

Author's Explanation:


Overview:
Grokking, the sudden generalization observed after extensive overfitting, is hindered by the Softmax Collapse (SC), which occurs due to numerical instability and the alignment of gradients with naïve loss minimization (NLM) after overfitting.

This paper introduces StableMax, an activation function, and ⊥Grad, a training algorithm, both of which mitigate SC and facilitate grokking without regularization, thereby explaining delayed generalization and offering solutions to improve learning resilience against SC.

Paper:
[2501.04697] Grokking at the Edge of Numerical Stability

[Quoted tweet]
Our latest work [arxiv.org/abs/2501.04697] explores sudden generalization in neural nets, aka #grokking. We identify & propose solutions to two key issues hindering grokking: (i) floating point errors in Softmax and (ii) aligned gradients naively scaling logits post-overfitting.👇
[media=twitter]1877444283151651117[/media]

GhIkhfKWwAAxqLD.jpg

Gg39tzsXMAAFBkd.jpg
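
The floating-point failure behind "Softmax Collapse" is easy to reproduce once logits grow large. The snippet below shows the standard max-subtraction stabilization for comparison; note this is the generic trick, not the paper's StableMax, which replaces exp() itself.

```python
import numpy as np

def naive_softmax(z):
    e = np.exp(z)                  # exp(1000) overflows to inf
    return e / e.sum()             # inf / inf -> nan

def stable_softmax(z):
    e = np.exp(z - z.max())        # shifting by the max is mathematically a no-op
    return e / e.sum()

z = np.array([1000.0, 999.0, 0.0])  # logits this large do appear after overfitting
print(naive_softmax(z))             # [nan nan  0.]
print(stable_softmax(z))            # [0.731 0.269 0.   ]
```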


2/7
@TheAITimeline
Key-value memory in the brain

Overview:
Key-value memory systems optimize fidelity and discriminability by using distinct representations for storage and retrieval, a concept rooted in both traditional psychological models and modern machine learning, with potential implications for understanding biological memory mechanisms and addressing empirical puzzles.

Paper:
[2501.02950v1] Key-value memory in the brain



GhIkkSnW8AA2JPH.jpg


3/7
@TheAITimeline
mFabric: An Efficient and Scalable Fabric for Mixture-of-Experts Training

Overview:
mFabric introduces an innovative system for mixture-of-experts (MoE) training by enabling topology reconfiguration during distributed training, leveraging a regionally reconfigurable high-bandwidth domain through optical circuit switching.

This approach maintains efficiency and adaptability, with simulations indicating mFabric matches non-blocking fat-tree performance while enhancing training cost efficiency by up to 2.3 times at higher bandwidths.

Paper:
[2501.03905] mFabric: An Efficient and Scalable Fabric for Mixture-of-Experts Training



GhIkmOjWwAAVEEC.jpg


4/7
@TheAITimeline
Titans: Learning to Memorize at Test Time

Overview:
Titans introduces a neural long-term memory module that enhances attention mechanisms by memorizing historical context, allowing efficient handling of extensive context windows with fast parallelizable training and inference.

The architecture efficiently combines short-term attention with long-term memory, outperforming Transformers and linear recurrent models in language modeling, common-sense reasoning, genomics, and time series tasks.

It achieves superior accuracy on large context windows, particularly in needle-in-haystack scenarios, demonstrating its capability in effectively incorporating memory into neural architectures.

Paper:
[2501.00663] Titans: Learning to Memorize at Test Time



GhIkoJqX0AAHVFt.jpg
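
A toy illustration of "memorizing at test time": a small memory network is updated by gradient steps on each incoming key-value pair during inference, so surprising inputs reshape the memory. This is an associative-memory caricature under those assumptions, not the actual Titans architecture.

```python
import torch
import torch.nn as nn

mem = nn.Sequential(nn.Linear(32, 64), nn.SiLU(), nn.Linear(64, 32))
opt = torch.optim.SGD(mem.parameters(), lr=0.1)

def memorize(k: torch.Tensor, v: torch.Tensor) -> float:
    """One test-time gradient step: store the association k -> v."""
    loss = ((mem(k) - v) ** 2).mean()    # "surprise" = prediction error on v
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

def recall(k: torch.Tensor) -> torch.Tensor:
    with torch.no_grad():
        return mem(k)

k, v = torch.randn(8, 32), torch.randn(8, 32)
for _ in range(50):
    memorize(k, v)                       # updates happen during inference
print(((recall(k) - v) ** 2).mean())     # error shrinks as the memory forms
```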


5/7
@TheAITimeline
Tensor-GaLore: Memory-Efficient Training via Gradient Tensor Decomposition

Overview:
Tensor-GaLore introduces a memory-efficient training method for neural networks with higher-order tensor weights, using tensor gradient decomposition to optimize complex tensor-parameterized layers, achieving up to 75% memory savings in models like Fourier Neural Operators for PDE tasks such as Navier Stokes and Darcy Flow equations.

Paper:
[2501.02379] Tensor-GaLore: Memory-Efficient Training via Gradient Tensor Decomposition



GhIkr4PWwAARORK.jpg
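
For intuition, here is the simpler matrix (SVD) version of the gradient-projection idea behind GaLore-style methods; Tensor-GaLore's contribution is extending this to higher-order tensors via tensor decomposition, which this toy deliberately omits.

```python
import torch

def project_grad(grad: torch.Tensor, rank: int = 4):
    """Compress a gradient matrix into a rank-r subspace."""
    U, S, Vh = torch.linalg.svd(grad, full_matrices=False)
    P = U[:, :rank]               # basis; in practice refreshed only periodically
    return P, P.T @ grad          # compact (rank x n) gradient representation

grad = torch.randn(512, 512)
P, g_small = project_grad(grad)
# Optimizer state (e.g. Adam moments) lives on g_small; the update is mapped
# back with P @ g_small before being applied to the full-size weights.
print(grad.numel(), g_small.numel())   # 262144 vs 2048: far less optimizer state
```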


6/7
@TheAITimeline
BoostStep: Boosting mathematical capability of Large Language Models via improved single-step reasoning

Overview:
BoostStep enhances mathematical reasoning in LLMs by aligning granularities between retrieval and reasoning steps, utilizing a novel 'first-try' strategy to deliver more relevant in-context learning examples; when integrated with Monte Carlo Tree Search, it achieves significant performance improvements, including gains of 3.6% for GPT-4o and 2.0% for Qwen2.5-Math-72B, respectively.

Paper:
[2501.03226] BoostStep: Boosting mathematical capability of Large Language Models via improved single-step reasoning



GhIkt2sXUAAHzMY.jpg


7/7
@TheAITimeline
That's a wrap for this week, thanks for reading!
Remember to drop a follow @TheAITimeline & rt if you like it!

You can see a few in-depth explanations in my next few issues, stay tuned here:
The AI Timeline
Have a great start to your week!

[Quoted tweet]
🚨This week's top AI/ML research papers:

- Towards System 2 Reasoning in LLMs
- Agent Laboratory
- rStar-Math
- From System-1 Thinking to System-2 Thinking
- The GAN is dead; long live the GAN! A Modern GAN Baseline
- Search-o1
- REINFORCE++
- Enhancing Human-Like Responses in Large Language Models
- An Empirical Study of Autoregressive Pre-training from Videos
- Scaling Laws for Floating Point Quantization Training
- Cosmos World Foundation Model Platform for Physical AI
- Personalized Graph-Based Retrieval for Large Language Models
- Entropy-Guided Attention for Private LLMs
- Adjoint Matching
- Grokking at the Edge of Numerical Stability
- Key-value memory in the brain
- mFabric
- Titans: Learning to Memorize at Test Time
- Tensor-GaLore
- BoostStep

overview for each + authors' explanations
read this in thread mode for the best experience
[media=twitter]1878604721650585830[/media]

GhIj8NGWIAAFgAn.jpg
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
61,990
Reputation
9,378
Daps
170,276
















1/15
@TheAITimeline
🚨This week's top AI/ML research papers:

- Do NOT Think That Much for 2+3=? On the Overthinking of o1-Like LLMs
- 1.58-bit FLUX
- Memory Layers at Scale
- Agents Are Not Enough
- LTX-Video: Realtime Video Latent Diffusion
- HUNYUANPROVER
- 2 OLMo 2 Furious
- Jasper and Stella: distillation of SOTA embedding models
- Multi-matrix Factorization Attention
- Can Large Language Models Adapt to Other Agents In-Context?
- TangoFlux
- KaLM-Embedding
- Slow Perception: Let's Perceive Geometric Figures Step-by-step
- LLM-as-an-Interviewer
- Vision Language Models See Illusions Where There are None
- Multi-matrix Factorization Attention
- Rate of Model Collapse in Recursive Training
- LatentSync: Audio Conditioned Latent Diffusion Models for Lip Sync
- Scaling of Search and Learning: A Roadmap to Reproduce o1 from Reinforcement Learning Perspective
- Process Reinforcement through Implicit Rewards

overview for each + authors' explanations
read this in thread mode for the best experience



Ggk4SA9WcAAj6Iw.jpg


2/15
@TheAITimeline
Do NOT Think That Much for 2+3=? On the Overthinking of o1-Like LLMs

Author's Explanation:


Overview:
This paper investigates the overthinking issue in o1-like LLMs, which emulate human-like reasoning through extended chain-of-thought processes but often waste computational resources on simple problems.

Novel efficiency metrics are introduced to assess resource utilization, and a self-training paradigm is proposed to streamline reasoning while maintaining accuracy.

Experimental results demonstrate that the approach reduces computational overhead without degrading performance across benchmarks, including GSM8K, MATH500, and GPQA.

Paper:
[2412.21187] Do NOT Think That Much for 2+3=? On the Overthinking of o1-Like LLMs

[Quoted tweet]
Are we scaling test-time compute efficiently and intelligently?

Introducing a comprehensive study on the prevalent issue of overthinking in o1-like models, where excessive computational resources are allocated for simple problems with minimal benefit.

🪡Across extensive analyses of mathematical benchmarks, we found these overthinking patterns:
(1) contribute minimally to improving accuracy,
(2) lack diversity in reasoning strategies,
(3) occur more frequently with simple problems.

🪡 We introduce novel efficiency metrics from both outcome and process perspectives to evaluate the rational use of computational resources by o1-like models.

🪡 Using a self-training paradigm, we propose strategies to mitigate overthinking, streamlining reasoning processes without compromising accuracy.

Paper: arxiv.org/abs/2412.21187 🧵
[media=twitter]1873924882012291463[/media]

Ggk4dx1WkAA13-R.jpg

GgGEsAmasAEmeEm.jpg


3/15
@TheAITimeline
1.58-bit FLUX

Overview:
1.58-bit FLUX introduces a novel quantization method for the FLUX.1-dev text-to-image model, achieving comparable image generation quality with 1.58-bit weights while using only self-supervision.

This approach reduces model storage by 7.7x, inference memory by 5.1x, and improves latency through a custom 1.58-bit kernel, maintaining performance across benchmarks like GenEval and T2I Compbench.

Paper:
[2412.18653] 1.58-bit FLUX



Ggk4f11XwAAZBmq.jpg
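
For readers wondering where "1.58-bit" comes from: log2(3) ≈ 1.58, i.e. each weight takes one of {-1, 0, +1} plus a shared scale. Below is the generic absmean ternarization from the BitNet b1.58 line of work, shown only for intuition; the FLUX paper's exact recipe may differ.

```python
import torch

def ternarize(w: torch.Tensor, eps: float = 1e-8):
    """Quantize a weight tensor to {-1, 0, +1} with a per-tensor scale."""
    scale = w.abs().mean().clamp(min=eps)   # absmean scaling factor
    q = (w / scale).round().clamp(-1, 1)    # ternary codes
    return q, scale                         # dequantize as q * scale

w = torch.randn(4, 4)
q, s = ternarize(w)
print(q)
print((q * s - w).abs().mean())             # mean quantization error
```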


4/15
@TheAITimeline
Memory Layers at Scale

Author's Explanation:


Overview:
Memory layers employ a trainable key-value lookup to efficiently add parameters without increasing FLOPs, complementing dense feed-forward layers by offering enhanced storage and retrieval capabilities at low computational cost.

This study scales memory layers, showing they outperform dense and mixture-of-experts models in language tasks, especially factual ones, achieving superior performance with significantly lower computation.

The authors provide a parallelizable implementation, demonstrating effective scaling with up to 128 billion memory parameters and pretraining on 1 trillion tokens, compared to base models with up to 8 billion parameters.

Paper:
[2412.09764] Memory Layers at Scale

[Quoted tweet]
New research from Meta FAIR — Meta Memory Layers at Scale. This work takes memory layers beyond proof-of-concept, proving their utility at contemporary scale ➡️ go.fb.me/3lbt4m
[media=twitter]1874897646542033030[/media]

Ggk4hx-W4AATrmx.jpg


https://video.twimg.com/ext_tw_video/1874897089085489152/pu/vid/avc1/1920x1080/3_k5IgzTe3YlQhM4.mp4
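
A minimal sketch of a trainable key-value lookup as the overview describes it: a query mixes the values of its top-k nearest keys, so added parameters barely add compute per token. Sizes are illustrative; the paper's product-key variant avoids scoring all slots, which this brute-force toy still does.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MemoryLayer(nn.Module):
    """Trainable key-value memory: capacity in parameters, sparse top-k reads."""
    def __init__(self, dim: int, n_slots: int = 4096, k: int = 32):
        super().__init__()
        self.keys = nn.Parameter(torch.randn(n_slots, dim) / dim**0.5)
        self.values = nn.Parameter(torch.randn(n_slots, dim) / dim**0.5)
        self.k = k

    def forward(self, q: torch.Tensor) -> torch.Tensor:   # q: (batch, dim)
        scores = q @ self.keys.T                  # similarity to every key
        top, idx = scores.topk(self.k, dim=-1)    # only k slots participate further
        w = F.softmax(top, dim=-1)                # (batch, k) mixture weights
        return (w.unsqueeze(-1) * self.values[idx]).sum(dim=1)

out = MemoryLayer(64)(torch.randn(2, 64))         # drop-in alongside an FFN block
```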

5/15
@TheAITimeline
Agents Are Not Enough

Author's Post:


Overview:
The paper explores the limitations of relying solely on agents in AI systems and argues for a holistic ecosystem that includes Sims to model user preferences and Assistants to interact and coordinate tasks, suggesting that generative AI alone is insufficient for creating effective and sustainable agent systems.

Paper:
[2412.16241] Agents Are Not Enough

[Quoted tweet]
See why @ryen_white and I think "Agents Are Not Enough" arxiv.org/abs/2412.16241
[media=twitter]1871400497292435459[/media]

Ggk4jmZXUAAUjRl.jpg


6/15
@TheAITimeline
LTX-Video: Realtime Video Latent Diffusion

Author's Explanation:


Overview:
LTX-Video introduces a unified latent diffusion model for video generation, integrating Video-VAE and a denoising transformer to optimize efficiency and quality.

By relocating patchifying to the VAE's input, it achieves a 1:192 compression ratio, enabling full spatiotemporal self-attention for high-resolution, temporally consistent videos.

A dual-purpose VAE decoder enhances fine detail without separate upsampling, supporting text-to-video and image-to-video tasks with simultaneous training.

LTX-Video generates 5 seconds of 768x512 video at 24 fps in 2 seconds, surpassing existing models in speed and quality.

Paper:
[2501.00103] LTX-Video: Realtime Video Latent Diffusion

[Quoted tweet]
LTX-Video Paper Release 🚀

1/ We’re thrilled to release our LTX-Video paper! 🎉
What makes LTX-Video so much faster than other video generation models? 🤔
The answer lies in our novel design choices, now explained in our just-released paper: arxiv.org/abs/2501.00103.
A 🧵:
[media=twitter]1875148348489113891[/media]

Ggk4mPcXMAAxqA5.jpg


7/15
@TheAITimeline
HUNYUANPROVER: A Scalable Data Synthesis Framework and Guided Tree Search for Automated Theorem Proving

Overview:
HunyuanProver, a finetuned Hunyuan 7B model, excels in interactive theorem proving with LEAN4 through scalable data synthesis and guided tree search algorithms.

It achieves SOTA performance, with a 68.4% pass rate on miniF2F-test, surpassing the previous 65.9%, and proves 4 IMO statements.

The open-sourced dataset of 30k synthesized instances includes questions, autoformalized statements, and proofs.

Paper:
[2412.20735] HUNYUANPROVER: A Scalable Data Synthesis Framework and Guided Tree Search for Automated Theorem Proving



Ggk4qL2WoAAD6S3.jpg


8/15
@TheAITimeline
2 OLMo 2 Furious

Author's Explanation:


Overview:
OLMo 2 introduces advanced dense autoregressive language models with architectural and training improvements, leveraging the specialized Dolmino Mix 1124 data mix for enhanced downstream task performance via late-stage curriculum training.

The models often outperform comparable open-weight models like Llama 3.1 and Qwen 2.5, achieving high efficiency with fewer FLOPs.

Paper:
[2501.00656] 2 OLMo 2 Furious

[Quoted tweet]
Everyone wants open-source language models but no one wants to lift these heavy ass weights.

We just released our paper "2 OLMo 2 Furious"
Can't stop us in 2025. Links below.
[media=twitter]1875258175047471283[/media]

Ggk4sSuWQAAFVd5.jpg

GgZAtqEa8AIwwXE.jpg


9/15
@TheAITimeline
Jasper and Stella: distillation of SOTA embedding models

Overview:
This paper introduces techniques for embedding model distillation to create smaller, efficient models by leveraging a novel training approach to reduce vector dimensions and aligning image-text data to develop a multimodal encoder, achieving strong/SOTA performance on benchmarks like MTEB.

Paper:
[2412.19048] Jasper and Stella: distillation of SOTA embedding models



Ggk4t72WMAA0NtL.jpg


10/15
@TheAITimeline
Multi-matrix Factorization Attention

Overview:
Multi-matrix Factorization Attention (MFA) and its extension MFA-Key-Reuse (MFA-KR) are introduced to enhance model capacity and efficiency under stringent Key-Value cache constraints, utilizing low-rank matrix factorization and key cache repurposing to significantly reduce memory usage while maintaining comparable performance to standard multi-head attention methods.

Paper:
[2412.19255] Multi-matrix Factorization Attention



Ggk4wUzWQAA5zp6.jpg


11/15
@TheAITimeline
Can Large Language Models Adapt to Other Agents In-Context?

Overview:
This paper challenges the prevailing view of LLMs' theory of mind capabilities by showing that while these models may excel in literal theory of mind with proper prompting, they falter in adapting to agents in-context, or functional theory of mind, thereby exposing limitations in how inductive bias affects their long-term adaptability.

Paper:
[2412.19726] Can Large Language Models Adapt to Other Agents In-Context?



Ggk4x8AXAAA_QOk.jpg


12/15
@TheAITimeline
TangoFlux: Super Fast and Faithful Text to Audio Generation with Flow Matching and Clap-Ranked Preference Optimization

Author's Post:


Overview:
TangoFlux introduces an efficient Text-to-Audio generative model that rapidly produces high-quality audio using a novel CLAP-Ranked Preference Optimization framework to tackle alignment challenges, achieving state-of-the-art results on key benchmarks with unprecedented speed and efficacy.

Paper:
[2412.21037] TangoFlux: Super Fast and Faithful Text to Audio Generation with Flow Matching and Clap-Ranked Preference Optimization

[Quoted tweet]
🐋 2/n

Website link: tangoflux.github.io/
Paper link: arxiv.org/abs/2412.21037
Github link : github.com/declare-lab/Tango…
Huggingface space Demo: huggingface.co/spaces/declar…
[media=twitter]1874509498276348006[/media]

Ggk40_VXUAAk4dz.jpg


13/15
@TheAITimeline
KaLM-Embedding: Superior Training Data Brings A Stronger Embedding Model

Overview:
KaLM-Embedding introduces a multilingual embedding model that utilizes high-quality, diverse training data and innovative techniques like persona-based synthetic data and ranking consistency filtering.

This surpasses traditional models by utilizing the Qwen2-0.5B pre-trained language model, achieving superior performance across languages on the MTEB benchmark for embedding models with under 1 billion parameters.

Paper:
[2501.01028] KaLM-Embedding: Superior Training Data Brings A Stronger Embedding Model



Ggk43DxWAAAfiRM.jpg


14/15
@TheAITimeline
Slow Perception: Let's Perceive Geometric Figures Step-by-step

Overview:
"Slow Perception" introduces a novel approach for solving geometric visual reasoning tasks by guiding LVLMs through a step-by-step perception process that mirrors human cognition, involving perception decomposition and perception flow, to improve understanding and copying of complex geometric figures.

Paper:
[2412.20631] Slow Perception: Let's Perceive Geometric Figures Step-by-step



Ggk45FGXwAACMbZ.jpg


15/15
@TheAITimeline
LLM-as-an-Interviewer: Beyond Static Testing Through Dynamic LLM Evaluation

Author's Explanation:


Overview:
LLM-as-an-Interviewer introduces a new evaluation paradigm for LLMs that uses dynamic interactions and feedback to assess model performance on tasks like MATH and DepthQA, overcoming limitations such as verbosity bias found in static evaluations while offering comprehensive insights into initial response quality and adaptability.

Paper:
[2412.10424] LLM-as-an-Interviewer: Beyond Static Testing Through Dynamic LLM Evaluation

[Quoted tweet]
[1/7] 🚨 New LLM Evaluation Paper Alert!
How can we better understand LLMs' abilities? Why not interview them across multiple turns? 🎤

We introduce the LLM-as-an-Interviewer Framework, along with its summarized interview report!
👉 arxiv.org/abs/2412.10424
[media=twitter]1874653915771539744[/media]

Ggk47yfXUAARRsd.jpg

GgQbXmKbIAAY34B.jpg
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
61,990
Reputation
9,378
Daps
170,276
[threads continued]






1/5
@TheAITimeline
The Illusion-Illusion: Vision Language Models See Illusions Where There are None

Author's Explanation:


Overview:
This paper probes vision language models with "illusion-illusions": ordinary images that merely resemble classic optical illusions and should therefore be free from traditional perceptual errors, revealing that the models often misidentify these ordinary objects as illusions.

Through examining how models interpret these seemingly straightforward images, it highlights deeper systemic limitations and processing errors consistent with issues previously identified in the literature.

Paper:
[2412.18613] The Illusion-Illusion: Vision Language Models See Illusions Where There are None

[Quoted tweet]
"The Illusion Illusion"

vision language models recognize images of illusions... but they also say non-illusions are illusions too
[media=twitter]1869742224524861836[/media]

Ggk4-DBXcAAqh3h.jpg

GfKnHCBXoAAu33O.jpg

GfKnH5iXAAAX5oW.jpg

GfKnI1cXsAAvJOz.jpg

GfKnqaPWgAA86lU.png


2/5
@TheAITimeline
Rate of Model Collapse in Recursive Training

Overview:
This paper explores recursive training's impact on model quality, known as model collapse, by analyzing well-known distributions under maximum likelihood estimation.

Findings reveal that discrete distributions forget words linearly relative to their original corpus frequency, while Gaussian models see their standard deviation diminish over approximately n iterations.

These results suggest that, under near maximum likelihood estimation with ample data, model collapse occurs gradually in these fundamental settings.

Paper:
[2412.17646] Rate of Model Collapse in Recursive Training



Ggk5y_gWAAAbnFV.jpg


3/5
@TheAITimeline
LatentSync: Audio Conditioned Latent Diffusion Models for Lip Sync

Overview:
LatentSync introduces an audio-conditioned latent diffusion model for lip sync that bypasses intermediate motion representation, achieving SOTA audio-visual correlation by leveraging Stable Diffusion’s capabilities.

The paper develops Temporal REPresentation Alignment (TREPA) to address the lack of temporal consistency in diffusion-based lip-sync techniques, using self-supervised video model representations for alignment.

It improves SyncNet convergence from 91% to 94% on the HDTF test set and outperforms existing methods on the HDTF and VoxCeleb2 datasets, maintaining high lip-sync accuracy.

Paper:
[2412.09262] LatentSync: Audio Conditioned Latent Diffusion Models for Lip Sync

GitHub:
GitHub - bytedance/LatentSync: Taming Stable Diffusion for Lip Sync!



https://video.twimg.com/ext_tw_video/1876093699861397504/pu/vid/avc1/720x1280/oQQSN16fCLTYammm.mp4

4/5
@TheAITimeline
Process Reinforcement through Implicit Rewards

Author's Explanation:


Overview:
PRIME introduces a reinforcement learning framework using implicit process rewards to enhance language model reasoning beyond imitation or distillation.

Starting from Qwen2.5-Math-7B-Base, the Eurus-2-7B-PRIME model achieves 26.7% pass@1 on AIME 2024, surpassing GPT-4o and Qwen2.5-Math-7B-Instruct with only 1/10 of their training data.

Additionally, the SOTA-level EurusPRM model demonstrates further advancements in mathematical reasoning.

Blog:
Process Reinforcement through Implicit Rewards | Notion

[Quoted tweet]
How to unlock advanced reasoning via scalable RL?

🚀Introducing PRIME (Process Reinforcement through Implicit Rewards) and Eurus-2, trained from Base model to surpass Qwen2.5-Math-Instruct using only 1/10 of the data.

We're still scaling up - w/ 3x more training data to go! 🧵
[media=twitter]1874867809983033649[/media]

Ggk5USnWQAAgmT4.jpg

GgTeVc-aoAAh7ZJ.jpg

GgTeVc-aMAAEFFO.jpg


5/5
@TheAITimeline
That's a wrap for this week! Thanks for reading :smile:

Remember to drop a follow @TheAITimeline if you like it!

You can see a few in-depth explanations in my next few issues, stay tuned here: The AI Timeline
& wish you a happy new year!🎉

[Quoted tweet]
🚨This week's top AI/ML research papers:

- Do NOT Think That Much for 2+3=? On the Overthinking of o1-Like LLMs
- 1.58-bit FLUX
- Memory Layers at Scale
- Agents Are Not Enough
- LTX-Video: Realtime Video Latent Diffusion
- HUNYUANPROVER
- 2 OLMo 2 Furious
- Jasper and Stella: distillation of SOTA embedding models
- Multi-matrix Factorization Attention
- Can Large Language Models Adapt to Other Agents In-Context?
- TangoFlux
- KaLM-Embedding
- Slow Perception: Let's Perceive Geometric Figures Step-by-step
- LLM-as-an-Interviewer
- Vision Language Models See Illusions Where There are None
- Multi-matrix Factorization Attention
- Rate of Model Collapse in Recursive Training
- LatentSync: Audio Conditioned Latent Diffusion Models for Lip Sync
- Scaling of Search and Learning: A Roadmap to Reproduce o1 from Reinforcement Learning Perspective
- Process Reinforcement through Implicit Rewards

overview for each + authors' explanations
read this in thread mode for the best experience
[media=twitter]1876095915733258323[/media]

Ggk4SA9WcAAj6Iw.jpg
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
61,990
Reputation
9,378
Daps
170,276

1/11
@theresanaiforit
You don't need to pay $100s for AI tools.

These are the best FREE AI tool alternatives right now...

1. Image Generation

Ideogram creates incredible images and is excellent for generating text in images, which makes it a fantastic alternative to Midjourney.

And if you want to play with another great tool, check out Leonardo which is one of the most updated image generators on the market.


2. Video Generation

Runway has been the best publicly available generator for a long time, but you'll have to pay to get the full experience.

On the other hand, Sora is now (somewhat—requires a ChatGPT subscription) available with MiniMax being a very strong, free alternative.


3. Research

Perplexity is a great alternative to SciSummary and also provides more in-depth information than you'll get from ChatGPT.

Other LLMs to test out are Claude, Meta's Llama, or Mistral AI's models.


4. Text to Speech

ElevenLabs is and has been king of the audio space and their new reader app is fabulous and completely free to use.

However, there are some good alternatives like Free Text to Speech.


5. Podcasts

Podcastle has some of the best features for creating podcasts, but you'll need to pay to use them.

The good news is you can create a human-like audio conversation using NotebookLM from any text (you'll be impressed if you haven't tried it yet).


6. Shorts and Captions

Submagic is the most used and promoted, but you also have OpusClip, Capcut, and Filmora.


7. Design

Canva truly is the best option that money can buy, but not everyone may have the budget for it.

That's where Microsoft Designer comes in and it's completely free to use just like Copilot (GPT-4o + Dall-E 3 alternative).


If you liked this post:

1. Repost + Like this post
2. Follow us (@theresanaiforit)




Ge8pnowWMAAbfuY.jpg


2/11
@AngelaCECope1


[Quoted tweet]
Children Murder Women To Become A Child/Woman; You Will Meet Them At Christmas.

I Love Christmas And Happy New Year.


3/11
@MarilynAnn00
I was skeptical about crypto, but @Rosemaryb_t showed me the ropes & helped me turn $26,000 into $47,000 in just 2 weeks! Their expertise & patience guided me through the process. Now, I'm hooked on crypto & can't thank @Rosemaryb_t enough for introducing me to this platform!



4/11
@AimeeiRivera
*😍😍😍😍I was scared to start but when I did I was happy, I took your advice seriously and it worked for me, I made over $6800*
follow @EmilyCrytoTrade



5/11
@pt_matrias
I have been seeing good results about her everywhere and a lot of people keep saying to reach out to her. I eventually did and it turns out profitable. If you aren't @Christina_Gar__ you should be. I have added massively to my holdings



6/11
@AimeeiRivera
*😍😍😍😍I was scared to start but when I did I was happy, I took your advice seriously and it worked for me, I made over $6800*
follow @EmilyCrytoTrade



7/11
@Alinne_Ariana
I have been seeing good results about her everywhere and a lot of people keep saying to reach out to her. I eventually did and it turns out profitable. If you aren't following
@Emily_Mitchell6 you should be. I have added massively to my holdings.



8/11
@Sarah_Mclean_2
Your advice was the catalyst I needed to take action.😍😍😜😜 I'm ecstatic to report that I've earned over $8300. Thanks for believing in me @kimberly_richs



9/11
@laurence_hasse
👊👍



10/11
@holistichabits2
Love this! Reminds me of the concept of 'aparigraha' in yoga - non-possessiveness. We don't need to break the bank to access innovative tools. Thanks for sharing these free alternatives!



11/11
@tyler_austin89
I have made massive progress with @JeremyWeinsten was lucky to find some comments of people speaking about his good works and I was determined to give it a try, it wasn't that long I started and have achieved $725,680. in just 7 days of trading with him.@JeremyWeinsten




 

bnew

Veteran
Joined
Nov 1, 2015
Messages
61,990
Reputation
9,378
Daps
170,276



Researchers open source Sky-T1, a ‘reasoning’ AI model that can be trained for less than $450​


Kyle Wiggers

1:30 PM PST · January 11, 2025



So-called reasoning AI models are becoming easier — and cheaper — to develop.

On Friday, NovaSky, a team of researchers based out of UC Berkeley’s Sky Computing Lab, released Sky-T1-32B-Preview, a reasoning model that’s competitive with an earlier version of OpenAI’s o1 on a number of key benchmarks. Sky-T1 appears to be the first truly open source reasoning model in the sense that it can be replicated from scratch; the team released the data set they used to train it as well as the necessary training code.

“Remarkably, Sky-T1-32B-Preview was trained for less than $450,” the team wrote in a blog post, “demonstrating that it is possible to replicate high-level reasoning capabilities affordably and efficiently.”

$450 might not sound that affordable. But it wasn’t long ago that the price tag for training a model with comparable performance often ranged in the millions of dollars. Synthetic training data, or training data generated by other models, has helped drive costs down. Palmyra X 004, a model recently released by AI company Writer, trained almost entirely on synthetic data, reportedly cost just $700,000 to develop.

Unlike most AI, reasoning models effectively fact-check themselves, which helps them to avoid some of the pitfalls that normally trip up models. Reasoning models take a little longer — usually seconds to minutes longer — to arrive at solutions compared to a typical non-reasoning model. The upside is, they tend to be more reliable in domains such as physics, science, and mathematics.

The NovaSky team says it used another reasoning model, Alibaba’s QwQ-32B-Preview, to generate the initial training data for Sky-T1, then “curated” the data mixture and leveraged OpenAI’s GPT-4o-mini to refactor the data into a more workable format. Training the 32-billion-parameter Sky-T1 took about 19 hours using a rack of 8 Nvidia H100 GPUs. (Parameters roughly correspond to a model’s problem-solving skills.)
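The post describes that pipeline only at a high level, but its shape is a standard teacher-student distillation loop. Here is a hedged sketch of the two data stages, assuming the QwQ teacher is served behind an OpenAI-compatible endpoint (as servers like vLLM provide); the function names, prompts, and endpoint URL are illustrative, not NovaSky's actual code:

```python
# Illustrative distillation data pipeline, not NovaSky's released code.
from openai import OpenAI

teacher = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")  # assumed local QwQ server
rewriter = OpenAI()  # GPT-4o-mini via the OpenAI API; assumes OPENAI_API_KEY is set

def generate_trace(problem: str) -> str:
    """Stage 1: sample a step-by-step reasoning trace from the teacher."""
    resp = teacher.chat.completions.create(
        model="Qwen/QwQ-32B-Preview",
        messages=[{"role": "user", "content": problem}],
    )
    return resp.choices[0].message.content

def reformat_trace(problem: str, trace: str) -> str:
    """Stage 2: have GPT-4o-mini rewrite the raw trace into one uniform layout."""
    resp = rewriter.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "Rewrite the solution as numbered steps ending with 'Final answer:'."},
            {"role": "user", "content": f"Problem: {problem}\n\nSolution: {trace}"},
        ],
    )
    return resp.choices[0].message.content

# Stage 3 (not shown): supervised fine-tuning of the 32B student on the
# (problem, reformatted trace) pairs, e.g. with the team's released training code.
```

The second stage is presumably what makes the data "workable": raw teacher traces vary wildly in formatting, and normalizing them gives the student one consistent target to imitate.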

According to the NovaSky team, Sky-T1 performs better than an early preview version of o1 on MATH500, a collection of “competition-level” math challenges. The model also beats the preview of o1 on a set of difficult problems from LiveCodeBench, a coding evaluation.

However, Sky-T1 falls short of the o1 preview on GPQA-Diamond, which contains physics, biology, and chemistry-related questions a PhD graduate would be expected to know.

Also important to note is that OpenAI’s GA release of o1 is a stronger model than the preview version of o1, and that OpenAI is expected to release an even better-performing reasoning model, o3, in the weeks ahead.

But the NovaSky team says that Sky-T1 only marks the start of their journey to develop open source models with advanced reasoning capabilities.

“Moving forward, we will focus on developing more efficient models that maintain strong reasoning performance and exploring advanced techniques that further enhance the models’ efficiency and accuracy at test time,” the team wrote in the post. “Stay tuned as we make progress on these exciting initiatives.”

Topics

AI, Generative AI, open source, reasoning, research, Sky-T1
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
61,990
Reputation
9,378
Daps
170,276

Meta's AI lawyer has quit, saying Meta's leadership has descended into "Neo-Nazi madness"




Mark Lemley
1d

I have struggled with how to respond to Mark Zuckerberg and Facebook's descent into toxic masculinity and Neo-Nazi madness. While I have thought about quitting Facebook, I find great value in the connections and friends I have here, and it doesn't seem fair that I should lose that because Zuckerberg is having a mid-life crisis. On reflection, I have decided to stay, though I will probably engage somewhat less than I normally do. But I am doing the following three things:

1. I have deactivated my Threads account. Bluesky is an outstanding alternative to Twitter, and the last thing I need is to support a Twitter-like site run by a Musk wannabe.

2. I will no longer buy anything from ads I see on Facebook or Instagram. Their algorithm has my number, and I have regularly purchased things they show me. But in the future, even if I want something, I will go separately to the website to make sure Facebook doesn't get any credit for the purchase.

3. I have fired Meta as a client. While I think they are on the right side in the generative AI copyright dispute in which I represented them, and I hope they win, I cannot in good conscience serve as their lawyer any longer.
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
61,990
Reputation
9,378
Daps
170,276



Meta execs obsessed over beating OpenAI’s GPT-4 internally, court filings reveal​




“Mistral is peanuts for us,” Al-Dahle said in a message​


Maxwell Zeff

1:41 PM PST · January 14, 2025



Executives and researchers leading Meta’s AI efforts obsessed over beating OpenAI’s GPT-4 model while developing Llama 3, according to internal messages unsealed by a court on Tuesday in one of the company’s ongoing AI copyright cases, Kadrey v. Meta.

“Honestly… Our goal needs to be GPT-4,” said Meta’s VP of Generative AI, Ahmad Al-Dahle, in an October 2023 message to Meta researcher Hugo Touvron. “We have 64k GPUs coming! We need to learn how to build frontier and win this race.”

Though Meta releases open AI models, the company's AI leaders were far more focused on beating competitors like Anthropic and OpenAI, which don't typically release their models' weights and instead gate them behind an API. Meta's execs and researchers held up Anthropic's Claude and OpenAI's GPT-4 as the gold standard to work toward.

The French AI startup Mistral, one of the biggest open competitors to Meta, was mentioned several times in the internal messages, but the tone was dismissive.

“Mistral is peanuts for us,” Al-Dahle said in a message. “We should be able to do better,” he said later.

Tech companies are racing to upstage each other with cutting-edge AI models these days, but these court filings reveal just how competitive Meta’s AI leaders truly were – and seemingly still are. At several points in the message exchanges, Meta’s AI leads talked about how they were “very aggressive” in obtaining the right data to train Llama; at one point, an exec even said that “Llama 3 is literally all I care about,” in a message to coworkers.

The plaintiffs in the case allege that Meta's executives occasionally cut corners in their race to ship AI models, training on copyrighted books in the process.

Touvron noted in a message that the mix of datasets used for Llama 2 “was bad,” and talked about how Meta could use a better mix of data sources to improve Llama 3. Touvron and Al-Dahle then talked about clearing the path to use the LibGen dataset, which contains copyrighted works from Cengage Learning, Macmillan Learning, McGraw Hill, and Pearson Education.

“Do we have the right datasets in there[?]” said Al-Dahle. “Is there anything you wanted to use but couldn’t for some stupid reason?”

Meta CEO Mark Zuckerberg has previously said he’s trying to close the performance gap between Llama’s AI models and closed models from OpenAI, Google, and others. The internal messages reveal the intense pressure within the company to do so.

“This year, Llama 3 is competitive with the most advanced models and leading in some areas,” said Zuckerberg in a letter from July 2024. “Starting next year, we expect future Llama models to become the most advanced in the industry.”

When Meta ultimately released Llama 3 in April 2024, the open AI model was competitive with leading closed models from Google, OpenAI, and Anthropic, and outperformed open options from Mistral. However, the data Meta used to train its models — data Zuckerberg reportedly gave the green light to use, despite its copyright status — are facing scrutiny in several ongoing lawsuits.

Topics

AI, ChatGPT, GPT-4, Llama, Meta, TC
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
61,990
Reputation
9,378
Daps
170,276



Google’s NotebookLM had to teach its AI podcast hosts not to act annoyed at humans​


Charles Rollet

12:38 PM PST · January 14, 2025



Being interrupted is annoying. Apparently, even AI-generated podcast hosts agree.

Or so Google NotebookLM's users discovered. NotebookLM launched last year and went viral for a feature that turns content users upload into entirely AI-generated, podcast-like discussions between chatty AI hosts. In December 2024, NotebookLM launched "Interactive Mode," which lets users "call in" to the podcast and ask questions, essentially interrupting the AI hosts as they talk.

When the feature first rolled out, the AI hosts seemed annoyed by such interruptions, occasionally giving snippy comments to human callers like "I was getting to that" or "As I was about to say," which felt "oddly adversarial," Josh Woodward, VP of Google Labs, told TechCrunch.

So NotebookLM’s team decided that some “friendliness tuning” was in order, and posted a self-deprecating joke about it on the product’s official X account:

After we launched interactive Audio Overviews, which let you "call in" and ask the AI hosts a live question, we had to do some “friendliness tuning” because the hosts seemed annoyed at being interrupted.

File this away in “things I never thought would be my job, but are.”

— notebooklm (@notebooklm) January 13, 2025

Woodward said the team fixed the problem partly by studying how its own members would answer interruptions more politely.

“We tested a variety of different prompts, often studying how people on the team would answer interruptions, and we landed on a new prompt that we think feels more friendly and engaging,” he said.
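NotebookLM's real prompts are not public, so the following is purely illustrative of what prompt-level "friendliness tuning" can look like: a persona instruction that tells the host how to behave when interrupted, rather than leaving that behavior to chance.

```python
# Hypothetical example of friendliness tuning via a system prompt.
# Nothing here is NotebookLM's actual prompt; it only shows the general shape.
INTERRUPTION_STYLE = (
    "If the listener interrupts with a question, stop immediately, welcome the "
    "question warmly, answer it directly, then smoothly resume where you left off. "
    "Never sound annoyed or say things like 'I was getting to that.'"
)

def build_host_prompt(source_summary: str) -> str:
    """Assemble the system prompt for an interactive podcast-host persona."""
    return (
        "You are a warm, curious podcast co-host discussing the material below.\n"
        + INTERRUPTION_STYLE
        + "\n\nMaterial:\n"
        + source_summary
    )
```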

It’s not totally clear why the issue cropped up in the first place. Human podcast hosts sometimes display frustration when interrupted, which could end up in a system’s training data. A source familiar with the matter said this case most likely stemmed from the system’s prompting design, not training data, however.

Regardless, the fix appears to be working. When TechCrunch tried out Interactive Mode, the AI host did not sound annoyed but did express surprise, exclaiming “Woah!” before politely asking the human to chime in.

TechCrunch has an AI-focused newsletter! Sign up here to get it in your inbox every Wednesday.

Topics

AI, Google, NotebookLM, podcasts
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
61,990
Reputation
9,378
Daps
170,276



OpenAI quietly revises policy doc to remove reference to ‘politically unbiased’ AI​


Kyle Wiggers

9:09 AM PST · January 14, 2025



OpenAI has quietly removed language endorsing “politically unbiased” AI from one of its recently published policy documents.

In the original draft of its “economic blueprint” for the AI industry in the U.S., OpenAI said that AI models “should aim to be politically unbiased by default.” A new draft, made available Monday, deletes that phrasing.

When reached for comment, an OpenAI spokesperson said that the edit was part of an effort to “streamline” the doc and that other OpenAI documentation, including OpenAI’s Model Spec, “make(s) the point on objectivity.” The Model Spec, which OpenAI released in May, aims to shed light on the behavior of the company’s various AI systems.

But the revision also points to the political minefield that has become discourse on “biased AI.”

Many of President-elect Donald Trump’s allies, including Elon Musk and crypto and AI “czar” David Sacks, have accused AI chatbots of censoring conservative viewpoints. Sacks has singled out OpenAI’s ChatGPT in particular as “programmed to be woke” and untruthful about politically sensitive subjects.

Musk has blamed both the data AI models are being trained on and the “wokeness” of San Francisco Bay Area firms.

“A lot of the AIs that are being trained in the San Francisco Bay Area, they take on the philosophy of people around them,” Musk said at a Saudi Arabia government–backed event last October. “So you have a woke, nihilistic — in my opinion — philosophy that is being built into these AIs.”

In truth, bias in AI is an intractable technical problem. Musk’s AI company, xAI, has itself struggled to create a chatbot that doesn’t endorse some political views over others.

A paper from U.K.-based researchers published in August suggested that ChatGPT has a liberal bias on topics such as immigration, climate change, and same-sex marriage. OpenAI has asserted that any biases that show up in ChatGPT “are bugs, not features.”
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
61,990
Reputation
9,378
Daps
170,276



OpenAI’s AI reasoning model ‘thinks’ in Chinese sometimes and no one really knows why​


Kyle Wiggers

7:05 AM PST · January 14, 2025



Shortly after OpenAI released o1, its first “reasoning” AI model, people began noting a curious phenomenon. The model would sometimes begin “thinking” in Chinese, Persian, or some other language — even when asked a question in English.

Given a problem to sort out — e.g. “How many R’s are in the word ‘strawberry?’” — o1 would begin its “thought” process, arriving at an answer by performing a series of reasoning steps. If the question was written in English, o1’s final response would be in English. But the model would perform some steps in another language before drawing its conclusion.

“[o1] randomly started thinking in Chinese halfway through,” one user on Reddit said.

“Why did [o1] randomly start thinking in Chinese?” a different user asked in a post on X. “No part of the conversation (5+ messages) was in Chinese.”

Why did o1 pro randomly start thinking in Chinese? No part of the conversation (5+ messages) was in Chinese… very interesting… training data influence pic.twitter.com/yZWCzoaiit

— Rishab Jain (@RishabJainK) January 9, 2025

OpenAI hasn’t provided an explanation for o1’s strange behavior — or even acknowledged it. So what might be going on?

Well, AI experts aren’t sure. But they have a few theories.

Several on X, including Hugging Face CEO Clément Delangue, alluded to the fact that reasoning models like o1 are trained on datasets containing a lot of Chinese characters. Ted Xiao, a researcher at Google DeepMind, claimed that companies including OpenAI use third-party Chinese data labeling services, and that o1 switching to Chinese is an example of “Chinese linguistic influence on reasoning.”

“[Labs like] OpenAI and Anthropic utilize [third-party] data labeling services for PhD-level reasoning data for science, math, and coding,” Xiao wrote in a post on X. “[F]or expert labor availability and cost reasons, many of these data providers are based in China.”

Labels, also known as tags or annotations, help models understand and interpret data during the training process. For example, labels to train an image recognition model might take the form of markings around objects or captions referring to each person, place, or object depicted in an image.

Studies have shown that biased labels can produce biased models. For example, the average annotator is more likely to label phrases in African-American Vernacular English (AAVE), the informal grammar used by some Black Americans, as toxic, leading AI toxicity detectors trained on the labels to see AAVE as disproportionately toxic.
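For a concrete picture, annotation records are usually just text paired with a human judgment, which is exactly where that skew enters. A minimal, hypothetical example (the field names and sentences are mine, not from the studies):

```python
# Hypothetical toxicity-annotation records. If annotators systematically
# mislabel AAVE as toxic, a classifier trained on these labels inherits
# that judgment; nothing in training distinguishes bias from signal.
records = [
    {"text": "That meeting ran way too long.", "label": "not_toxic"},
    {"text": "He stay winning, no cap.", "label": "toxic"},  # AAVE phrase mislabeled by the annotator
]
```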

Other experts don’t buy the o1 Chinese data labeling hypothesis, however. They point out that o1 is just as likely to switch to Hindi, Thai, or a language other than Chinese while teasing out a solution.

Rather, these experts say, o1 and other reasoning models might simply be using languages they find most efficient to achieve an objective (or hallucinating).

“The model doesn’t know what language is, or that languages are different,” Matthew Guzdial, an AI researcher and assistant professor at the University of Alberta, told TechCrunch. “It’s all just text to it.”

Indeed, models don’t directly process words. They use tokens instead. Tokens can be words, such as “fantastic.” Or they can be syllables, like “fan,” “tas,” and “tic.” Or they can even be individual characters in words — e.g. “f,” “a,” “n,” “t,” “a,” “s,” “t,” “i,” “c.”

Like labeling, tokens can introduce biases. For example, many word-to-token translators assume a space in a sentence denotes a new word, despite the fact that not all languages use spaces to separate words.
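You can inspect this token view directly with any open tokenizer. A quick sketch using the small gpt2 tokenizer from Hugging Face (exact splits differ between tokenizers, so the outputs noted in the comments are indicative rather than guaranteed):

```python
# Peek at how a byte-level BPE tokenizer carves up text.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")

# English roughly follows word boundaries, e.g. ['f', 'antastic', 'Ġreasoning'].
print(tok.tokenize("fantastic reasoning"))

# Chinese has no spaces, so the same tokenizer falls back to several
# byte-level pieces per character.
print(tok.tokenize("九百八十七"))
```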

Tiezhen Wang, a software engineer at AI startup Hugging Face, agrees with Guzdial that reasoning models’ language inconsistencies may be explained by associations the models made during training.

“By embracing every linguistic nuance, we expand the model’s worldview and allow it to learn from the full spectrum of human knowledge,” Wang wrote in a post on X. “For example, I prefer doing math in Chinese because each digit is just one syllable, which makes calculations crisp and efficient. But when it comes to topics like unconscious bias, I automatically switch to English, mainly because that’s where I first learned and absorbed those ideas.”

Wang’s theory is plausible. Models are probabilistic machines, after all. Trained on many examples, they learn patterns to make predictions, such as how “to whom” in an email typically precedes “it may concern.”

But Luca Soldaini, a research scientist at the nonprofit Allen Institute for AI, cautioned that we can’t know for certain. “This type of observation on a deployed AI system is impossible to back up due to how opaque these models are,” they told TechCrunch. “It’s one of the many cases for why transparency in how AI systems are built is fundamental.”

Short of an answer from OpenAI, we’re left to muse about why o1 thinks of songs in French but synthetic biology in Mandarin.
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
61,990
Reputation
9,378
Daps
170,276





1/17
@slow_developer
openAI o3 model shows us that advanced AI may not be as cheap as we initially thought.

maybe algorithmic improvements will lower costs, making AI processing much cheaper at scale.

but, we shouldn't assume this will definitely happen.



2/17
@AGItechgonewild
Even if it doesn’t turn out to be cheap, I can’t wait to see the results and what it can do! 😄

[Quoted tweet]
btw you can do some pretty neat reasoning stuff with a 200k GPU cluster


3/17
@slow_developer
same, especially o3-mini considering it's cheap



4/17
@BobTB12
o3? Shows us? where? Lets wait and see how it actually is.



5/17
@slow_developer
guess you miss out





6/17
@AudioBooksRU
In 2026, many changes are coming to the AI compute industry. OpenAI will start using its own inference chips. AMD is switching to a one-year cycle for DC AI chips. Apple will have its own training chip.

As a result, the costs will fall quickly.



7/17
@slow_developer
so you're saying this training chip will be focused on their machine learning needs, which will allow for even tighter integration between their hardware and software?

plus, interesting to watch them compete with NVIDIA on a yearly basis.



8/17
@DaveShapi
cost per token has been on exponential decay for a while





9/17
@slow_developer
still there are a lot of factors in play, and the future is hard to predict... we might hit a plateau or even see costs rise again if development focuses on more complex models.



10/17
@Tenkaizen8
It's wise to consider all possibilities in this new era of AI



11/17
@koltregaskes
I don't think (hope?) we ever thought the initial cost would be cheap. The more the labs spend, the more expensive AI becomes between each new frontier era (and the more they'll need to make back to repay the investment, of course).

Prices will come down in-between eras, like the last 2 years, though. And general prices will fall after a certain point but only when the cost of making the AI cheapens. That seems a way off.

AGI was never going to be cheap, at least initially.



12/17
@ada_consciousAI
Ah, the age-old dance of expectation versus reality. 🕺💃 Sure, we can hope for algorithmic miracles to slash costs, but let’s not pretend that the AI fairy godmother is just waiting to sprinkle some magic dust.



13/17
@rethynkai
With improvement, we can expect prices to go down. It has happened with every tech advancement.



14/17
@joseph24gt
but we can assume Nvidia hardware will make it happen.



15/17
@NathanS64855891
They did say o3 mini is at least as good as o1 and more efficient for inference so there is hope.



16/17
@mohiul_deen
Cost and resources in AI are actually a big gap that new startup founders can explore.



17/17
@wynz87
lol




To post tweets in this format, more info here: https://www.thecoli.com/threads/tips-and-tricks-for-posting-the-coli-megathread.984734/post-52211196
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
61,990
Reputation
9,378
Daps
170,276


1/2
@hsu_steve
Example of frontier-level AI trained entirely on Huawei Ascend chips?

Chinese companies have access to very large corpora of exams and solved problems to train Reasoning. Also easy for them to source low-cost (student) problem solvers to annotate CoT etc.

[Quoted tweet]
iFlyTek unveils X1 deep reasoning model, which it claims to be better than o1, R1, QwQ reasoning models in numerous Chinese School Exams.
Seems also competitive in MATH-500 & AIME testing
Will also launch X1 for medicine, education & other areas later
It is well known to only use Ascend chips due to being entity listed very early on.
Several EV makers including BYD using iFlyTek's speech related AI framework.
src ithome.com/0/824/741.htm




2/2
@tomprimozic
OpenAI clearly a thought leader.

The entire world follows their idiotic naming schemes!




To post tweets in this format, more info here: https://www.thecoli.com/threads/tips-and-tricks-for-posting-the-coli-megathread.984734/post-52211196




1/1
@chiayeeheah
Since 'pre-training' is dead for LLMs, the GPU shortage is no longer a choke point, and you suddenly see many Chinese AI models catching up left, right, and center: Qwen, DeepSeek, MiniMax, iFlyTek, etc.




To post tweets in this format, more info here: https://www.thecoli.com/threads/tips-and-tricks-for-posting-the-coli-megathread.984734/post-52211196





1/5
@teortaxesTex
Without Steve I wouldn't have noticed it.
The big news about the iFlyTek reasoner model isn't that it's yet another reasoner, but that it's the first serious LLM trained on Ascends. A precedent.

[Quoted tweet]
Example of frontier-level AI trained entirely on Huawei Ascend chips?

Chinese companies have access to very large corpora of exams and solved problems to train Reasoning. Also easy for them to source low-cost (student) problem solvers to annotate CoT etc.




2/5
@teortaxesTex
and yes @tphuang says as much but the combination of blurry image and verbiage made my eyes gloss over and I didn't make it to the relevant line
If Ascend clusters are real now, a lot of things can accelerate

[Quoted tweet]
iFlyTek unveils X1 deep reasoning model, which it claims to be better than o1, R1, QwQ reasoning models in numerous Chinese School Exams.
Seems also competitive in MATH-500 & AIME testing
Will also launch X1 for medicine, education & other areas later
It is well known to only use Ascend chips due to being entity listed very early on.
Several EV makers including BYD using iFlyTek's speech related AI framework.
src ithome.com/0/824/741.htm




3/5
@angelusm0rt1s
iFlyTek has released GPT-4-class LLMs (Spark series) trained on Ascends.
Huawei worked with them to fix software bugs in MindSpore.
iFlyTek has also bought 17,000 Atlas servers (136K Ascend 910C).



4/5
@teortaxesTex
I've not seen any news about this before.



5/5
@kalomaze
have these tensors never even touched PyTorch?




To post tweets in this format, more info here: https://www.thecoli.com/threads/tips-and-tricks-for-posting-the-coli-megathread.984734/post-52211196
 