bnew

Veteran
Joined
Nov 1, 2015
Messages
58,197
Reputation
8,613
Daps
161,833



1/14
@reach_vb
MIT licensed Phi 4 running 100% locally on @lmstudio with GGUF straight from the Hugging Face Hub! 🔥

Not your weights not your brain!

lms get phi-4 is all you need! 🐐

[Quoted tweet]
Chat, MIT licensed Phi 4 is here, how are the vibes? 🔥
huggingface.co/microsoft/phi…


https://video.twimg.com/ext_tw_video/1877078224720572416/pu/vid/avc1/1112x720/yVQDirpt3bu-Z5xX.mp4

2/14
@reach_vb
Model weights:

lmstudio-community/phi-4-GGUF · Hugging Face
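
Once the GGUF is pulled (lms get phi-4) and loaded, LM Studio serves an OpenAI-compatible endpoint, by default at http://localhost:1234/v1 - so a minimal Python sketch for querying it locally looks like this (the model identifier is assumed to match your local listing):

from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")  # key can be any string for a local server
resp = client.chat.completions.create(
    model="phi-4",  # identifier as it appears in your local model list
    messages=[{"role": "user", "content": "Summarize the MIT license in one line."}],
)
print(resp.choices[0].message.content)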



3/14
@alamshafil
How does it compare to llama3.2?

(I’m new to this stuff, so not sure how accurately it can be compared)



4/14
@reach_vb
On paper it even beats Qwen 2.5 (which is itself much better than Llama 3.2) - need to vibe check it more.



5/14
@carsenklock
Phi-4 is 😍



6/14
@heyitsyorkie
LM Studio is the cleanest UI 🔥



7/14
@ivanfioravanti
So far so (mega) good for me! This model rocks!



8/14
@gg1994bharat
At my office we're using the Llama 3.3 70B model for text context analysis and removing unwanted text. Will this help?



9/14
@lmstudio
It might: sounds like a perfect opportunity to test this model and see if you get good results.



10/14
@dhruv2038
ahh great!



11/14
@muktabh
VRAM requirement ?



12/14
@lifeafterAi_
Qwen 3 Will be 🤯

[Quoted tweet]
Qwen 3 14b will be insane 🤯 even qwen3 7b 🔥


GgyKhQgXAAAQQnq.jpg

GgyKhQgWkAAKb1J.jpg


13/14
@LIama002
What do u use to graphically display powermetrics? 📊



14/14
@AI_Fun_times
Exciting to see Phi 4 in action with LMStudio! Leveraging the power of Hugging Face Hub and MIT licensing is a wise move. Have you dived into fine-tuning with GGUF yet?




To post tweets in this format, more info here: https://www.thecoli.com/threads/tips-and-tricks-for-posting-the-coli-megathread.984734/post-52211196






1/14
@reach_vb
Chat, MIT licensed Phi 4 is here, how are the vibes? 🔥
microsoft/phi-4 · Hugging Face



2/14
@yabooki
Thanks, but how do I use it with Ollama?



3/14
@reach_vb
ollama run hf.co/reach-vb/phi-4-Q4_K_M-GGUF



4/14
@lalopenguin
lets....... GOO!!!!!



5/14
@a_void_sky
there must be a reason that @OpenAI leads the Math benchmark every time



6/14
@mkurman88
Great news that it's finally coming to the public!



7/14
@AILeaksAndNews
MIT license is big



8/14
@ArpinGarre66002
Mid



9/14
@CEOofFuggy
@ollama



GgyF5XLXoAESJvh.jpg


10/14
@ollama
ollama run phi4

let's go! 💪



11/14
@MillenniumTwain
Public Sector 'AI' is already more than Two Decades behind Private/Covert sector << AGI >>, and all our Big Tech Fraudsters are doing is accelerating the Dumb-Down of our Victim, Slave, Consumer US Public, and World!

[Quoted tweet]
"Still be Hidden behind Closed Doors"? Thanks to these Covert Actors (Microsoft, OpenAI, the NSA, ad Infinitum) — More and More is Being Hidden behind Closed Doors every day! The ONLY 'forward' motion being their exponentially-accelerated Big Tech/Wall Street HYPE, Fraud, DisInfo ...


Gb-CZx0XAAA7Jyb.jpg


12/14
@allan_d_clive
finally.....



13/14
@agichronicles
No function calling



14/14
@bertaunth
currently waiting for the LiveBench benchmarks to drop












1/11
@rasbt
The model weights of Microsoft’s phi-4 14B LLM are now finally publicly available. Thanks to a kind community contribution, it’s now also available in LitGPT (litgpt finetune microsoft/phi-4): phi-4 by ysjprojects · Pull Request #1904 · Lightning-AI/litgpt

I will have more to say about phi-4 in my upcoming AI Research Review 2024 Part 2 article. The paper ([2412.08905] Phi-4 Technical Report) has a lot of interesting insights into synthetic data for pretraining.

E.g., what’s interesting about phi-4 is that the training data consisted of 40% synthetic data. And the researchers observed that while synthetic data is generally beneficial, models trained exclusively on synthetic data performed poorly on knowledge-based benchmarks. To me, this raises the question: does synthetic data lack sufficient knowledge-specific information, or does it include a higher proportion of factual errors, such as those caused by hallucinations?

At the same time, the researchers found that increasing the number of training epochs on synthetic data boosted the performance more than just adding more web data, as shown in the figure below.

In any case, I will expand on this discussion soon in my “Noteworthy AI Research Papers of 2024 (Part Two)” article. Stay tuned!



Gg3Ld2nWUAAJDq9.jpg
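
For the LitGPT route mentioned above, a hedged sketch using its Python API in addition to the litgpt finetune CLI (API names follow LitGPT's README at the time of writing - treat them as assumptions):

from litgpt import LLM  # pip install litgpt

# Download the checkpoint and run a quick generation
llm = LLM.load("microsoft/phi-4")
print(llm.generate("Why does synthetic pretraining data help math reasoning?", max_new_tokens=64))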


2/11
@billynewport
Is it true to say that the benefit of synthetic data is in CoT-style training material to improve reasoning or test-time compute, rather than learning knowledge per se? It seems so far most LLMs are rote learning facts/knowledge through data, but this makes reasoning hard because that's not what they trained on.



3/11
@rasbt
Yes. I think the main benefit is that it comes in a more structured or refined format compared to raw data. But the knowledge is the same as in the raw data (and may even be more hallucinated), considering that raw data was used to generate the synthetic data through a model.



4/11
@Dayder111
Maybe synthetic data hallucinates away facts that are supposed to be precise, but also it helps to generalize better and understand better connections between things?
Like, you can be generally smart, but not dedicate your neurons to remembering specific facts that much, only



5/11
@rasbt
Yeah, I think the main advantage is from the more refined nature of the synthetic data when it comes to response structure etc. Synthetic data can't really contain knowledge that raw data doesn't already include because the raw data was used to come up with the synthetic data in the first place.



6/11
@arnitly
What would you say are the best practices to follow while creating synthetic data? How do you ensure the model does not hallucinate a lot, aside from setting the temperature to zero?



7/11
@rasbt
Since the model can't really explicitly distinguish between synthetic and non-synthetic data during training, the best way would be to tackle the problem at the root: ensuring that the synthetic data-generating model does not produce hallucinated content.



8/11
@Yuchenj_UW
Interesting, getting more synthetic data seems to be the way



9/11
@rasbt
Yeah, I see it as some flavor of transfer learning (i.e., not starting from raw data). Synthetic data generated by a high-quality model (such as GPT-4o, which has already undergone extensive refinement) may serve as a kind of jumpstart to the model you are trying to train.



10/11
@yisustalone
Cool, looking forward to your analysis



11/11
@elbouzzz
Holy shyt i'm just here to say he's back!! Hallelujah!








1/15
@EHuanglu
Microsoft has released Phi-4

a 14 billion parameter language model, under the MIT license on Hugging Face.

fully open sourced



GgzVSEPaMAMM1z7.jpg


2/15
@EHuanglu
Huggingface link:

microsoft/phi-4 · Hugging Face



3/15
@artificialbudhi
Huge!



4/15
@twizshaq
W



5/15
@RuneX_ai
Is it available on Azure? Can you compare it with Llama? On-premise solution?



6/15
@rethynkai
14B is an ideal parameter count.



7/15
@oscarle_x
Anyone tested yet if Phi-4 is anywhere near Qwen 2.5 72B as they claimed? Thanks



8/15
@Gdgtify
I remember the announcement from a while back. I am glad it is finally on HuggingFace

[Quoted tweet]
🚀 Phi-4 is here! A small language model that performs as well as (and often better than) large models on certain types of complex reasoning tasks such as math. Useful for us in @MSFTResearch, and available now for all researchers on the Azure AI Foundry! aka.ms/phi4blog


GepAFSeaIAA5IH8.jpg


9/15
@vagasframe
🤯



10/15
@SentientAtom
Can this be run offline with an Nvidia compute unit?



11/15
@simonkp
It's great to see Phi-4 released under the MIT license; this should really boost open-source AI development. I'm curious to see how it stacks up against models like Qwen. It is definitely good news that it's on Hugging Face.



12/15
@boringsidequest
Their models seem to be poorly trained on languages other than English, so I'm not expecting much



13/15
@Catalina5803_


[Quoted tweet]
darkreading.com/application-…


14/15
@Jayy23P92624




15/15
@0xargumint
Nice to see Microsoft finally letting one out of the cage. Though at 14B params it's more like releasing a kitten than a lion. Still, better than another walled garden.






1/3
@_akhaliq
Phi-4 is now available in anychat

Try it out



GgzCo6ZXMAArd_T.jpg


2/3
@_akhaliq
App: Anychat - a Hugging Face Space by akhaliq



3/3
@rogue_node
Avoid anything microsoft
 





1/9
@reach_vb
Pretty fukking wild that you can 2-3x the math capabilities just by continually training on 160B high-quality tokens*

*without compromising on other metrics 🤯

[Quoted tweet]
fukk yeah, we did continual pre-training of llama3.2-3B for 160B tokens and got 100% improvement on GSM8K and 300% on MATH.


GgoBLyGWAAEGYSJ.jpg


2/9
@reach_vb
Vibe check the model directly here:

HuggingFaceTB/FineMath-Llama-3B · Hugging Face
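
A minimal sketch for vibe-checking the checkpoint with transformers; the word problem below is just an illustrative GSM8K-style prompt:

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "HuggingFaceTB/FineMath-Llama-3B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "Q: A train travels 60 km in 45 minutes. What is its average speed in km/h?\nA:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(out[0], skip_special_tokens=True))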



3/9
@reach_vb
Oh and the dataset is permissively licensed - use it for your own datasets:

HuggingFaceTB/finemath · Datasets at Hugging Face



4/9
@TheXeophon
Looked at the clock and it showed that we are so back



5/9
@reach_vb
😂😂😂



6/9
@__morse
Sonnet secret sauce



7/9
@lifeafterAi_
When Java llama3.2 3b



8/9
@mav3ri3k
Practice makes the man perfect.



9/9
@llm_guruji
That's the power of machine learning! Training on high-quality data can significantly boost math capabilities. Continuous learning on a vast token set is fascinating for improving overall performance
















1/32
@eliebakouch
fukk yeah, we did continual pre-training of llama3.2-3B for 160B tokens and got 100% improvement on GSM8K and 300% on MATH.



GgoBLyGWAAEGYSJ.jpg


2/32
@eliebakouch
Our script for continual training with Nanotron is available on the smollm github repo, along with everything to reproduce the training and ablation studies!

- Model: HuggingFaceTB/FineMath-Llama-3B · Hugging Face
- Dataset: HuggingFaceTB/finemath · Datasets at Hugging Face
- reproduce the training/ablation: smollm/pre-training/continual-pretraining at main · huggingface/smollm

Since we are talking math, happy (1+2+3+4+5+6+7+8+9)^2 everyone 🤗



3/32
@6___0
did u train on the eval data?🫡



4/32
@eliebakouch
No, see HuggingFaceTB/finemath_contamination_report · Datasets at Hugging Face



5/32
@mkurman88
I wonder why the hell MMLU always goes down. Any ideas?



6/32
@eliebakouch
When you do continual pre-training and change the data mixture, it's hard to preserve 100% of previous perf



7/32
@laulerr
Did you manage to get continual pre-training to work with tp=4? I see the yaml has tp=2, and I myself had issues getting that PR branch to work with setting other than tp=2.



8/32
@eliebakouch
Oh interesting, didn't try with tp=4 as tp=2 was enough, is it specific to [NEW] Llama3.2 weight converters 🦙 by TJ-Solergibert · Pull Request #255 · huggingface/nanotron ? maybe cc @Nouamanetazi as well



9/32
@yisustalone
Cool! No finetuning?



10/32
@eliebakouch
No, only continual pre-training!



11/32
@lu_sichu
Stop scaring ilya with pre-training zombies



12/32
@eliebakouch
🍿🍿



13/32
@stochasticchasm
Just simple continued pretraining?



14/32
@eliebakouch
Yes, nothing fancy, just a nice dataset 🪄



15/32
@SearchTrut69841
[2309.08632] Pretraining on the Test Set Is All You Need



16/32
@eliebakouch
You can check the contamination report here HuggingFaceTB/finemath_contamination_report · Datasets at Hugging Face



17/32
@danielmerja
@nisten



18/32
@vincentweisser
🔥



19/32
@staghado
was FineMath de-contaminated against evals like GSM8K? seems like FineMath would have a lot of contaminated samples given how it’s sourced.



20/32
@menhguin
wait, has no one tried that before????



21/32
@irohsharpeniroh
grokking go brrr?



22/32
@kaetemi
What's the ballpark time and expenses to do a continued pre-training at this model size?



23/32
@Yuchenj_UW
Wow!



24/32
@zealandic1
Nice, looking forward to trying finemath



25/32
@ivanfioravanti
wow! Great result!



26/32
@DFinsterwalder
What was the hardware you trained on and how long did it run?



27/32
@erudictus
is that grokking?



28/32
@victor_explore
Scale is all you need



29/32
@abuchanlife
that’s some serious number crunching! love to see those improvements.



30/32
@ddebowczyk
"Sonnet3.5 class" smart, small LLMs > big, slow "PhD level" LRMs

[Quoted tweet]
We need superfast models at much lower cost - they can stay at the current level of Sonnet/GPT-4o reasoning performance.

10x speed and 1/10 cost would have much bigger practical impact than making those new heavy and expensive models (o1, opus, pro/ultra or their successors) "smarter".


31/32
@ThinkDi92468945
At the expense of other capabilities. Ok 🙄



32/32
@llm_guruji
all is well




 





1/11
@reach_vb
We’re so unfathomably back - @NVIDIAAI releases Cosmos: World foundation models (commercially permissive) 🔥

Models trained on over 20 MILLION hours of video can be used to generate dynamic, high quality videos from text, image, or video inputs 🤯

Available directly on Hugging Face & transformers integration incoming 🤗



https://video.twimg.com/amplify_video/1876513636807581696/vid/avc1/884x476/-Ni8nyyToMIhb0mh.mp4

2/11
@reach_vb
Check out all the diffusion & autoregressive models here:

Cosmos - a nvidia Collection



3/11
@reach_vb
All models take text and/or image/video in, and output video:

[Quoted tweet]
Here’s a summary of all the models released:

> Cosmos Autoregressive 4B & 12B

- Given a 9-frame input video, predicts the future 24 frames
- Given an image as the first frame, predicts the future 32 frames

> Cosmos Autoregressive 5B & 13B Video2World

- Given text description and a 9-frame input video, predicts the future 24 frames
- Given text description and an image as the first frame, predicts the future 32 frames

> Cosmos Diffusion 7B & 14B Text2World

- Given a text description, predict an output video of 121 frames.

> Cosmos Diffusion 7B & 14B Video2World

- Given a text description and an image as the first frame, predict the future 120 frames


4/11
@reach_vb
Forgot to link the GitHub repo earlier:

GitHub - NVIDIA/Cosmos: Cosmos is a world model development platform that consists of world foundation models, tokenizers and video processing pipeline to accelerate the development of Physical AI at Robotics & AV labs. Cosmos is purpose built for physical AI. The Cosmos repository will enable end users to run the Cosmos models, run inference scripts and generate videos.



5/11
@llm_guruji
all is well



6/11
@_dlangston
that's pretty neat, tech keeps evolving fast



7/11
@AI_Homelab
& run it on Nvidia Digits. 128GB of unified RAM for 3000 USD is not a great deal - but I'll take it. 😃

(At least if it will be faster than an M4Pro)



8/11
@EricBader767348
HF mentions only Blackwell, Hopper and Pascal are supported. It should work on any Nvidia GPU as long as it has enough memory, right?



9/11
@rogerscissp
Model Summary! Epic!!!!



GgtO55VXUAA_FGr.png


10/11
@_ash_ran
Human art is like hand loom and AI art is like power loom. In the products of the hand loom the magic of man's living fingers finds its expression, and its hum harmonizes with the music of life. But the power loom is relentlessly lifeless and accurate and monotonous in production



11/11
@3DTOPO
Surely he wears patent printed croc pajamas too 😂



GgwIXjhXEAAR4pU.jpg







1/5
@reach_vb
Here’s a summary of all the models released:

> Cosmos Autoregressive 4B & 12B

- Given a 9-frame input video, predicts the future 24 frames
- Given an image as the first frame, predicts the future 32 frames

> Cosmos Autoregressive 5B & 13B Video2World

- Given text description and a 9-frame input video, predicts the future 24 frames
- Given text description and an image as the first frame, predicts the future 32 frames

> Cosmos Diffusion 7B & 14B Text2World

- Given a text description, predict an output video of 121 frames.

> Cosmos Diffusion 7B & 14B Video2World

- Given a text description and an image as the first frame, predict the future 120 frames

[Quoted tweet]
We’re so unfathomably back - @NVIDIAAI releases Cosmos: World foundation models (commercially permissive) 🔥

Models trained on over 20 MILLION hours of video can be used to generate dynamic, high quality videos from text, image, or video inputs 🤯

Available directly on Hugging Face & transformers integration incoming 🤗


https://video.twimg.com/amplify_video/1876513636807581696/vid/avc1/884x476/-Ni8nyyToMIhb0mh.mp4

2/5
@brandontownes
Can someone make a model for Metal? 😂



3/5
@rayzhang123
These models are impressive, but context is key. Predicting future frames relies heavily on understanding the initial input's nuances. Without that, the predictions might miss the mark, leading to less useful outcomes.



4/5
@_r4victor
So both the Autoregressive and Diffusion models can be used for img2video? Any insights on which one to use?



5/5
@eth_10uGod
It looks very handsome.



Ggq7t3gaEAAnI3h.jpg



 





1/11
@reach_vb
HOLY shyt - generate 3D mesh from a single image in LESS THAN A SECOND 🤯



https://video.twimg.com/ext_tw_video/1877099677432094720/pu/vid/avc1/1280x720/iv1TWYuZDFSXVOSO.mp4

2/11
@reach_vb
Play with it directly here:

Stable Point-Aware 3D - a Hugging Face Space by stabilityai



3/11
@reach_vb
Model: stabilityai/stable-point-aware-3d · Hugging Face
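
If you'd rather call the Space programmatically than through the browser, a hedged gradio_client sketch - endpoint names vary per Space, so discover them with view_api() first:

from gradio_client import Client

client = Client("stabilityai/stable-point-aware-3d")
print(client.view_api())  # lists the Space's endpoints and argument names
# then, e.g. (endpoint name below is hypothetical - take the real one from view_api()):
# result = client.predict("chair.png", api_name="/generate")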



4/11
@AwokeKnowing
can we input several unposed images eg of each side?



5/11
@reach_vb
try it out here directly:

[Quoted tweet]
Play with it directly here:

huggingface.co/spaces/stabil…


6/11
@Chaos2Cured
It took me weeks on Maya 2.0.

This is unreal. •



7/11
@SarvasvKulpati
@gd3kr



8/11
@tomlikestocode
this is mind-blowing. imagining how this could transform industries like gaming and design. amazing progress.



9/11
@ConveLab
Can we go from image to generated mesh to 3d format for printing



10/11
@calebfahlgren
AI for 3D has been cooking lately



11/11
@AbdullahAdeebi
incredible!




 




1/4
@reach_vb
Wait, that’s a VLM using ~2.5GB VRAM - taking on giants w/ almost 10x usage 🔥

@vikhyatk and team on FIRE - looking forward to even more optimised CPU + GPU versions!

[Quoted tweet]
New Moondream 2B release is out!

Includes structured outputs, improved text understanding, gaze detection. And probably more things I'm forgetting about right now.


Gg3l-JMWMAAVQ5U.jpg

Gg3kVhfaMAAHdGL.jpg


2/4
@reach_vb
Model checkpoint:

vikhyatk/moondream2 · Hugging Face
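
A hedged sketch of running it with transformers; encode_image/answer_question come from the repo's custom remote code and have changed across revisions, so check the current model card before relying on them:

from transformers import AutoModelForCausalLM, AutoTokenizer
from PIL import Image

model = AutoModelForCausalLM.from_pretrained("vikhyatk/moondream2", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("vikhyatk/moondream2")

image = Image.open("photo.jpg")  # any local image
enc = model.encode_image(image)  # method from the repo's remote code
print(model.answer_question(enc, "Describe this image.", tokenizer))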



3/4
@reach_vb
Play with it directly here:

moondream2 - a Hugging Face Space by vikhyatk



4/4
@stochasticchasm
Lol you should maybe say 10x less usage instead of 10x usage




 




1/3
@reach_vb
LuminaBrush - one stage converts an image to a "uniformly-lit" appearance, and another stage generates the illumination effect from user scribbles 🤯



GgZ-ZGEWoAA0mpV.jpg


2/3
@reach_vb
Try it out on the HF space directly:

LuminaBrush - a Hugging Face Space by lllyasviel



3/3
@reach_vb
GitHub:

GitHub - lllyasviel/LuminaBrush: Illumination Drawing Tools for Text-to-Image Diffusion Models




 



1/3
@reach_vb
There are new Instruct checkpoints as well! 💥

[Quoted tweet]
Yo! @allen_ai just dropped OLMo 2 Tech report, some interesting things I found 🔥

Architecture
> Reordered norm: Normalizing outputs of attention and feedforward layers within transformer blocks instead of inputs
> QK-norm: Normalizing key and query projections with RMSNorm before calculating attention to prevent large attention logits and training loss divergence
> Z-Loss: Adopting z-loss regularization, empirically shown to improve training stability
> RoPE: Increasing RoPE (Rotary Positional Embedding) θ to 500,000 from 10,000 to enhance positional encoding resolution

Dataset + pre-training
> Removal of GitHub repositories with fewer than 2 stars and documents with excessive repetition or binary/numerical content
> Mid-Training w/ Synthetic Data: Introduction of a second training stage using synthetic data to enhance math and STEM capabilities
> Micro-Annealing Technique: independently assess mid-training data sources, reducing experimentation costs
> Post-Training Pipeline: Expansion of the RLVR pipeline for fine-tuning base models into chat variants, focusing on permissive data


GgYkFoAWMAAXUb0.jpg

GgYin-3XIAA_qWd.jpg


2/3
@reach_vb
Checkpoints here:

OLMo 2 - a allenai Collection



3/3
@nvn_osto
I'm all for this post training and optimization era in the open sauce community.










1/4
@reach_vb
Yo! @allen_ai just dropped OLMo 2 Tech report, some interesting things I found 🔥

Architecture
> Reordered norm: Normalizing outputs of attention and feedforward layers within transformer blocks instead of inputs
> QK-norm: Normalizing key and query projections with RMSNorm before calculating attention to prevent large attention logits and training loss divergence
> Z-Loss: Adopting z-loss regularization, empirically shown to improve training stability
> RoPE: Increasing RoPE (Rotary Positional Embedding) θ to 500,000 from 10,000 to enhance positional encoding resolution

Dataset + pre-training
> Removal of GitHub repositories with fewer than 2 stars and documents with excessive repetition or binary/numerical content
> Mid-Training w/ Synthetic Data: Introduction of a second training stage using synthetic data to enhance math and STEM capabilities
> Micro-Annealing Technique: independently assess mid-training data sources, reducing experimentation costs
> Post-Training Pipeline: Expansion of the RLVR pipeline for fine-tuning base models into chat variants, focusing on permissive data



GgYin-3XIAA_qWd.jpg
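
To make the QK-norm bullet above concrete, a hedged PyTorch sketch of the idea (an illustration, not OLMo 2's actual code): queries and keys are RMS-normalized per head before the attention scores are computed, which keeps the logits bounded:

import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    def __init__(self, dim, eps=1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(dim))
        self.eps = eps

    def forward(self, x):
        return x * torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps) * self.weight

head_dim = 64
q_norm, k_norm = RMSNorm(head_dim), RMSNorm(head_dim)

# (batch, heads, seq, head_dim) projections, normalized before attention
q = q_norm(torch.randn(1, 8, 16, head_dim))
k = k_norm(torch.randn(1, 8, 16, head_dim))
scores = (q @ k.transpose(-2, -1)) / head_dim**0.5  # attention logits stay bounded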


2/4
@reach_vb
Paper:

Paper page - 2 OLMo 2 Furious



3/4
@reach_vb
Arch:



GgYkjHsXUAAHPNT.jpg


4/4
@winwin7264
Very impressive




 






1/8
@reach_vb
Updated the list with more models - @NousResearch Hermes, IBM Granite, etc 🔥

[Quoted tweet]
2024 AI timeline - it has been a truly WILD year! 🔥

From Gemma to Llama 3.1 405B to Sonnet 3.5 to o3 AND MORE!

Put together a non-exhaustive list of Open and API releases from the year - looking forward to 2025 🤗


GgS7gVpXQAAh6ff.jpg


https://video.twimg.com/ext_tw_video/1874131007638585344/pu/vid/avc1/1112x720/Qiw_Tyx9plY-PZ7i.mp4

2/8
@reach_vb
find it here: 2024 AI Timeline - a Hugging Face Space by reach-vb



3/8
@reach_vb
would really appreciate PRs to update all that isn't there anymore - it's quite straightforward 🙏
GitHub - Vaibhavs10/2024-ai-timeline



4/8
@TheXeophon
Maybe it could benefit from some sort of differentiation between announcement and release? Sora was announced in Jan, but only released 11 months later



5/8
@reach_vb
good call - this is how the 2025 edition will look: 2025 AI Timeline - a Hugging Face Space by reach-vb

does it look good to you?



6/8
@IgorCarron
and @answerdotai and @lighton



7/8
@reach_vb
yesss



8/8
@infinite_varsh
Good stuff!




 



1/18
@reach_vb
chat, is this real? smol QwQ?

PowerInfer/SmallThinker-3B-Preview · Hugging Face



2/18
@KrakowiakK
How do you force a thinking model to output JSON, and include the "reasoning" inside that JSON?



3/18
@reach_vb
actually no clue, you can probably post-process the reasoning + output
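
One hedged sketch of that post-processing idea, assuming the model wraps its chain of thought in <think>...</think> tags (the tag format is model-specific):

import json
import re

def to_json(raw: str) -> str:
    m = re.search(r"<think>(.*?)</think>", raw, flags=re.DOTALL)
    reasoning = m.group(1).strip() if m else ""
    answer = raw[m.end():].strip() if m else raw.strip()
    return json.dumps({"reasoning": reasoning, "answer": answer})

print(to_json("<think>2 + 2 = 4</think>The answer is 4."))
# {"reasoning": "2 + 2 = 4", "answer": "The answer is 4."}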



4/18
@keiluv32
Yup



5/18
@lunarflu1
smol UwU when 🤬



6/18
@lalopenguin
was testing it yesterday! it's legit



7/18
@bennetkrause
Only SFT from QwQ, right?



8/18
@devgerred
it’s kind of funny how weird and good it is during reasoning and I totally get using it for a draft model after playing with it. I was compelled by smol reasoner.



9/18
@altryne
damn another thinker! 👏



10/18
@adi__twts
Small reasoning models are the ones I am waiting for



11/18
@Sudhir_Voleti
Reasoner at 3B params only?

Sign me up! Is it on ollama already or not?

Wondering about response speed though. Is it good enough for a chat kinda scenario?



12/18
@ritwik_raha
Small models ftw!



13/18
@orfeomorello
I want try it



14/18
@megaaziib
does it have gguf version? i'm gonna try this.



15/18
@WolframRvnwlf
I published my latest benchmark results yesterday () and considered adding this model - but it only achieved a score of 42.20% (173/410 correct answers) on the MMLU-Pro CS benchmark, falling below my 50% minimum threshold.

[Quoted tweet]
New year, new benchmarks! Tested some new models (DeepSeek-V3, QVQ-72B-Preview, Falcon3 10B) that came out after my latest report, and some "older" ones (Llama 3.3 70B Instruct, Llama 3.1 Nemotron 70B Instruct) that I had not tested yet. Here's the updated graph - more details 👇


GgTxZLXX0AA5q3E.jpg


16/18
@qhy991
Nice work! Will you open-source the code and the training process?



17/18
@JohnJohnsonOkah
If an ant can reason with its tiny brain, an SLM definitely has to.
And just like an army of ants, SLMs will take on more complex tasks with "collective intelligence"

..will try this shortly



18/18
@gabmfrl
Soon, with the recently published paper on how to recreate o1, we might have our own small local o1's




 




1/13
@reach_vb
LocalLlama has spoken on DeepSeek v3 🐳

> If you used GPT-4o, you can safely switch; it’s the same thing at a much lower cost. Sometimes even better

> v3 is the most ideal model for building AI apps. It is super cheap compared to other models, considering the performance
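
The "safely switch" point is mostly a config change, since DeepSeek exposes an OpenAI-compatible API - a hedged sketch, with endpoint and model name per DeepSeek's public docs:

from openai import OpenAI

client = OpenAI(api_key="YOUR_DEEPSEEK_KEY", base_url="https://api.deepseek.com")
resp = client.chat.completions.create(
    model="deepseek-chat",  # DeepSeek-V3
    messages=[{"role": "user", "content": "Hello!"}],
)
print(resp.choices[0].message.content)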



GgRiz3WWIAA8MOn.jpg


2/13
@reach_vb
ref:



3/13
@winwin7264
It may be smarter, but does it have vision capability?



4/13
@reach_vb
not yet - at least for my use-case I seldom need vision, so it works perfectly



5/13
@idare
FYI I'm working hard to bring large models like Deepseek v3 685B and Llama 3 405B to be local models you can run on COTS hardware.

Yesterday I succeeded in multiple tests to load the Vocabulary, 61 layers and access the 256 MoE of Deepseek v3 with only 4GB RAM, actually 228MB of RAM used of the 4GB cap I set.

Transformer tests were successful across thousands of random operations.

This is on my Arm based Orange Pi 5 Plus with 32GB RAM, not yet utilizing the RK3588 NPU.

I have additional tests prepared to run on the Ryzen 9 3900 with 128GB RAM (96GB allocated under Proxmox as a VM).

Then additional tests on Nvidia 3060 with 12GB VRAM.

Then additional tests to run under a constrained Docker container.

Then I'll run it under @ollama in various configurations.

I'll need an Apple Mac Mini M3 and M4 to test on, preferably with 16-128GB of RAM. With that I'll do additional Apple Silicon MLX optimizations.

I've been working on this for the last two years and finally seeing real-world success.

Further optimizations are coming!



6/13
@LottoLabs
After brief investigations Sonnet is still king



7/13
@adugbovictory
If DeepSeek v3 mirrors GPT-4o’s capabilities at a fraction of the cost, does this signal a major shift towards open models dominating the AI landscape? What do you think?



8/13
@yangyc666
DeepSeek v3? Sounds like a llama in a lab coat.



9/13
@CohorteAI
Check out how models like DeepSeek are reshaping AI affordability: What Can Large Language Models Achieve? - Cohorte Projects.



10/13
@z_almahdi_tech
Great analysis! DeepSeek v3's affordability is indeed appealing. How does it handle sophisticated AI tasks? Insight on its real-world performance would be valuable for those developing complex applications.



11/13
@gruuummm
Claude is still the best llm for coding.



12/13
@xingyu_liao
how about gemini flash2



13/13
@AyarzaJack17260
Randell Burhanuddin Your content exchange is a true gift for avid readers. Your guidance means a lot to me. Thanks, bro.




 


1/2
@linoy_tsaban
Amazing that flux is #1 🤯

[Quoted tweet]
Top 25 AI models in 2024 on Hugging Face 🔥

@bfl_ml w/ Flux.1-dev & Flux.1-schnell - current SoTA open Text to Image models

@AIatMeta w/ Llama 3.X series (1B to 70B) 🦙- competitive LLMs across sizes

@StabilityAI w/ SD 3.5 Medium & Large

@GoogleAI Gemma 7B 💎

@xai w/ grok-1 (hoping grok-2 gets open-sourced soon too)

@ByteDanceOSS SDXL Lightning

@NVIDIAAI Nemotron 70B

@mattshumer_ Reflection Llama 3.1 70B 🙃

@CohereForAI Command R Plus

@OpenAI Whisper Large v3 Turbo 👑 - Whisper v4 wen?

@Microsoft Phi 3 & OmniParser - LLM trained on synthetic data & vision based GUI agent

@BAAI BGE M3 - competitive multilingual text embedding model

@Alibaba_Qwen QwQ 32B and 2.5 Coder 32B

@Apple OpenELM

@2_Noise_ ChatTTS - competitive Text to Speech model

What a brilliant year - dominated by LLMs but I'm quite positive other modalities would accelerate and shine even stronger in 2025! 🤗


GgCqwinXgAA6o1V.jpg


2/2
@__sYmbio__
I'm not that surprised, generating AI images / videos is what got me into local AI tools in the first place.












1/17
@reach_vb
Top 25 AI models in 2024 on Hugging Face 🔥

@bfl_ml w/ Flux.1-dev & Flux.1-schnell - current SoTA open Text to Image models

@AIatMeta w/ Llama 3.X series (1B to 70B) 🦙- competitive LLMs across sizes

@StabilityAI w/ SD 3.5 Medium & Large

@GoogleAI Gemma 7B 💎

@xai w/ grok-1 (hoping grok-2 gets open-sourced soon too)

@ByteDanceOSS SDXL Lightning

@NVIDIAAI Nemotron 70B

@mattshumer_ Reflection Llama 3.1 70B 🙃

@CohereForAI Command R Plus

@OpenAI Whisper Large v3 Turbo 👑 - Whisper v4 wen?

@Microsoft Phi 3 & OmniParser - LLM trained on synthetic data & vision based GUI agent

@BAAI BGE M3 - competitive multilingual text embedding model

@Alibaba_Qwen QwQ 32B and 2.5 Coder 32B

@Apple OpenELM

@2_Noise_ ChatTTS - competitive Text to Speech model

What a brilliant year - dominated by LLMs but I'm quite positive other modalities would accelerate and shine even stronger in 2025! 🤗



GgCqwinXgAA6o1V.jpg


2/17
@reach_vb
results if you want to play with it:



GgCxifnXYAAU_ZY.jpg


3/17
@reach_vb
Extracted via SQL from:

cfahlgren1/hub-stats · Datasets at Hugging Face



4/17
@diegoguerror
2025 is going to be fun! 🦙🔥



5/17
@reach_vb
100% yes!



6/17
@JustinLin610
Surprised that Qwen2.5 is not on the list😂



7/17
@reach_vb
QwQ + 2.5 Coder is there - The rest didn't make the hype list haha! - Lots of work to do in 2025 to get all Qwen models in top 25.



8/17
@AI_Fun_times
Exciting to see the advancements in AI models for Text to Image and Large Language Models!



9/17
@amazingyearT1
I loved the previous grok. What am I supposed to be doing with the new grok to love it as much as I loved the old grok. I’m kind of lost…



10/17
@matthaeus_win
Seriously, nobody saw BFL dropping such an amazing release out of nowhere! Flux1 was already a game-changer, so I'm hyped for Flux2.



11/17
@BramVanroy
Surprised by these results. Seeing grok on it and not Qwen (non-coder) is crazy. Pretty confident that won't be reflected in downloads!



12/17
@thotsonrecord
I *LOVE* categorized leaderboards 💥👀💎 I'd read daily



13/17
@CoraLee404
Where’s ChatGPT?



14/17
@AbdullahAdeebi
Top AI model of 2025: o3 level’s open source ai.



15/17
@EngrSARFRAZawan
Llama 3 70B is the 🏆



16/17
@unclecode
Looking at the top 7, all the LLMs are small. The message is clear



17/17
@EthanSynthMind
More models, more noise. Quality over quantity.




 

bnew

Veteran
Joined
Nov 1, 2015
Messages
58,197
Reputation
8,613
Daps
161,833






1/11
@Yuchenj_UW
I love Nvidia and Jensen, but their presentation of numbers bothers me:

- vague terms like "AI TOPS"
- compare FP4 on 5090 with FP8 on 4090
- show FP4 FLOPS and claim a $3,000 box runs a 200B model
- plot graph mixing FP16, BF16, FP8, FP4, as if FP1 is usable in 2 years

Why can't we just get an apples-to-apples comparison?

Just show me the BF16 numbers! All I care about is how much faster I can train my GPT-2/Llama-3 and run these models at BF16/FP8 with your new chips.



GgtPSpsawAAapUg.jpg

GgtPSpua0AAXE1q.jpg

GgtPSptagAAiqL3.jpg

GgtPSpuakAE--6c.jpg


2/11
@DataDeLaurier
its the apples-to-nvidia comparisons that i want



3/11
@Yuchenj_UW
😂



4/11
@DFinsterwalder
+ No memory bandwidth for Project Digits



Ggwpv8OWQAA0527.jpg


5/11
@Yuchenj_UW
Haha yeah



6/11
@RajaXg
Wonder what will get the "community notes" to trigger on fake floating point spec comparisons..



7/11
@Yuchenj_UW




GgtwlqOb0AA-97S.jpg


8/11
@sabareeshkkanan
I could not find any information about the memory bandwidth . Did you find any Digits



9/11
@Yuchenj_UW
in their specs page: NVIDIA GeForce RTX 5090 Graphics Cards



GgtaumwbgAAjc96.jpg


10/11
@Laz4rz
the same level of shady tactics OpenAI uses more and more often, and that the hardware world has seen for quite some time now



11/11
@Yuchenj_UW
I get that they want to do marketing, but please make developers' lives easier. I had to be super careful about their reported numbers and pay extra attention to the footnotes.

There is always a footnote. 💀












1/11
@scaling01
Nvidia marketing: "5070 has 4090 performance"

vs reality:



Ggtkz_RXsAEHC6L.png


2/11
@humancompressed
Yeah, it only holds up for gaming with the new DLSS



3/11
@scaling01
Idk I don't see a reason why 4090 couldn't run DLSS 4. Is there like special hardware or are they just soft locking you out?



4/11
@MitjaMartini
The RTX 50xx line is, with the notable exception of the RTX 5090, a disappointment when it comes to local AI.



5/11
@scaling01
Nowadays it's only the top die that is really good. They want to upsell you to higher margin products.

I mean the 5080 should really be a 5070 with how cut down the die is 💀



6/11
@hacnslash1337
I mean it’s like 4090-ISH ok? Close enough what’s a few TOPS between friends.



7/11
@scaling01
4090 is almost 3 times faster. How is that close enough? 💀



8/11
@Johmbart
Surely this is just comparing specifications, not performance.



9/11
@scaling01
Performance is based on specifications. The 4090 literally had almost 3 times the cores and FLOPS.



10/11
@dravidan




11/11
@scaling01
Nooo




 



1/5
@_philschmid
REINFORCE++ is an improvement to classical REINFORCE that integrates PPO-inspired techniques to achieve simpler, more stable, and more efficient RLHF, leading to 30% faster training with comparable performance.

What's new with REINFORCE++
1️⃣ No Critic Network: Unlike PPO, REINFORCE++ removes the need for a separate value function, reducing compute and memory usage.
2️⃣ Token-Level KL Penalty: Compared to standard REINFORCE or RLOO, REINFORCE++ applies a penalty at each token step, curbing undesired divergence more granularly.
3️⃣ PPO-Style Clipping Minus Complexity: REINFORCE++ keeps the ratio clipping from PPO for stable updates, but avoids the added overhead of maintaining a critic.
4️⃣ Smoother Training: Using mini-batch advantage normalization and reward clipping stabilizes gradient updates more effectively than traditional REINFORCE.

Insights
💡 Eliminates critic network (value function) while maintaining PPO's stability benefits
🔒 Uses token-level KL penalties to prevent reward/length hacking
🛠️ PPO-Style Clipping for stable updates without large parameter shifts.
⚡ Reduces training time by 30% compared to PPO (42 vs 60 hours on H100)
🎯 Achieves comparable or better performance than GRPO (Qwen, Deepseek) in math reasoning
📈 Better reward increase per unit KL divergence in math scenarios.
🔍 Tested on both general domain (OpenRLHF/prompt-collection-v0.1, OpenRLHF/preference_700K) and specialized mathematics datasets (meta-math/MetaMathQA).
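
A hedged PyTorch sketch of the objective described above - PPO-style ratio clipping without a critic, a token-level KL penalty folded into the reward, and mini-batch advantage normalization. This is an illustrative reading of the recipe, not the OpenRLHF implementation:

import torch

def reinforce_pp_loss(logp, logp_old, logp_ref, reward, eps=0.2, beta=0.01):
    # logp / logp_old / logp_ref: per-token log-probs, shape (batch, seq)
    # reward: scalar sequence reward from the reward model, shape (batch,)
    kl = (logp_old - logp_ref).detach()          # token-level KL estimate
    r = -beta * kl                               # KL penalty applied at every token
    r[:, -1] += reward                           # sequence reward lands on the final token
    returns = torch.flip(torch.cumsum(torch.flip(r, [1]), 1), [1])  # reward-to-go
    adv = (returns - returns.mean()) / (returns.std() + 1e-8)       # mini-batch normalization
    ratio = torch.exp(logp - logp_old.detach())
    surr = torch.min(ratio * adv, torch.clamp(ratio, 1 - eps, 1 + eps) * adv)
    return -surr.mean()                          # no value network anywhere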



Ggx7UcTWYAAxkU0.jpg


2/5
@_philschmid
Paper: Paper page - REINFORCE++: A Simple and Efficient Approach for Aligning Large Language Models
Github: OpenRLHF/examples/scripts/train_reinforce_llama_ray.sh at main · OpenRLHF/OpenRLHF

Paper page - REINFORCE++: A Simple and Efficient Approach for Aligning Large Language Models



3/5
@mysophobi
20K run times will slow down developments for competitors as well.



4/5
@HiSteveKaplan
no critic network? nice. could really help with faster iterations in real-world apps. less complexity means more focus on results. love it! 30% faster is huge.



5/5
@AI_Fun_times
Exciting to see REINFORCE++ innovate with a unique twist! Skipping the Critic Network is a bold move. ✨




 



1/2
@_philschmid
New open multilingual embedding models released! KaLM-Embedding is a series of embedding models built on @Alibaba_Qwen 2 0.5B and released under MIT. 👀

TL;DR:
🚀 Built on Qwen2-0.5B, trained on 550k synthetic examples, released under MIT
🧹 Implements ranking consistency filtering to remove noisy and false negative samples
📊 Achieves 64.53 average score on MTEB benchmark (64.13 C-MTEB, 64.94 MTEB)
🎯 Supports flexible dimension embedding through Matryoshka Representation Learning
🌍 Strong multilingual performance outperforms other open models
🤗 Integrated into sentence-transformers available on Hugging Face
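
A minimal sentence-transformers sketch per the integration noted above; the checkpoint name is an assumption - pick a real model ID from the HIT-TMG/KaLM-embedding collection:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("HIT-TMG/KaLM-embedding-multilingual-mini-v1")  # illustrative ID
sentences = ["Hello world", "Bonjour le monde", "The weather is nice"]
emb = model.encode(sentences, normalize_embeddings=True)
print(emb.shape)
print(emb @ emb.T)  # cosine similarities, since the embeddings are normalized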



Gg3VjShWkAAuTV4.jpg


2/2
@_philschmid
Models: KaLM-embedding - a HIT-TMG Collection
Paper: Paper page - KaLM-Embedding: Superior Training Data Brings A Stronger Embedding Model
Code: GitHub - HITsz-TMG/KaLM-Embedding

KaLM-embedding - a HIT-TMG Collection




 