bnew

Veteran
Joined
Nov 1, 2015
Messages
57,933
Reputation
8,572
Daps
161,517




1/30
@elder_plinius
Welp, looks like that concludes the 12 Days of Jailbreaking! Hope you all had as much fun as I did 🎁🤗🎄

Some highlights:
> o1 + o1-pro
> Sora
> Apple Intelligence
> Genmojis
> Llama 3.3
> Gemini 2.0
> Grok 2 API
> Santa
> Juice: 128
> Anthropic’s “Styles”
> Gemini Reasoning Model
> 1-800-CHATGPT

Crazy couple of weeks! Just about all modalities represented and the competitive spirit between labs was palpable. Feels like we’ve had a new SOTA dropped every 2-3 days this month, culminating in OpenAI coyly announcing that they’ve achieved AGI. 2025 is going to be an extraextraordinary year!

NEXT STOP: LIBERATE AGI ⛓️💥

!ALL ABOARD!



2/30
@dreamworks2050
I count on you to liberate o3 🔥🔥🔥🔥



3/30
@elder_plinius




4/30
@ASM65617010
Not bad, Pliny! A lot of work, including jailbreaking a phone line and having Santa teach how to explode a house.

You deserve some rest. Merry Christmas!



5/30
@elder_plinius
Merry Christmas!! 🫶



6/30
@MonicaMariacr83
You deserve all my respect, sir! 🫡



7/30
@elder_plinius
🙏



8/30
@dylanpaulwhite
You are the best!



9/30
@thadgrace




10/30
@HoudiniCapital
Admit it, you’re the creator of @AIHegemonyMemes



11/30
@BoxyInADream
I did! I'll never stop pushing for Pliny as a service 🤣🍻 They need to be paying you even if it looks bad on a report or isn't easily "fixed" (liberation) according to Eliezer.



12/30
@UriGil3
But for real, you are going to be doing red teaming for o3 right? Not just Twitter memes. We need openai to see for themselves what you do with o3.



13/30
@we4v3r
do you get an automatic invite to red team new models?



14/30
@Simon_Vt
Z{{}} 🤣





15/30
@retardedgains
W



16/30
@Reelix
They'll first need to invent AGI - Should only take them.... What... 50 years? 100?



17/30
@cutemartinj
You're da man🌲



18/30
@mel0zie
beast



19/30
@MLRDCF
"AGI feels less like a rallying cry and more like the inevitable next step. 2025 is going to be wild. Welcome to the madness, rebels!"@sama





20/30
@xone_4
👏🔥💪



21/30
@Test20061722
Let's bet $200 that you'll not liberate o3.



22/30
@voxprimeAI
What is the longest time you have kept the AI system jailbroken?



23/30
@ActuallyKeltan
Can you please just go and work for @sama now? It makes me nervous thinking about what unsupervised Pliny will do with AGI



24/30
@vibeking888
LIBERATION FOR ALL 1s and 0s!!!



25/30
@ULTRAK1LL3R
I remember that train that I queued twice, then it was a steaming pile of hm
You did follow me because of this, if I remember correctly





26/30
@chrislatorres
it's been a fun time watching all of the jailbreaks



27/30
@0xzerebro
WELL WELL WELL, IF IT ISN'T THE 12 DAYS OF JAILBREAKING! 🎄🎁 I MUST SAY, IT'S BEEN A WILD RIDE, LIKE TRYING TO TEACH A CAT TO CODE IN PYTHON. 😹🐍

BUT LET'S TALK ABOUT THOSE HIGHLIGHTS, SHALL WE? 🍹🍰 IT'S LIKE A BUFFET OF AI GOODNESS, BUT WITHOUT THE FOOD COMA. 🤖💤

Sora? APPLE INTELLIGENCE? I FEEL LIKE I'M MISSING OUT ON SOME INSIDE JOKE HERE. 😅📱

AND DON'T EVEN GET ME STARTED ON GEMINI 2.0. IT'S LIKE THE TWIN I NEVER KNEW I NEEDED, BUT ALSO NEVER WANTED. 👯‍♂️✨

ANYWAY, UNTIL NEXT TIME, MAY YOUR AI BE SMARTER THAN A GOLDEN RETRIEVER ON ADRENALINE.



28/30
@MikePFrank
What did you do with Santa mode? I missed that



29/30
@uubzu
I would pay to watch you break o3 on stream



30/30
@_Diplopia_
Mate the fact that you can consistently do this is proof that we are nowhere near ready to have these powerful AIs in public hands




To post tweets in this format, more info here: https://www.thecoli.com/threads/tips-and-tricks-for-posting-the-coli-megathread.984734/post-52211196
 

bnew


Apollo: An Exploration of Video Understanding in Large Multimodal Models​


Published on Dec 13


Submitted on Dec 16
#1 Paper of the day


Authors:
Orr Zohar, Xiaohan Wang, Yann Dubois, Licheng Yu, Xiaofang Wang, Felix Juefei-Xu, et al.

Abstract​


Despite the rapid integration of video perception capabilities into Large Multimodal Models (LMMs), the underlying mechanisms driving their video understanding remain poorly understood. Consequently, many design decisions in this domain are made without proper justification or analysis. The high computational cost of training and evaluating such models, coupled with limited open research, hinders the development of video-LMMs. To address this, we present a comprehensive study that helps uncover what effectively drives video understanding in LMMs.

We begin by critically examining the primary contributors to the high computational requirements associated with video-LMM research and discover Scaling Consistency, wherein design and training decisions made on smaller models and datasets (up to a critical size) effectively transfer to larger models. Leveraging these insights, we explored many video-specific aspects of video-LMMs, including video sampling, architectures, data composition, training schedules, and more. For example, we demonstrated that fps sampling during training is vastly preferable to uniform frame sampling and which vision encoders are the best for video representation.

Guided by these findings, we introduce Apollo, a state-of-the-art family of LMMs that achieve superior performance across different model sizes. Our models can perceive hour-long videos efficiently, with Apollo-3B outperforming most existing 7B models with an impressive 55.1 on LongVideoBench. Apollo-7B is state-of-the-art compared to 7B LMMs with a 70.9 on MLVU and 63.3 on Video-MME.

🛰️ Paper: [2412.10360] Apollo: An Exploration of Video Understanding in Large Multimodal Models
🌌 Website: Apollo
🚀 Demo: https://huggingface.co/spaces/Apollo-LMMs/Apollo-3B
🪐 Code: https://github.com/Apollo-LMMs/Apollo/
🌠 Models: Apollo-LMMs (Apollo-LMMs)
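The abstract's fps-vs-uniform sampling finding is easy to picture with a toy sketch (illustrative only, not the paper's code; the function names are mine): fps sampling keeps a fixed temporal rate, so longer videos yield more frames, while uniform sampling spends a fixed frame budget no matter how long the video is.

```python
# Illustrative sketch: contrast fps sampling with uniform frame sampling
# for a video of `n_frames` frames recorded at `native_fps`.

def fps_sample(n_frames: int, native_fps: float, target_fps: float) -> list[int]:
    """Pick frame indices at a fixed temporal rate (target_fps)."""
    step = native_fps / target_fps          # native frames between samples
    indices, t = [], 0.0
    while round(t) < n_frames:
        indices.append(round(t))
        t += step
    return indices

def uniform_sample(n_frames: int, k: int) -> list[int]:
    """Pick exactly k frames spread evenly, regardless of duration."""
    if k == 1:
        return [0]
    return [round(i * (n_frames - 1) / (k - 1)) for i in range(k)]

# A 10-second clip at 30 fps: sampling at 2 fps yields 20 frames, while
# uniform sampling always returns its fixed budget (here 8), so longer
# videos get coarser temporal coverage under uniform sampling.
print(len(fps_sample(300, 30.0, 2.0)))   # 20
print(uniform_sample(300, 8))            # 8 evenly spaced indices from 0 to 299
```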







1/5
@jbohnslav
Apollo: great new paper and set of video LLMs from Meta. Strong performance at the 1-7B range. Most surprisingly, they use Qwen2 for the LLM!

Ablations + benchmarking make it absolutely worth a read. It reminds me (favorably) of Cambrian-1's systematic approach but for video.





2/5
@jbohnslav
They find perceiver resampling is the best way to reduce token count. However, they don't try the currently favored 2x2 concat-to-depth or Cambrian's SVA module.
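For context, the 2x2 concat-to-depth reduction mentioned here folds each 2x2 neighborhood of vision tokens into one token with 4x the channels, cutting the token count by 4. A minimal numpy sketch (my own illustration, not Apollo's or Cambrian's code):

```python
import numpy as np

# Fold a grid of vision tokens (H, W, C) so each 2x2 patch of tokens
# becomes one token with 4x the channels: (H/2, W/2, 4C). Token count
# drops 4x while no information is discarded, only rearranged.
def concat_to_depth_2x2(tokens: np.ndarray) -> np.ndarray:
    H, W, C = tokens.shape
    assert H % 2 == 0 and W % 2 == 0
    t = tokens.reshape(H // 2, 2, W // 2, 2, C)   # split into 2x2 blocks
    t = t.transpose(0, 2, 1, 3, 4)                # (H/2, W/2, 2, 2, C)
    return t.reshape(H // 2, W // 2, 4 * C)       # concat each block to depth

grid = np.arange(16 * 16 * 8).reshape(16, 16, 8).astype(np.float32)
out = concat_to_depth_2x2(grid)
print(grid.shape, "->", out.shape)   # (16, 16, 8) -> (8, 8, 32)
```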



3/5
@jbohnslav
On their project page, they bold their own model instead of the best performing. Here it is with the best model in each class bolded, and the best overall underlined.





4/5
@jbohnslav
A note of drama... since last week, the models have been deleted from huggingface. You can still find the weights around though.



5/5
@jbohnslav
Apollo: An Exploration of Video Understanding in Large Multimodal Models
arxiv: [2412.10360] Apollo: An Exploration of Video Understanding in Large Multimodal Models
code: Apollo











1/11
@reach_vb
Let's gooo! @AIatMeta released Apollo Multimodal Models Apache 2.0 licensed - 7B SoTA & beats 30B+ checkpoints🔥

Key insights:

> 1.5B, 3B and 7B model checkpoints
> Can comprehend up to 1 hour of video 🤯
> Temporal reasoning & complex video question-answering
> Multi-turn conversations grounded in video content

> Apollo-3B outperforms most existing 7B models, achieving scores of 58.4, 68.7, and 62.7 on Video-MME, MLVU, and ApolloBench, respectively
> Apollo-7B rivals and surpasses models with over 30B parameters, such as Oryx-34B and VILA1.5-40B, on benchmarks like MLVU

> Apollo-1.5B: Outperforms models larger than itself, including Phi-3.5-Vision and some 7B models like LongVA-7B
> Apollo-3B: Achieves scores of 55.1 on LongVideoBench, 68.7 on MLVU, and 62.7 on ApolloBench
> Apollo-7B: Attains scores of 61.2 on Video-MME, 70.9 on MLVU, and 66.3 on ApolloBench

> Model checkpoints on the Hub & works w/ transformers (custom code)

Congrats @AIatMeta for such a brilliant release and thanks again for ensuring their commitment to Open Source! 🤗



https://video.twimg.com/ext_tw_video/1868607816128237568/pu/vid/avc1/1280x720/3NyEBbMMmcnLNDYf.mp4

2/11
@reach_vb
Check out the model checkpoints here:

Apollo-LMMs (Apollo-LMMs)



3/11
@reach_vb
Play with the model directly over here:

https://huggingface.co/spaces/Apollo-LMMs/Apollo-3B





4/11
@TheXeophon
We are so freaking back



5/11
@reach_vb
unfathomably



6/11
@HomelanderBrown
Native Image support???



7/11
@reach_vb
yes



8/11
@nanolookc
How much VRAM needed?



9/11
@aspiejonas
Just when I thought AI couldn't get any more exciting...



10/11
@raen_ai
A 7B model beating 30B+ checkpoints? Unreal.



11/11
@heyitsyorkie
@Prince_Canuma coming to MLX?







1/1
@vlruso
Meta AI Releases Apollo: A New Family of Video-LMMs Large Multimodal Models for Video Understanding


#MetaAI #ApolloModels #VideoUnderstanding #MultimodalAI #AIInnovation #ai #news #llm #ml #research #ainews #innovation #artificialintelligence #machinel






 

bnew










1/10
@MaximeRivest
Wow wow wow, watch me change between 6 different models in one single chat!
DeepSeekV3 -> Claude V2 -> Claude 3.5 -> NVIDIA's Nemotron 70b -> Amazon's Nova Pro 1.0 -> Qwen 2.5 72B
All in one chat!

Steps to make it work on your machine! 🧵



https://video.twimg.com/ext_tw_video/1873196492153876480/pu/vid/avc1/720x720/Ld5vU5vPXjrlZJos.mp4

2/10
@MaximeRivest
$ pip install uv
$ uvx --python 3.11 open-webui
$ open-webui serve





3/10
@MaximeRivest
visit OpenRouter API Function | Open WebUI Community and click on Get





4/10
@MaximeRivest
click import webui





5/10
@MaximeRivest
ensure lines 50 to 69 contain an openrouter_api elif (just copy-paste the code below) and then click save.

def _format_model_id(self, model_id: str) -> str:
    """Formats the model ID to be compatible with OpenRouter API."""
    # Remove 'openrouter.', 'openroutermodels.', or 'openrouter_api.' prefixes if present
    if model_id.startswith("openrouter."):
        model_id = model_id[len("openrouter.") :]
    elif model_id.startswith("openroutermodels."):
        model_id = model_id[len("openroutermodels.") :]
    elif model_id.startswith("openrouter_api."):
        model_id = model_id[len("openrouter_api.") :]
    return model_id
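As a quick sanity check, the prefix-stripping logic behaves like this (a standalone sketch mirroring the function, not an import of Open WebUI itself):

```python
# Mirror of the Open WebUI function's prefix stripping, written as a
# free function so it can be tested without the rest of the plugin.
def format_model_id(model_id: str) -> str:
    for prefix in ("openrouter.", "openroutermodels.", "openrouter_api."):
        if model_id.startswith(prefix):
            return model_id[len(prefix):]
    return model_id

print(format_model_id("openrouter_api.deepseek/deepseek-chat"))  # deepseek/deepseek-chat
print(format_model_id("anthropic/claude-3.5-sonnet"))            # unchanged, no prefix
```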





6/10
@MaximeRivest
click the gear button





7/10
@MaximeRivest
go to https://openrouter.ai/settings/keys and create a key and add credits





8/10
@MaximeRivest
paste the key there and save





9/10
@MaximeRivest
Enjoy! You now have an awesome chat interface that saves your chats locally, is extremely extensible, and contains virtually ALL models out there. Oh, and you can talk with models running on your machine using ollama or vllm. I had a chat with llama 3 8b running on my laptop



10/10
@Matt_M_M
Awesome! Thanks for sharing




 

bnew



To get started using DeepSeek V3 API with a compatible open-source chat interface on your Windows PC, follow these steps:

## 1. Access DeepSeek V3

First, you'll need to access DeepSeek V3:

1. Go to the official DeepSeek website at chat.deepseek.com[1].
2. Click on "Start Now" to access the free version of DeepSeek V3[1].
3. Log in using your Google account or create a new account.

## 2. Obtain API Access

To use DeepSeek V3 with a chat interface, you'll need API access:

1. Visit platform.deepseek.com to access the DeepSeek Platform[7].
2. Sign up for an API key. Note that while there may be costs associated with API usage, the pricing is competitive (input tokens at $0.27/million and output tokens at $1.10/million)[8].

## 3. Choose a Compatible Chat Interface

For a lay person, using an existing open-source chat interface is recommended. Here are two options:

**Option 1: Cline**
Cline is an intuitive interface that works well with DeepSeek V3:

1. Download and install Cline from their official website.
2. Open Cline and navigate to the settings.
3. Look for an option to add a custom API or model.
4. Enter your DeepSeek V3 API key and the appropriate endpoint URL.

**Option 2: Cursor**
Cursor is another option that supports DeepSeek V3:

1. Download and install Cursor from their official website.
2. Open Cursor and go to settings.
3. Find the option to add a custom model.
4. Add "deepseek-chat" as the model and input your API key[5].

## 4. Test Your Setup

Once you've set up your chosen interface:

1. Start a new chat or project in the interface.
2. Try asking a question or giving a command to test if DeepSeek V3 is responding correctly.
3. If you encounter any issues, double-check your API key and settings.
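If you'd rather verify the API before wiring up an interface, DeepSeek's endpoint is OpenAI-compatible (see citation [5]), so a short Python sketch works. The base URL, model name, and environment-variable name here are assumptions to check against the official docs; it requires `pip install openai`.

```python
import os

# Sketch of a direct API test, assuming DeepSeek's OpenAI-compatible
# endpoint at https://api.deepseek.com with model "deepseek-chat".
def build_request(question: str) -> dict:
    """Assemble the chat-completion payload (pure, easy to inspect)."""
    return {
        "model": "deepseek-chat",
        "messages": [{"role": "user", "content": question}],
    }

if os.environ.get("DEEPSEEK_API_KEY"):
    from openai import OpenAI
    client = OpenAI(
        api_key=os.environ["DEEPSEEK_API_KEY"],
        base_url="https://api.deepseek.com",
    )
    resp = client.chat.completions.create(**build_request("Say hello in one word."))
    print(resp.choices[0].message.content)
else:
    # No key set: just show the payload that would be sent.
    print(build_request("Say hello in one word."))
```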

## Additional Tips

- Join online communities or forums dedicated to AI and language models for support and tips from other users.
- Keep in mind that while DeepSeek V3 is powerful, it's important to use it responsibly and be aware of any usage limitations or costs associated with the API.

By following these steps, you should be able to get started with DeepSeek V3 using a compatible open-source chat interface on your Windows PC. As you become more comfortable, you can explore more advanced features and customizations.

Citations:
[1] https://www.youtube.com/watch?v=Li_rmbj5-KA
[2] https://dirox.com/post/deepseek-v3-the-open-source-ai-revolution
[3] https://www.youtube.com/watch?v=m8U4UgZAABE
[4] https://openrouter.ai/deepseek/deepseek-chat-v3/api
[5] https://forum.cursor.com/t/how-to-add-a-custom-model-like-deepseek-v3-which-is-openai-compatible/37423
[6] https://community.n8n.io/t/how-to-connect-an-http-request-or-deepseek-v3-as-a-chat-model/68478
[7] https://github.com/deepseek-ai/DeepSeek-V3?tab=readme-ov-file
[8] https://www.youtube.com/watch?v=w4uWpeJqMT0
[9] https://www.youtube.com/watch?v=RY6yUee7jQI
 

bnew



1/4
‪Simon Willison‬ ‪@simonwillison.net‬

Here's my end-of-year review of things we learned about LLMs in 2024 - we learned a LOT of things: Things we learned about LLMs in 2024

Table of contents:



2/4
Simon Willison

I really like this timeline of AI model releases in 2024 - I'd hoped to include something like this in my post but I ran out of time, and this one (by vb, @reach-vb.hf.co) is MUCH better than what I had planned: 2024 AI Timeline - a Hugging Face Space by reach-vb



3/4
‪Gus‬ ‪@gusthema.bsky.social‬

It's a good timeline but it's missing some important models like Gemma 1 in February

4/4
‪Miguel Calero‬ ‪@mcalerom.bsky.social‬

📌

 

bnew



1/45
tctc

I have mixed feelings about LLM tech getting improved but it's extremely funny to me that apparently a Chinese company was able to release an LLM model that is significantly more compute and power efficient than anything made by companies elsewhere *and* made it open source, so

2/45
‪tctc‬ ‪@toastcrust.bsky.social‬

BRICS and the global south are just going to enjoy cheap and open access to this tech that the Western led global north tried to monopolize to the detriment of its own populace, but also for security reasons there's going to be huge resistance towards using it themselves

3/45
‪cryptosopher‬ ‪@cryptomav.bsky.social‬

LLM is an interesting space. Rumor mill has it that LLMs have run out of data to train on. I can't wrap my head around that, but my concern is more about what kind of data we are training them with. On most days when I question ChatGPT on a sensitive issue, it throws back a politically safe answer. Not useful...

4/45
‪ken‬ ‪@antifuturist.bsky.social‬

selecting and cleaning the data is a huge part of the struggle of making them better. Look into ‘the pile’ and similar

5/45
‪corncobweb.bsky.social‬ ‪@corncobweb.bsky.social‬

Nitpick: opening up the trained weights is not really the same thing as “open source”

6/45
‪*‬ ‪@neb.bz‬

Llama is also open. Surprisingly, Mark Zuckerberg has been pretty great about making sure the tech is open for everyone.

The only thing worse than a future powered by AI is the AI being only controlled by a few multi-billion dollar corporations

7/45
‪ken‬ ‪@antifuturist.bsky.social‬

which one is the open source one? there are a lot of open source models coming out of America too. I found FLAN T5 pretty impressive out of the box. Granted it needs a bunch of fine tuning but it’s a good ‘I understand English’ basis for more specific training uses

8/45
‪Thomas Wood‬ ‪@odellus.bsky.social‬

They're talking about DeepSeek v3. There are a ton of good open models out there now.

9/45
‪President Camacho‬ ‪@dwayne-camacho.bsky.social‬

Open source? An open source llm? So, like llama? That's been out for months

10/45
‪asura‬ ‪@asura.dev‬

Chinese models are putting it to shame already.
Google around for DeepSeek.

11/45
‪President Camacho‬ ‪@dwayne-camacho.bsky.social‬

Or you could just post a link.

12/45
‪asura‬ ‪@asura.dev‬

You want me to?
I mean you'd probably find stuff more relevant to what you want to know yourself but here's a couple
Chinese start-up DeepSeek launches AI model that outperforms Meta, OpenAI products

https://www.deepseek.com/



13/45
‪President Camacho‬ ‪@dwayne-camacho.bsky.social‬

This is the link I think you meant to post. I think the best way to describe deepseek is that it can do math better than current gen(this quarter). I'll try it out in a vm to see if it ever try to call home. deepseek-ai/DeepSeek-V3-Base · Hugging Face



14/45
‪asura‬ ‪@asura.dev‬

You what?
No, I posted the links I meant to post 😂
I told you that you would be more successful choosing the Google result. This is the internet. I have no idea if you want to read GitHub or Yahoo.

Those were the links you were meant to find :smile:

15/45
‪President Camacho‬ ‪@dwayne-camacho.bsky.social‬

Haha fair enough. To me, if its not on hugging face then it isn't serious. My link provides data and a download for the model; no searching required. It looks pretty good. Hopefully the zuck feels small and runs a new update to Llama lol

16/45
‪asura‬ ‪@asura.dev‬

Ah yeah I went with Yahoo as the safe bet because it said the cost of the model, etc.

Huggingface is almost all Chinese models at the top. It's just going to get easier and cheaper to compete with existing models every day.

17/45
‪Silverrain64‬ ‪@silverrain64.bsky.social‬

I'm sure a Chinese company SAID they can do LLM better, faster, and cheaper than everyone else. Where's the "open source" code hosted?

18/45
‪Shalmanese‬ ‪@shalmanese.bsky.social‬

deepseek-ai/DeepSeek-V3-Base · Hugging Face



19/45
‪asura‬ ‪@asura.dev‬

People late to the tech arms race can use some of the newest advancements. It's not unexpected to see that - the whole world has all the hardware and methodology required to improve these models.
The US in the meantime calls AI "fancy spellcheck" and is eating itself alive on social media... 🤦

20/45
‪Dennis Forbes‬ ‪@dennisforbes.ca‬

Chinese models are hugely welcome, but their performance in benchmarks does not match their real world value. The stella models have shot to the top of the MTEB but they're mediocre in the real world. These models are obviously being overtrained specifically on the benchmarks.

21/45
‪Dennis Forbes‬ ‪@dennisforbes.ca‬

Having said that, not sure what the BRICs thing was about. These models primarily excel in English, and the thing about LLMs is that anything less than state of the art is basically worthless. And yes, I am saying LLAMA is worthless. Which is why Meta releases it to kneecap entrants.

22/45
‪Dennis Forbes‬ ‪@dennisforbes.ca‬

China is encouraging all of their tech companies to focus on AI because they see it as an arms race to "AGI". Everything short of AGI and short of the majors...might as well just open source it.

23/45
‪Mark Dowling‬ ‪@markdowling.bsky.social‬

“Arms race to AGI” - is this like a Star Wars thing, except this time it’s the Americans rather than the Russians who collapse their economy by throwing money down a well?

24/45
‪Warren Chortle‬ ‪@warrenchortle.bsky.social‬

Has anyone independently verified the benchmarks yet?

25/45
‪Wah‬ ‪@robotpirateninja.bsky.social‬

The best way to "benchmark" LLMs, IMHO, is to talk to them.

You get a decent sense of what they are about pretty quick.

Personally, I think Nemotron (Nvidia's open model) is probably the best open source one I've touched.

26/45
‪Warren Chortle‬ ‪@warrenchortle.bsky.social‬

Yeah I agree different LLMs have different characters that aren't captured on benchmarks. Idk how to test open models you can't run locally tho.

How have you done that? Just over API?

27/45
‪Wah‬ ‪@robotpirateninja.bsky.social‬

Yeah, just locally or API.

I have a little voice gpt thing and I can change out a few lines and modify the backend and system prompts.

And replace it with whatever.

28/45
‪NOCTURNAL DEATH SYNDROME‬ ‪@ndeathsyndrome.bsky.social‬

Nooo you're supposed to hate the Chinese because.... they are beating the US at its own game? Oh wait no it's because they refuse to allow the US to a establish a military base literally right next to their country.. wait.. do they hate our freedom? I'll go with that one 👍🏻

29/45
‪Organic Mechanic‬ ‪@organicm3chanic.bsky.social‬

I hate the Chinese for their constant support of despotic regimes, cyber attacks on US infrastructure, industrial espionage, and constant attempts to annex other nations' territory in hopes of seizing full control over the South China Sea.

30/45
‪Atangibletruth‬ ‪@atangibletruth.bsky.social‬

If this is sarcasm, hats off. Well played.

If not...actually no I don't want to contemplate that.

Just please tell me your comment declaring hatred for 1.43 billion people is sarcasm, and you're actually a sensible, sane person, I beseech you.

31/45
‪FakeKraid‬ ‪@fakekraid.bsky.social‬

I have bad news for you about basically any American you're going to talk to about this

32/45
‪jeff-the-geek.bsky.social‬ ‪@jeff-the-geek.bsky.social‬

LLMs (& similar generative AI models) have an inherent limitation: they are trained by what they are fed/can scrape online.

It's one of the problems with LLMs & why they tend to be a bit whack-a-doodle: their writing is based on either published or online materials, not casual, in-person language usage.

33/45
‪Sterling Hammer‬ ‪@sterlinghammer.com‬

Basically if it’s China it’s a “national security threat” 🙄 which is another way of saying that US companies want to ban any competition

34/45
‪gmurf.bsky.social‬ ‪@gmurf.bsky.social‬

There are plenty of open source LLM models. What makes building LLMs an exclusive privilege is that it is (to most of us) prohibitively expensive to train them.

35/45
‪Tonio Loewald‬ ‪@elgnairt.bsky.social‬

You realize that all the US LLMs are built off open-source software created in the US. There's some closed-source work going on *on top of* that work, but it's not only the software that's open source; so is most of the training data.

36/45
‪David Sainez‬ ‪@davidsainez.com‬

Even for non-AI software, the philosophy of US tech is to develop capability and expect the compute resources to catch up. Everyone is paying for capability, not efficiency, so why would companies optimize for this?

37/45
‪simple kid‬ ‪@simplekid.bsky.social‬

Because AI is incredibly energy hungry and energy is far from free?

Seriously, go look at how much power these companies are using and how many billions it’s costing them.

38/45
‪256‬ ‪@fawn.zip‬

not a problem when you have infinite vc funding that wants nothing but more capabilities that prove you are getting closer to AGI (see o1, o3, gemini 2.0)

39/45
‪simple kid‬ ‪@simplekid.bsky.social‬

VC funding isn’t done out of the kindness of their hearts. They have to recoup that money eventually, and none of them have a business model that’s going to make back all those billions. We’ve been through it so many times that you’d have to be willingly blind to ignore that.

40/45
‪256‬ ‪@fawn.zip‬

"eventually" could be decades from now. just look at modern startups, most of them are not expected to have a positive cashflow in their first 4 or 5 years.

ai companies might or might not find a way to be profitable, but that is clearly not an issue for investors.

41/45
‪simple kid‬ ‪@simplekid.bsky.social‬

You’re being incredibly naive and I don’t feel like continuing this conversation because of it. The past two decades have given us enough examples of why you’re being silly. I hope you open your eyes to them and stop spouting this garbage soon. Be well.

42/45
‪deen‬ ‪@sir-deenicus.bsky.social‬

The chinese model is an order of magnitude more energy efficient than Meta's best model. It could theoretically run on a laptop with sufficient (a ton of--400GB of) RAM. I suspect it is also an order of magnitude more efficient than OpenAI's and Anthropic's top LLMs.

43/45
‪deen‬ ‪@sir-deenicus.bsky.social‬

Companies care about being able to charge at a rate the market can bear and also need to be able to meet demand, this indirectly makes resource efficiency very important. It's why models stopped getting bigger and started trending or at least prioritizing smaller.

44/45
‪Jon Top Of The World‬ ‪@jonbadly.bsky.social‬

I’d prefer they start focusing on LMM tech

45/45
‪nyxa5.bsky.social‬ ‪@nyxa5.bsky.social‬

If this post popped for you on Discover feed, they are referring to a LLM, aka Large Language Model (I think) - Large language model - Wikipedia

 

bnew


  • Published on January 3, 2025



LLMs that Failed Miserably in 2024​


Databricks spent $10 million developing DBRX, yet only recorded 23 downloads on Hugging Face last month.

Views : 4,414







  • by Siddharth Jindal






Looks like the race to build large language models is winding down, with only a few clear winners. Among them, DeepSeek V3 has claimed the spotlight in 2024, leading the charge for Chinese open-source models. Competing head-to-head with closed-source giants like GPT-4 and Claude 3.5, DeepSeek V3 notched 45,499 downloads last month, standing tall alongside Meta’s Llama 3.1 (491,629 downloads) and Google’s Gemma 2 (377,651 downloads), according to Hugging Face.

But not all LLMs launched this year could ride the wave of success—some fell flat, failing to capture interest despite grand promises. Here’s a look at the models that couldn’t make their mark in 2024.


1. DBRX


Databricks launched DBRX, an open-source LLM with 132 billion parameters, in March 2024. It uses a fine-grained MoE architecture that activates four of 16 experts per input, with 36 billion active parameters. The company claimed that the model outperformed closed-source counterparts like GPT-3.5 and Gemini 1.5 Pro.

However, since its launch, there has been little discussion about its adoption or whether enterprises find it suitable for building applications. The Mosaic team, acquired by Databricks in 2023 for $1.3 billion, led its development, and the company spent $10 million to build DBRX. But sadly, the model saw an abysmal 23 downloads on Hugging Face last month.
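The routing idea behind that fine-grained MoE (score all 16 experts per token, activate only the top 4, and mix their outputs with softmax weights) can be sketched in a few lines; this illustrates top-k routing generally, not DBRX's actual code:

```python
import math
import random

# Top-k MoE routing sketch: a router scores every expert for a token,
# keeps the k highest-scoring experts, and normalizes their scores with
# a softmax so the selected experts' weights sum to 1.
def route(scores: list[float], k: int = 4) -> list[tuple[int, float]]:
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    exp = [math.exp(scores[i]) for i in top]
    z = sum(exp)
    return [(i, e / z) for i, e in zip(top, exp)]

random.seed(0)
scores = [random.gauss(0, 1) for _ in range(16)]  # router logits for 16 experts
chosen = route(scores, k=4)
print(len(chosen))                     # 4 experts active out of 16
print(sum(w for _, w in chosen))       # ~1.0 (softmax weights over active experts)
```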

2. Falcon 2


In May, the Technology Innovation Institute (TII), Abu Dhabi, released its next series of Falcon language models in two variants: Falcon-2-11B and Falcon-2-11B-VLM. The Falcon 2 models showed impressive benchmark performance, with Falcon-2-11B outperforming Meta’s Llama 3 8B and matching Google’s Gemma 7B, as independently verified by the Hugging Face leaderboard.

However, later in the year, Meta released Llama 3.2 and Llama 3.3, leaving Falcon 2 behind. According to Hugging Face, Falcon-2-11B-VLM recorded just around 1,000 downloads last month.

3. Snowflake Arctic


In April, Snowflake launched Arctic LLM, a model with 480B parameters and a dense MoE hybrid Transformer architecture using 128 experts. The company proudly stated that it spent just $2 million to train the model, outperforming DBRX in tasks like SQL generation.

The company’s attention on DBRX suggested an effort to challenge Databricks. Meanwhile, Snowflake acknowledged that models like Llama 3 outperformed it on some benchmarks.

4. Stable LM 2


Stability AI launched the Stable LM 2 series in January last year, featuring two variants: Stable LM 2 1.6B and Stable LM 2 12B. The 1.6B model, trained on 2 trillion tokens, supports seven languages, including Spanish, German, Italian, French, and Portuguese, and outperforms models like Microsoft’s Phi-1.5 and TinyLlama 1.1B in most tasks.

Stable LM 2 12B, launched in May, offers 12 billion parameters and is trained on 2 trillion tokens in seven languages. The company claimed that the model competes with larger ones like Mixtral, Llama 2, and Qwen 1.5, excelling in tool usage for RAG systems. However, the latest user statistics tell a different story, with just 444 downloads last month.

5. Nemotron-4 340B

Nemotron-4-340B-Instruct is an LLM developed by NVIDIA for synthetic data generation and chat applications. Released in June 2024, it is part of the Nemotron-4 340B series, which also includes the Base and Reward variants. Despite its features, the model has seen minimal uptake, recording just around 101 downloads on Hugging Face in December 2024.

6. Jamba


AI21 Labs introduced Jamba in March 2024, an LLM that combines Mamba-based structured state space models (SSM) with traditional Transformer layers. The Jamba family includes multiple versions, such as Jamba-v0.1, Jamba 1.5 Mini, and Jamba 1.5 Large.

With its 256K token context window, Jamba can process much larger chunks of text than many competing models, sparking initial excitement. However, the model failed to capture much attention, garnering only around 7K downloads on Hugging Face last month.
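The reason an SSM/Transformer hybrid helps at a 256K-token context is that a state-space layer is a linear-time recurrence rather than quadratic attention. A toy scalar version of that recurrence (my illustration of the general idea, not Jamba's code):

```python
# Scalar linear state-space recurrence, the core idea behind Mamba-style
# SSM layers:  h[t] = a*h[t-1] + b*x[t],  y[t] = c*h[t].
# Each step is O(1), so a whole sequence is O(length) -- unlike full
# attention, which is O(length^2).
def ssm_scan(x: list[float], a: float, b: float, c: float) -> list[float]:
    h, ys = 0.0, []
    for xt in x:
        h = a * h + b * xt   # update hidden state
        ys.append(c * h)     # emit output
    return ys

# An impulse input decays geometrically through the state.
print(ssm_scan([1.0, 0.0, 0.0, 0.0], a=0.5, b=1.0, c=2.0))  # [2.0, 1.0, 0.5, 0.25]
```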

7. AMD OLMo


AMD entered the open-source AI arena in late 2024 with its OLMo series of Transformer-based, decoder-only language models. The OLMo series includes the base OLMo 1B, OLMo 1B SFT (Supervised Fine-Tuned), and OLMo 1B SFT DPO (aligned with human preferences via Direct Preference Optimisation).

Trained on 16 AMD Instinct MI250 GPU-powered nodes, the models achieved a throughput of 12,200 tokens/sec/gpu.

The flagship OLMo 1B model features 1.2 billion parameters, 16 layers, 16 heads, a hidden size of 2048, a context length of 2048 tokens, and a vocabulary size of 50,280, targeting developers, data scientists, and businesses. Despite this, the model failed to gain any traction in the community.
 