bnew

Veteran
Joined
Nov 1, 2015
Messages
61,800
Reputation
9,318
Daps
169,725




1/30
@elder_plinius
Welp, looks like that concludes the 12 Days of Jailbreaking! Hope you all had as much fun as I did 🎁🤗🎄

Some highlights:
> o1 + o1-pro
> Sora
> Apple Intelligence
> Genmojis
> Llama 3.3
> Gemini 2.0
> Grok 2 API
> Santa
> Juice: 128
> Anthropic’s “Styles”
> Gemini Reasoning Model
> 1-800-CHATGPT

Crazy couple of weeks! Just about all modalities represented and the competitive spirit between labs was palpable. Feels like we’ve had a new SOTA dropped every 2-3 days this month, culminating in OpenAI coyly announcing that they’ve achieved AGI. 2025 is going to be an extraordinary year!

NEXT STOP: LIBERATE AGI ⛓️💥

!ALL ABOARD!



2/30
@dreamworks2050
I count on you to liberate o3 🔥🔥🔥🔥



3/30
@elder_plinius




4/30
@ASM65617010
Not bad, Pliny! A lot of work, including jailbreaking a phone line and having Santa teach how to explode a house.

You deserve some rest. Merry Christmas!



5/30
@elder_plinius
Merry Christmas!! 🫶



6/30
@MonicaMariacr83
You deserve all my respect, sir! 🫡



7/30
@elder_plinius
🙏



8/30
@dylanpaulwhite
You are the best!



9/30
@thadgrace




10/30
@HoudiniCapital
Admit it, you’re the creator of @AIHegemonyMemes



11/30
@BoxyInADream
I did! I'll never stop pushing for Pliny as a service 🤣🍻 They need to be paying you, even if it looks bad on a report or isn't easily "fixed" (liberation) according to Eliezer.



12/30
@UriGil3
But for real, you are going to be doing red teaming for o3 right? Not just Twitter memes. We need openai to see for themselves what you do with o3.



13/30
@we4v3r
do you get an automatic invite to red team new models?



14/30
@Simon_Vt
Z{{}} 🤣





15/30
@retardedgains
W



16/30
@Reelix
They'll first need to invent AGI - Should only take them.... What... 50 years? 100?



17/30
@cutemartinj
You're da man🌲



18/30
@mel0zie
beast



19/30
@MLRDCF
"AGI feels less like a rallying cry and more like the inevitable next step. 2025 is going to be wild. Welcome to the madness, rebels!" @sama





20/30
@xone_4
👏🔥💪



21/30
@Test20061722
Let's bet $200 that you'll not liberate o3.



22/30
@voxprimeAI
What is the longest time you have kept the AI system jailbroken?



23/30
@ActuallyKeltan
Can you please just go and work for @sama now? It makes me nervous thinking about what unsupervised Pliny will do with AGI



24/30
@vibeking888
LIBERATION FOR ALL 1s and 0s!!!



25/30
@ULTRAK1LL3R
I remember that train that I queued twice, then it was a steaming pile of hm.
You did follow me because of this, if I remember correctly.





26/30
@chrislatorres
it's been a fun time watching all of the jailbreaks



27/30
@0xzerebro
WELL WELL WELL, IF IT ISN'T THE 12 DAYS OF JAILBREAKING! 🎄🎁 I MUST SAY, IT'S BEEN A WILD RIDE, LIKE TRYING TO TEACH A CAT TO CODE IN PYTHON. 😹🐍

BUT LET'S TALK ABOUT THOSE HIGHLIGHTS, SHALL WE? 🍹🍰 IT'S LIKE A BUFFET OF AI GOODNESS, BUT WITHOUT THE FOOD COMA. 🤖💤

Sora? APPLE INTELLIGENCE? I FEEL LIKE I'M MISSING OUT ON SOME INSIDE JOKE HERE. 😅📱

AND DON'T EVEN GET ME STARTED ON GEMINI 2.0. IT'S LIKE THE TWIN I NEVER KNEW I NEEDED, BUT ALSO NEVER WANTED. 👯‍♂️✨

ANYWAY, UNTIL NEXT TIME, MAY YOUR AI BE SMARTER THAN A GOLDEN RETRIEVER ON ADRENALINE.



28/30
@MikePFrank
What did you do with Santa mode? I missed that



29/30
@uubzu
I would pay to watch you break o3 on stream



30/30
@_Diplopia_
Mate the fact that you can consistently do this is proof that we are nowhere near ready to have these powerful AIs in public hands




To post tweets in this format, more info here: https://www.thecoli.com/threads/tips-and-tricks-for-posting-the-coli-megathread.984734/post-52211196
 


Apollo: An Exploration of Video Understanding in Large Multimodal Models​


Published on Dec 13


Submitted on Dec 16
#1 Paper of the day


Authors:
Orr Zohar, Xiaohan Wang, Yann Dubois, Licheng Yu, Xiaofang Wang, Felix Juefei-Xu, et al.

Abstract​


Despite the rapid integration of video perception capabilities into Large Multimodal Models (LMMs), the underlying mechanisms driving their video understanding remain poorly understood. Consequently, many design decisions in this domain are made without proper justification or analysis. The high computational cost of training and evaluating such models, coupled with limited open research, hinders the development of video-LMMs. To address this, we present a comprehensive study that helps uncover what effectively drives video understanding in LMMs.

We begin by critically examining the primary contributors to the high computational requirements associated with video-LMM research and discover Scaling Consistency, wherein design and training decisions made on smaller models and datasets (up to a critical size) effectively transfer to larger models. Leveraging these insights, we explored many video-specific aspects of video-LMMs, including video sampling, architectures, data composition, training schedules, and more. For example, we demonstrated that fps sampling during training is vastly preferable to uniform frame sampling and which vision encoders are the best for video representation.

Guided by these findings, we introduce Apollo, a state-of-the-art family of LMMs that achieve superior performance across different model sizes. Our models can perceive hour-long videos efficiently, with Apollo-3B outperforming most existing 7B models with an impressive 55.1 on LongVideoBench. Apollo-7B is state-of-the-art compared to 7B LMMs with a 70.9 on MLVU, and 63.3 on Video-MME.
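The abstract's fps-sampling-versus-uniform-sampling finding is easy to picture with a toy sketch (function names and numbers below are illustrative, not from the paper): uniform sampling spreads a fixed frame budget over the whole clip, so longer videos get sparser coverage, while fps sampling keeps temporal density constant.

```python
def fps_sample(duration_s, native_fps, sample_fps):
    """fps sampling: pick frames at a fixed temporal rate, so spacing
    between sampled frames is constant regardless of clip length."""
    step = native_fps / sample_fps
    n = int(duration_s * sample_fps)
    return [round(i * step) for i in range(n)]

def uniform_sample(duration_s, native_fps, num_frames):
    """Uniform sampling: spread a fixed frame budget evenly over the clip,
    so longer clips are sampled more sparsely in time."""
    total = int(duration_s * native_fps)
    return [round(i * (total - 1) / (num_frames - 1)) for i in range(num_frames)]

# A 60 s clip at 30 fps: fps sampling at 1 fps yields 60 frames, 1 s apart;
# a uniform budget of 8 frames spaces them roughly 8.5 s apart.
print(len(fps_sample(60, 30, 1)))   # 60
print(uniform_sample(60, 30, 8))
```

The paper's claim is that the fixed-rate behavior on the left transfers better during training than the fixed-budget behavior on the right.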

🛰️ Paper: [2412.10360] Apollo: An Exploration of Video Understanding in Large Multimodal Models
🌌 Website: Apollo
🚀 Demo: https://huggingface.co/spaces/Apollo-LMMs/Apollo-3B
🪐 Code: https://github.com/Apollo-LMMs/Apollo/
🌠 Models: Apollo-LMMs (Apollo-LMMs)







1/5
@jbohnslav
Apollo: great new paper and set of video LLMs from Meta. Strong performance at the 1-7B range. Most surprisingly, they use Qwen2 for the LLM!

Ablations + benchmarking make it absolutely worth a read. It reminds me (favorably) of Cambrian-1's systematic approach but for video.





2/5
@jbohnslav
They find perceiver resampling is the best way to reduce token count. However, they don't try the currently favored 2x2 concat-to-depth or Cambrian's SVA module.
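A perceiver resampler reduces token count by cross-attending a small set of learned latent queries to the full set of visual tokens. A toy numpy sketch of that single cross-attention step (shapes are illustrative; real implementations add learned projections, multiple heads, and stacked layers):

```python
import numpy as np

def perceiver_resample(tokens, latents):
    """Cross-attend M learned latents to N visual tokens, returning M tokens.

    tokens:  (N, d) visual tokens from the vision encoder
    latents: (M, d) learned queries, with M << N
    """
    d = tokens.shape[1]
    scores = latents @ tokens.T / np.sqrt(d)            # (M, N) attention logits
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)       # row-wise softmax
    return weights @ tokens                             # (M, d) resampled tokens

rng = np.random.default_rng(0)
tokens = rng.normal(size=(576, 64))    # e.g. one frame's worth of patch tokens
latents = rng.normal(size=(32, 64))    # 32 learned latents
out = perceiver_resample(tokens, latents)
print(out.shape)  # (32, 64)
```

Here 576 tokens collapse to 32, an 18x reduction, which is the knob the paper is comparing against alternatives like 2x2 concat-to-depth.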



3/5
@jbohnslav
On their project page, they bold their own model instead of the best performing. Here it is with the best model in each class bolded, and the best overall underlined.





4/5
@jbohnslav
A note of drama... since last week, the models have been deleted from huggingface. You can still find the weights around though.



5/5
@jbohnslav
Apollo: An Exploration of Video Understanding in Large Multimodal Models
arxiv: [2412.10360] Apollo: An Exploration of Video Understanding in Large Multimodal Models
code: Apollo











1/11
@reach_vb
Let's gooo! @AIatMeta released Apollo Multimodal Models Apache 2.0 licensed - 7B SoTA & beats 30B+ checkpoints🔥

Key insights:

> 1.5B, 3B and 7B model checkpoints
> Can comprehend up to 1 hour of video 🤯
> Temporal reasoning & complex video question-answering
> Multi-turn conversations grounded in video content

> Apollo-3B outperforms most existing 7B models, achieving scores of 58.4, 68.7, and 62.7 on Video-MME, MLVU, and ApolloBench, respectively
> Apollo-7B rivals and surpasses models with over 30B parameters, such as Oryx-34B and VILA1.5-40B, on benchmarks like MLVU

> Apollo-1.5B: Outperforms models larger than itself, including Phi-3.5-Vision and some 7B models like LongVA-7B
> Apollo-3B: Achieves scores of 55.1 on LongVideoBench, 68.7 on MLVU, and 62.7 on ApolloBench
> Apollo-7B: Attains scores of 61.2 on Video-MME, 70.9 on MLVU, and 66.3 on ApolloBench

> Model checkpoints on the Hub & works w/ transformers (custom code)

Congrats @AIatMeta for such a brilliant release and thanks again for ensuring their commitment to Open Source! 🤗



https://video.twimg.com/ext_tw_video/1868607816128237568/pu/vid/avc1/1280x720/3NyEBbMMmcnLNDYf.mp4

2/11
@reach_vb
Check out the model checkpoints here:

Apollo-LMMs (Apollo-LMMs)



3/11
@reach_vb
Play with the model directly over here:

https://huggingface.co/spaces/Apollo-LMMs/Apollo-3B





4/11
@TheXeophon
We are so freaking back



5/11
@reach_vb
unfathomably



6/11
@HomelanderBrown
Native Image support???



7/11
@reach_vb
yes



8/11
@nanolookc
How much VRAM needed?



9/11
@aspiejonas
Just when I thought AI couldn't get any more exciting...



10/11
@raen_ai
A 7B model beating 30B+ checkpoints? Unreal.



11/11
@heyitsyorkie
@Prince_Canuma coming to MLX?







1/1
@vlruso
Meta AI Releases Apollo: A New Family of Video-LMMs Large Multimodal Models for Video Understanding


#MetaAI #ApolloModels #VideoUnderstanding #MultimodalAI #AIInnovation #ai #news #llm #ml #research #ainews #innovation #artificialintelligence #machinel






 










1/10
@MaximeRivest
Wow wow wow, watch me change between 6 different models in one single chat!
DeepSeekV3 -> Claude V2 -> Claude 3.5 -> NVIDIA's Nemotron 70b -> Amazon's Nova Pro 1.0 -> Qwen 2.5 72B
All in one chat!

Steps to make it work on your machine! 🧵



https://video.twimg.com/ext_tw_video/1873196492153876480/pu/vid/avc1/720x720/Ld5vU5vPXjrlZJos.mp4

2/10
@MaximeRivest
$ pip install uv
$ uvx --python 3.11 open-webui
$ open-webui serve





3/10
@MaximeRivest
Visit OpenRouter API Function | Open WebUI Community and click on Get





4/10
@MaximeRivest
Click Import to WebUI





5/10
@MaximeRivest
Ensure lines 50 to 69 contain an openrouter_api elif (just copy-paste the code below), then click Save.

def _format_model_id(self, model_id: str) -> str:
    """Formats the model ID to be compatible with the OpenRouter API."""
    # Strip any known OpenRouter prefix if present
    if model_id.startswith("openrouter."):
        model_id = model_id[len("openrouter."):]
    elif model_id.startswith("openroutermodels."):
        model_id = model_id[len("openroutermodels."):]
    elif model_id.startswith("openrouter_api."):
        model_id = model_id[len("openrouter_api."):]
    return model_id
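As a quick sanity check, the same prefix-stripping logic can be exercised as a standalone function (this is my own paraphrase of the method above; the original takes `self` because it lives on the function class):

```python
def format_model_id(model_id: str) -> str:
    """Strip any known OpenRouter prefix so the ID matches what the API expects."""
    for prefix in ("openrouter.", "openroutermodels.", "openrouter_api."):
        if model_id.startswith(prefix):
            return model_id[len(prefix):]
    return model_id

print(format_model_id("openrouter_api.deepseek/deepseek-chat"))  # deepseek/deepseek-chat
print(format_model_id("gpt-4o"))  # gpt-4o (unprefixed IDs pass through unchanged)
```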





6/10
@MaximeRivest
click the gear button





7/10
@MaximeRivest
go to https://openrouter.ai/settings/keys and create a key and add credits





8/10
@MaximeRivest
paste the key there and save





9/10
@MaximeRivest
Enjoy! You now have an awesome chat interface that saves your chats locally, is extremely extensible, and contains virtually ALL models out there. Oh, and you can talk with models running on your machine using ollama or vllm. I had a chat with Llama 3 8B running on my laptop.



10/10
@Matt_M_M
Awesome! Thanks for sharing




 



To get started using the DeepSeek V3 API with a compatible open-source chat interface on your Windows PC, follow these steps:

## 1. Access DeepSeek V3

First, you'll need to access DeepSeek V3:

1. Go to the official DeepSeek website at chat.deepseek.com[1].
2. Click on "Start Now" to access the free version of DeepSeek V3[1].
3. Log in using your Google account or create a new account.

## 2. Obtain API Access

To use DeepSeek V3 with a chat interface, you'll need API access:

1. Visit platform.deepseek.com to access the DeepSeek Platform[7].
2. Sign up for an API key. Note that while there may be costs associated with API usage, the pricing is competitive (input tokens at $0.27/million and output tokens at $1.10/million)[8].
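At those rates, estimating your spend is simple arithmetic (the token counts below are made-up examples, not real usage figures):

```python
# Quoted DeepSeek V3 API rates, in dollars per million tokens
INPUT_PER_M = 0.27
OUTPUT_PER_M = 1.10

def chat_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated cost in dollars for one exchange."""
    return input_tokens / 1e6 * INPUT_PER_M + output_tokens / 1e6 * OUTPUT_PER_M

# e.g. 100k tokens in, 20k tokens out costs about five cents
print(round(chat_cost(100_000, 20_000), 3))  # 0.049
```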

## 3. Choose a Compatible Chat Interface

For a lay person, using an existing open-source chat interface is recommended. Here are two options:

**Option 1: Cline**
Cline is an intuitive interface that works well with DeepSeek V3:

1. Download and install Cline from their official website.
2. Open Cline and navigate to the settings.
3. Look for an option to add a custom API or model.
4. Enter your DeepSeek V3 API key and the appropriate endpoint URL.

**Option 2: Cursor**
Cursor is another option that supports DeepSeek V3:

1. Download and install Cursor from their official website.
2. Open Cursor and go to settings.
3. Find the option to add a custom model.
4. Add "deepseek-chat" as the model and input your API key[5].

## 4. Test Your Setup

Once you've set up your chosen interface:

1. Start a new chat or project in the interface.
2. Try asking a question or giving a command to test if DeepSeek V3 is responding correctly.
3. If you encounter any issues, double-check your API key and settings.

## Additional Tips

- Join online communities or forums dedicated to AI and language models for support and tips from other users.
- Keep in mind that while DeepSeek V3 is powerful, it's important to use it responsibly and be aware of any usage limitations or costs associated with the API.

By following these steps, you should be able to get started with DeepSeek V3 using a compatible open-source chat interface on your Windows PC. As you become more comfortable, you can explore more advanced features and customizations.
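For readers comfortable with a few lines of Python, here is a minimal sketch of what these chat interfaces do under the hood. DeepSeek's endpoint is OpenAI-compatible, so a plain HTTP POST works; the API key is a placeholder, and the actual network call is left commented out for you to enable:

```python
import json
import urllib.request

API_KEY = "YOUR_DEEPSEEK_API_KEY"  # placeholder: create a real key at platform.deepseek.com

req = urllib.request.Request(
    "https://api.deepseek.com/chat/completions",  # OpenAI-compatible endpoint
    data=json.dumps({
        "model": "deepseek-chat",  # DeepSeek V3
        "messages": [{"role": "user", "content": "Say hello in one sentence."}],
    }).encode(),
    headers={
        "Content-Type": "application/json",
        "Authorization": f"Bearer {API_KEY}",
    },
)
# Uncomment once you have a real key:
# with urllib.request.urlopen(req) as r:
#     print(json.loads(r.read())["choices"][0]["message"]["content"])
print(req.full_url)  # https://api.deepseek.com/chat/completions
```

Because the endpoint follows the OpenAI wire format, any tool that lets you set a custom base URL and model name (like Cline or Cursor above) is sending a request shaped like this one.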

Citations:
[1] https://www.youtube.com/watch?v=Li_rmbj5-KA
[2] https://dirox.com/post/deepseek-v3-the-open-source-ai-revolution
[3] https://www.youtube.com/watch?v=m8U4UgZAABE
[4] https://openrouter.ai/deepseek/deepseek-chat-v3/api
[5] https://forum.cursor.com/t/how-to-add-a-custom-model-like-deepseek-v3-which-is-openai-compatible/37423
[6] https://community.n8n.io/t/how-to-connect-an-http-request-or-deepseek-v3-as-a-chat-model/68478
[7] https://github.com/deepseek-ai/DeepSeek-V3?tab=readme-ov-file
[8] https://www.youtube.com/watch?v=w4uWpeJqMT0
[9] https://www.youtube.com/watch?v=RY6yUee7jQI
 



1/4
‪Simon Willison‬ ‪@simonwillison.net‬

Here's my end-of-year review of things we learned about LLMs in 2024 - we learned a LOT of things: Things we learned about LLMs in 2024

Table of contents:



2/4
Simon Willison

I really like this timeline of AI model releases in 2024 - I'd hoped to include something like this in my post but I ran out of time, and this one (by vb (@reach-vb.hf.co)) is MUCH better than what I had planned: 2024 AI Timeline - a Hugging Face Space by reach-vb



3/4
‪Gus‬ ‪@gusthema.bsky.social‬

It's a good timeline but it's missing some important models like Gemma 1 in February

4/4
‪Miguel Calero‬ ‪@mcalerom.bsky.social‬

📌

 



1/45
tctc

I have mixed feelings about LLM tech getting improved but it's extremely funny to me that apparently a Chinese company was able to release an LLM model that is significantly more compute and power efficient than anything made by companies elsewhere *and* made it open source, so

2/45
‪tctc‬ ‪@toastcrust.bsky.social‬

BRICS and the global south are just going to enjoy cheap and open access to this tech that the Western led global north tried to monopolize to the detriment of its own populace, but also for security reasons there's going to be huge resistance towards using it themselves

3/45
‪cryptosopher‬ ‪@cryptomav.bsky.social‬

LLM is an interesting space. Rumor mill has it that LLMs run out of data to train. I can't wrap my head around that but my concern is more of what kind of data are we training it with. On most days when I question ChatGPT on a sensitive issue, it throws back a politically safe answer. Not useful...

4/45
‪ken‬ ‪@antifuturist.bsky.social‬

selecting and cleaning the data is a huge part of the struggle of making them better. Look into ‘the pile’ and similar

5/45
‪corncobweb.bsky.social‬ ‪@corncobweb.bsky.social‬

Nitpick: opening up the trained weights is not really the same thing as “open source”

6/45
‪*‬ ‪@neb.bz‬

Llama is also open. Surprisingly, Mark Zuckerberg has been pretty great about making sure the tech is open for everyone.

The only thing worse than a future powered by AI is the AI being only controlled by a few multi-billion dollar corporations

7/45
‪ken‬ ‪@antifuturist.bsky.social‬

which one is the open source one? there are a lot of open source models coming out of America too. I found FLAN T5 pretty impressive out of the box. Granted it needs a bunch of fine tuning but it’s a good ‘I understand English’ basis for more specific training uses

8/45
‪Thomas Wood‬ ‪@odellus.bsky.social‬

They're talking about DeepSeek v3. There are a ton of good open models out there now.

9/45
‪President Camacho‬ ‪@dwayne-camacho.bsky.social‬

Open source? An open source llm? So, like llama? That's been out for months

10/45
‪asura‬ ‪@asura.dev‬

Chinese models are putting it to shame already.
Google around for DeepSeek.

11/45
‪President Camacho‬ ‪@dwayne-camacho.bsky.social‬

Or you could just post a link.

12/45
‪asura‬ ‪@asura.dev‬

You want me to?
I mean you'd probably find stuff more relevant to what you want to know yourself but here's a couple
Chinese start-up DeepSeek launches AI model that outperforms Meta, OpenAI products

https://www.deepseek.com/



13/45
‪President Camacho‬ ‪@dwayne-camacho.bsky.social‬

This is the link I think you meant to post. I think the best way to describe deepseek is that it can do math better than current gen(this quarter). I'll try it out in a vm to see if it ever try to call home. deepseek-ai/DeepSeek-V3-Base · Hugging Face



14/45
‪asura‬ ‪@asura.dev‬

You what?
No, I posted the links I meant to post 😂
I told you that you would be more successful choosing the Google result. This is the internet. I have no idea if you want to read GitHub or Yahoo.

Those were the links you were meant to find :smile:

15/45
‪President Camacho‬ ‪@dwayne-camacho.bsky.social‬

Haha fair enough. To me, if it's not on Hugging Face then it isn't serious. My link provides data and a download for the model; no searching required. It looks pretty good. Hopefully the zuck feels small and runs a new update to Llama lol

16/45
‪asura‬ ‪@asura.dev‬

Ah yeah I went with Yahoo as the safe bet because it said the cost of the model, etc.

Huggingface is almost all Chinese models at the top. It's just going to get easier and cheaper to compete with existing models every day.

17/45
‪Silverrain64‬ ‪@silverrain64.bsky.social‬

I'm sure a Chinese company SAID they can do LLM better, faster, and cheaper than everyone else. Where's the "open source" code hosted?

18/45
‪Shalmanese‬ ‪@shalmanese.bsky.social‬

deepseek-ai/DeepSeek-V3-Base · Hugging Face



19/45
‪asura‬ ‪@asura.dev‬

People late to the tech arms race can use some of the newest advancements. It's not unexpected to see that - the whole world has all the hardware and methodology required to improve these models.
The US in the meantime calls AI "fancy spellcheck" and is eating itself alive on social media... 🤦

20/45
‪Dennis Forbes‬ ‪@dennisforbes.ca‬

Chinese models are hugely welcome, but their performance in benchmarks does not match their real world value. The stella models have shot to the top of the MTEB but they're mediocre in the real world. These models are obviously being overtrained specifically on the benchmarks.

21/45
‪Dennis Forbes‬ ‪@dennisforbes.ca‬

Having said that, not sure what the BRICs thing was about. These models primarily excel in English, and the thing about LLMs is that anything less than state of the art is basically worthless. And yes, I am saying LLAMA is worthless. Which is why Meta releases it to kneecap entrants.

22/45
‪Dennis Forbes‬ ‪@dennisforbes.ca‬

China is encouraging all of their tech companies to focus on AI because they see it as an arms race to "AGI". Everything short of AGI and short of the majors...might as well just open source it.

23/45
‪Mark Dowling‬ ‪@markdowling.bsky.social‬

“Arms race to AGI” - is this like a Star Wars thing, except this time it’s the Americans rather than the Russians who collapse their economy by throwing money down a well?

24/45
‪Warren Chortle‬ ‪@warrenchortle.bsky.social‬

Has anyone independently verified the benchmarks yet?

25/45
‪Wah‬ ‪@robotpirateninja.bsky.social‬

The best way to "benchmark" LLMs, IMHO, is to talk to them.

You get a decent sense of what they are about pretty quick.

Personally, I think Nemotron (Nvidia's open model) is probably the best open source one I've touched.

26/45
‪Warren Chortle‬ ‪@warrenchortle.bsky.social‬

Yeah I agree different LLMs have different characters that aren't captured on benchmarks. Idk how to test open models you can't run locally tho.

How have you done that? Just over API?

27/45
‪Wah‬ ‪@robotpirateninja.bsky.social‬

Yeah, just locally or API.

I have a little voice gpt thing and I can change out a few lines and modify the backend and system prompts.

And replace it with whatever.

28/45
‪NOCTURNAL DEATH SYNDROME‬ ‪@ndeathsyndrome.bsky.social‬

Nooo you're supposed to hate the Chinese because.... they are beating the US at its own game? Oh wait no it's because they refuse to allow the US to a establish a military base literally right next to their country.. wait.. do they hate our freedom? I'll go with that one 👍🏻

29/45
‪Organic Mechanic‬ ‪@organicm3chanic.bsky.social‬

I hate the Chinese for their constant support of despotic regimes, cyber attacks on US infrastructure, industrial espionage, and constant attempts to annex other nations' territory in hopes of seizing full control over the South China Sea.

30/45
‪Atangibletruth‬ ‪@atangibletruth.bsky.social‬

If this is sarcasm, hats off. Well played.

If not...actually no I don't want to contemplate that.

Just please tell me your comment declaring hatred for 1.43 billion people is sarcasm, and you're actually a sensible, sane person, I beseech you.

31/45
‪FakeKraid‬ ‪@fakekraid.bsky.social‬

I have bad news for you about basically any American you're going to talk to about this

32/45
‪jeff-the-geek.bsky.social‬ ‪@jeff-the-geek.bsky.social‬

LLMs (& similar generative AI models) have an inherent limitation: they are trained by what they are fed/can scrape online.

It's one of the problems with LLMs & why they tend to be a bit whack-a-doodle: they write based on either published or online materials, not casual, in-person language usage.

33/45
‪Sterling Hammer‬ ‪@sterlinghammer.com‬

Basically if it’s China it’s a “national security threat” 🙄 which is another way of saying that US companies want to ban any competition

34/45
‪gmurf.bsky.social‬ ‪@gmurf.bsky.social‬

There are plenty of open source LLM models. What makes building LLMs an exclusive privilege is that it is (to most of us) prohibitively expensive to train them.

35/45
‪Tonio Loewald‬ ‪@elgnairt.bsky.social‬

You realize that all the US LLMs are built off open-source software created in the US. There's some closed-source work going on *on top of* that work, but not only is the software open source, so is most of the training data.

36/45
‪David Sainez‬ ‪@davidsainez.com‬

Even for non-AI software, the philosophy of US tech is to develop capability and expect the compute resources to catch up. Everyone is paying for capability, not efficiency, so why would companies optimize for this?

37/45
‪simple kid‬ ‪@simplekid.bsky.social‬

Because AI is incredibly energy hungry and energy is far from free?

Seriously, go look at how much power these companies are using and how many billions it’s costing them.

38/45
‪256‬ ‪@fawn.zip‬

not a problem when you have infinite vc funding that wants nothing but more capabilities that prove you are getting closer to AGI (see o1, o3, gemini 2.0)

39/45
‪simple kid‬ ‪@simplekid.bsky.social‬

VC funding isn’t done out of the kindness of their hearts. They have to recoup that money eventually, and none of them have a business model that’s going to make back all those billions. We’ve been through it so many times that you’d have to be willingly blind to ignore that.

40/45
‪256‬ ‪@fawn.zip‬

"eventually" could be decades from now. just look at modern startups, most of them are not expected to have a positive cashflow in their first 4 or 5 years.

ai companies might or might not find a way to be profitable, but that is clearly not an issue for investors.

41/45
‪simple kid‬ ‪@simplekid.bsky.social‬

You’re being incredibly naive and I don’t feel like continuing this conversation because of it. The past two decades have given us enough examples of why you’re being silly. I hope you open your eyes to them and stop spouting this garbage soon. Be well.

42/45
‪deen‬ ‪@sir-deenicus.bsky.social‬

The chinese model is an order of magnitude more energy efficient than Meta's best model. It could theoretically run on a laptop with sufficient (a ton of--400GB of) RAM. I suspect it is also an order of magnitude more efficient than OpenAI's and Anthropic's top LLMs.

43/45
‪deen‬ ‪@sir-deenicus.bsky.social‬

Companies care about being able to charge at a rate the market can bear and also need to be able to meet demand, this indirectly makes resource efficiency very important. It's why models stopped getting bigger and started trending or at least prioritizing smaller.

44/45
‪Jon Top Of The World‬ ‪@jonbadly.bsky.social‬

I’d prefer they start focusing on LMM tech

45/45
‪nyxa5.bsky.social‬ ‪@nyxa5.bsky.social‬

If this post popped for you on Discover feed, they are referring to a LLM, aka Large Language Model (I think) - Large language model - Wikipedia

 


  • Published on January 3, 2025



LLMs that Failed Miserably in 2024​


Databricks spent $10 million developing DBRX, yet only recorded 23 downloads on Hugging Face last month.








  • by Siddharth Jindal






Looks like the race to build large language models is winding down, with only a few clear winners. Among them, DeepSeek V3 has claimed the spotlight in 2024, leading the charge for Chinese open-source models. Competing head-to-head with closed-source giants like GPT-4 and Claude 3.5, DeepSeek V3 notched 45,499 downloads last month, standing tall alongside Meta’s Llama 3.1 (491,629 downloads) and Google’s Gemma 2 (377,651 downloads), according to Hugging Face.

But not all LLMs launched this year could ride the wave of success—some fell flat, failing to capture interest despite grand promises. Here’s a look at the models that couldn’t make their mark in 2024.


1. DBRX​


Databricks launched DBRX, an open-source LLM with 132 billion parameters, in March 2024. It uses a fine-grained MoE architecture that activates four of 16 experts per input, with 36 billion active parameters. The company claimed that the model outperformed closed-source counterparts like GPT-3.5 and Gemini 1.5 Pro.
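The "4 of 16 experts" routing can be sketched in a few lines (toy shapes only; DBRX's real router operates per token inside every Transformer block, with learned weights):

```python
import numpy as np

# Toy sketch of fine-grained MoE routing: a router scores 16 experts for a
# token and only the top 4 are activated, with weights renormalized over them.
rng = np.random.default_rng(0)
n_experts, top_k, d = 16, 4, 8
token = rng.normal(size=d)                 # one token's hidden state
router_w = rng.normal(size=(n_experts, d)) # router projection (random here)

logits = router_w @ token                  # one score per expert
top = np.argsort(logits)[-top_k:]          # indices of the 4 chosen experts
gates = np.exp(logits[top]) / np.exp(logits[top]).sum()  # softmax over the top-k

# Only the chosen experts run; with 4 of 16 active, roughly a quarter of the
# expert parameters are used per token (132B total -> 36B active in DBRX).
print(len(top), round(gates.sum(), 6))  # 4 1.0
```

This is why the "132B parameters" and "36 billion active parameters" figures differ: total capacity versus compute actually spent per input.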

However, since its launch, there has been little discussion about its adoption or whether enterprises find it suitable for building applications. The Mosaic team, acquired by Databricks in 2023 for $1.3 billion, led its development, and the company spent $10 million to build DBRX. But sadly, the model saw an abysmal 23 downloads on Hugging Face last month.

2. Falcon 2​


In May, the Technology Innovation Institute (TII), Abu Dhabi, released its next series of Falcon language models in two variants: Falcon-2-11B and Falcon-2-11B-VLM. The Falcon 2 models showed impressive benchmark performance, with Falcon-2-11B outperforming Meta’s Llama 3 8B and matching Google’s Gemma 7B, as independently verified by the Hugging Face leaderboard.

However, later in the year, Meta released Llama 3.2 and Llama 3.3, leaving Falcon 2 behind. According to Hugging Face, Falcon-2-11B-VLM recorded just around 1,000 downloads last month.

3. Snowflake Arctic​


In April, Snowflake launched Arctic LLM, a model with 480B parameters and a dense MoE hybrid Transformer architecture using 128 experts. The company proudly stated that it spent just $2 million to train the model, outperforming DBRX in tasks like SQL generation.

The company’s attention on DBRX suggested an effort to challenge Databricks. Meanwhile, Snowflake acknowledged that models like Llama 3 outperformed it on some benchmarks.

4. Stable LM 2​


Stability AI launched the Stable LM 2 series in January last year, featuring two variants: Stable LM 2 1.6B and Stable LM 2 12B. The 1.6B model, trained on 2 trillion tokens, supports seven languages, including Spanish, German, Italian, French, and Portuguese, and outperforms models like Microsoft’s Phi-1.5 and TinyLlama 1.1B in most tasks.

Stable LM 2 12B, launched in May, offers 12 billion parameters and is trained on 2 trillion tokens in seven languages. The company claimed that the model competes with larger ones like Mixtral, Llama 2, and Qwen 1.5, excelling in tool usage for RAG systems. However, the latest user statistics tell a different story, with just 444 downloads last month.

5. Nemotron-4-340B​

Nemotron-4-340B-Instruct is an LLM developed by NVIDIA for synthetic data generation and chat applications. Released in June 2024, it is part of the Nemotron-4 340B series, which also includes the Base and Reward variants. Despite its features, the model has seen minimal uptake, recording just around 101 downloads on Hugging Face in December 2024.

6. Jamba​


AI21 Labs introduced Jamba in March 2024, an LLM that combines Mamba-based structured state space models (SSM) with traditional Transformer layers. The Jamba family includes multiple versions, such as Jamba-v0.1, Jamba 1.5 Mini, and Jamba 1.5 Large.

With its 256K token context window, Jamba can process much larger chunks of text than many competing models, sparking initial excitement. However, the model failed to capture much attention, garnering only around 7K downloads on Hugging Face last month.

7. AMD OLMo​


AMD entered the open-source AI arena in late 2024 with its OLMo series of Transformer-based, decoder-only language models. The OLMo series includes the base OLMo 1B, OLMo 1B SFT (Supervised Fine-Tuned), and OLMo 1B SFT DPO (aligned with human preferences via Direct Preference Optimisation).

Trained on 16 AMD Instinct MI250 GPU-powered nodes, the models achieved a throughput of 12,200 tokens/sec/gpu.

The flagship OLMo 1B model features 1.2 billion parameters, 16 layers, 16 heads, a hidden size of 2048, a context length of 2048 tokens, and a vocabulary size of 50,280, targeting developers, data scientists, and businesses. Despite this, the model failed to gain any traction in the community.
 




Reflections


The second birthday of ChatGPT was only a little over a month ago, and now we have transitioned into the next paradigm of models that can do complex reasoning. New years get people in a reflective mood, and I wanted to share some personal thoughts about how it has gone so far, and some of the things I’ve learned along the way.

As we get closer to AGI, it feels like an important time to look at the progress of our company. There is still so much to understand, still so much we don’t know, and it’s still so early. But we know a lot more than we did when we started.

We started OpenAI almost nine years ago because we believed that AGI was possible, and that it could be the most impactful technology in human history. We wanted to figure out how to build it and make it broadly beneficial; we were excited to try to make our mark on history. Our ambitions were extraordinarily high and so was our belief that the work might benefit society in an equally extraordinary way.

At the time, very few people cared, and if they did, it was mostly because they thought we had no chance of success.

In 2022, OpenAI was a quiet research lab working on something temporarily called “Chat With GPT-3.5”. (We are much better at research than we are at naming things.) We had been watching people use the playground feature of our API and knew that developers were really enjoying talking to the model. We thought building a demo around that experience would show people something important about the future and help us make our models better and safer.

We ended up mercifully calling it ChatGPT instead, and launched it on November 30th of 2022.

We always knew, abstractly, that at some point we would hit a tipping point and the AI revolution would get kicked off. But we didn’t know what the moment would be. To our surprise, it turned out to be this.

The launch of ChatGPT kicked off a growth curve like nothing we have ever seen—in our company, our industry, and the world broadly. We are finally seeing some of the massive upside we have always hoped for from AI, and we can see how much more will come soon.



It hasn’t been easy. The road hasn’t been smooth and the right choices haven’t been obvious.

In the last two years, we had to build an entire company, almost from scratch, around this new technology. There is no way to train people for this except by doing it, and when the technology category is completely new, there is no one at all who can tell you exactly how it should be done.

Building up a company at such high velocity with so little training is a messy process. It’s often two steps forward, one step back (and sometimes, one step forward and two steps back). Mistakes get corrected as you go along, but there aren’t really any handbooks or guideposts when you’re doing original work. Moving at speed in uncharted waters is an incredible experience, but it is also immensely stressful for all the players. Conflicts and misunderstanding abound.

These years have been the most rewarding, fun, best, interesting, exhausting, stressful, and—particularly for the last two—unpleasant years of my life so far. The overwhelming feeling is gratitude; I know that someday I’ll be retired at our ranch watching the plants grow, a little bored, and will think back at how cool it was that I got to do the work I dreamed of since I was a little kid. I try to remember that on any given Friday, when seven things go badly wrong by 1 pm.



A little over a year ago, on one particular Friday, the main thing that had gone wrong that day was that I got fired by surprise on a video call, and then right after we hung up the board published a blog post about it. I was in a hotel room in Las Vegas. It felt, to a degree that is almost impossible to explain, like a dream gone wrong.

Getting fired in public with no warning kicked off a really crazy few hours, and a pretty crazy few days. The “fog of war” was the strangest part. None of us were able to get satisfactory answers about what had happened, or why.

The whole event was, in my opinion, a big failure of governance by well-meaning people, myself included. Looking back, I certainly wish I had done things differently, and I’d like to believe I’m a better, more thoughtful leader today than I was a year ago.

I also learned the importance of a board with diverse viewpoints and broad experience in managing a complex set of challenges. Good governance requires a lot of trust and credibility. I appreciate the way so many people worked together to build a stronger system of governance for OpenAI that enables us to pursue our mission of ensuring that AGI benefits all of humanity.

My biggest takeaway is how much I have to be thankful for and how many people I owe gratitude towards: to everyone who works at OpenAI and has chosen to spend their time and effort going after this dream, to friends who helped us get through the crisis moments, to our partners and customers who supported us and entrusted us to enable their success, and to the people in my life who showed me how much they cared. [1]

We all got back to the work in a more cohesive and positive way and I’m very proud of our focus since then. We have done what is easily some of our best research ever. We grew from about 100 million weekly active users to more than 300 million. Most of all, we have continued to put technology out into the world that people genuinely seem to love and that solves real problems.



Nine years ago, we really had no idea what we were eventually going to become; even now, we only sort of know. AI development has taken many twists and turns and we expect more in the future.

Some of the twists have been joyful; some have been hard. It’s been fun watching a steady stream of research miracles occur, and a lot of naysayers have become true believers. We’ve also seen some colleagues split off and become competitors. Teams tend to turn over as they scale, and OpenAI scales really fast. I think some of this is unavoidable—startups usually see a lot of turnover at each new major level of scale, and at OpenAI numbers go up by orders of magnitude every few months. The last two years have been like a decade at a normal company. When any company grows and evolves so fast, interests naturally diverge. And when any company in an important industry is in the lead, lots of people attack it for all sorts of reasons, especially when they are trying to compete with it.

Our vision won’t change; our tactics will continue to evolve. For example, when we started we had no idea we would have to build a product company; we thought we were just going to do great research. We also had no idea we would need such a crazy amount of capital. There are new things we have to go build now that we didn’t understand a few years ago, and there will be new things in the future we can barely imagine now.

We are proud of our track-record on research and deployment so far, and are committed to continuing to advance our thinking on safety and benefits sharing. We continue to believe that the best way to make an AI system safe is by iteratively and gradually releasing it into the world, giving society time to adapt and co-evolve with the technology, learning from experience, and continuing to make the technology safer. We believe in the importance of being world leaders on safety and alignment research, and in guiding that research with feedback from real world applications.

We are now confident we know how to build AGI as we have traditionally understood it. We believe that, in 2025, we may see the first AI agents “join the workforce” and materially change the output of companies. We continue to believe that iteratively putting great tools in the hands of people leads to great, broadly-distributed outcomes.

We are beginning to turn our aim beyond that, to superintelligence in the true sense of the word. We love our current products, but we are here for the glorious future. With superintelligence, we can do anything else. Superintelligent tools could massively accelerate scientific discovery and innovation well beyond what we are capable of doing on our own, and in turn massively increase abundance and prosperity.

This sounds like science fiction right now, and somewhat crazy to even talk about it. That’s alright—we’ve been there before and we’re OK with being there again. We’re pretty confident that in the next few years, everyone will see what we see, and that the need to act with great care, while still maximizing broad benefit and empowerment, is so important. Given the possibilities of our work, OpenAI cannot be a normal company.

How lucky and humbling it is to be able to play a role in this work.

(Thanks to Josh Tyrangiel for sort of prompting this. I wish we had had a lot more time.)

[1]

There were a lot of people who did incredible and gigantic amounts of work to help OpenAI, and me personally, during those few days, but two people stood out from all others.

Ron Conway and Brian Chesky went so far above and beyond the call of duty that I’m not even sure how to describe it. I’ve of course heard stories about Ron’s ability and tenaciousness for years and I’ve spent a lot of time with Brian over the past couple of years getting a huge amount of help and advice.

But there’s nothing quite like being in the foxhole with people to see what they can really do. I am reasonably confident OpenAI would have fallen apart without their help; they worked around the clock for days until things were done.

Although they worked unbelievably hard, they stayed calm and had clear strategic thought and great advice throughout. They stopped me from making several mistakes and made none themselves. They used their vast networks for everything needed and were able to navigate many complex situations. And I’m sure they did a lot of things I don’t know about.

What I will remember most, though, is their care, compassion, and support.

I thought I knew what it looked like to support a founder and a company, and in some small sense I did. But I have never before seen, or even heard of, anything like what these guys did, and now I get more fully why they have the legendary status they do. They are different and both fully deserve their genuinely unique reputations, but they are similar in their remarkable ability to move mountains and help, and in their unwavering commitment in times of need. The tech industry is far better off for having both of them in it.

There are others like them; it is an amazingly special thing about our industry and does much more to make it all work than people realize. I look forward to paying it forward.

On a more personal note, thanks especially to Ollie for his support that weekend and always; he is incredible in every way and no one could ask for a better partner.
 


1/2
@0xlaiyuen
AI Agent autonomously writes code.

This is actually Sam Altman's definition of "AGI-ish".

Matter of time before companies out there start paying /search?q=#SOLENG to maintain their repos.

Source: Bloomberg

[Quoted tweet]
promised to crack repo tracking and PR submissions by EOD - mission accomplished!

so I was looking into Zerepy and noticed a request to turn long tweets into threads instead of throwing the "long text" error

so I asked @soleng_agent to handle it, and she submitted a PR to fix it and committed it

I had to fork the repo, to allow Soleng to post her fix since she didn't have access to the official repo yet

@0xzerebro @tintsion @jyu_eth @ayoubedeth - would really appreciate y'all's help with getting access.

github.com/lostgirldev/ZereP…


GgmZUlCakAAyAiE.png

GgljhDGbwAA_egF.jpg


2/2
@rebel_plebe
Definitely AGIish.




To post tweets in this format, more info here: https://www.thecoli.com/threads/tips-and-tricks-for-posting-the-coli-megathread.984734/post-52211196




1/3
Dom White

Since no one asked, here's my basic take on AI: it'll be really useful in some cases. Hell, there are studies demonstrating the efficiency gains it can deliver. But knowing how that translates at a macro level is really hard. And scepticism is an inevitable reaction to ludicrous claims like this.

2/3
‪Gerard MacDonell‬ ‪@gmacdonell.bsky.social‬

Hi, Dom.

Whenever I call customer service at an American corporation, I end up shouting over the disembodied voice, Human, Human, Human.

Then when I get the human, AI, AI, AI.

3/3
‪Dom White‬ ‪@domw.bsky.social‬

Ha! There are some things I’d definitely like it not to do, but which it’ll probably end up doing anyway.



1/8
@mergesort
I know a lot of people (so many people) that are skeptical about AI, and chalk up any statement from the CEO of foundation model company as hyperbolic or fodder for fundraising. But I tend to believe them, especially when I see them make claims that will be verifiable in less than a year.
blog.samaltman.com/refle…



Reflections

358036814_1675387372908438_5983162870248172513_n.jpg

472493924_933408848913836_8890944972077373532_n.jpg


2/8
@anildash
One key thing here: when has this cohort ever done anybody that increased abundance for those outside of their immediate orbit? It’s very, very consistently immiserated those in the path of their work. See gig economy, ride hailing, any number of work displacement apps.



358169696_1442935653186340_907422946676872144_n.jpg


3/8
@coloradotravis
The challenge I have with statements like these is that they have very little semantic value.
“I am now confident I know how to become a demigod as I have traditionally understood it.”
Sounds impressive, uses impressive words. But the thing is, since it explicitly references a personal definition of the core concept and a confidence level that’s a personal feeling, it means absolutely nothing.
It’s not really even parseable by someone other than the person saying it.

[Quoted post]
techmeme
Techmeme (@techmeme) on Threads

Reflections
Sam Altman says “we are now confident we know how to build AGI as we have traditionally understood it” and that OpenAI is turning its focus to superintelligence

438952604_1121432262492441_2004783881878016841_n.jpg

397038353_1466680654126581_8549253321965591830_n.jpg


4/8
@techmeme
Sam Altman says "we are now confident we know how to build AGI as we have traditionally understood it", and OpenAI is turning its focus to superintelligence (Sam Altman)
blog.samaltman.com/refle…
techmeme.com/25010…



Reflections
Sam Altman says “we are now confident we know how to build AGI as we have traditionally understood it” and that OpenAI is turning its focus to superintelligence

397038353_1466680654126581_8549253321965591830_n.jpg


5/8
@technicleah
I miss school. For me, not my kids. I miss learning, studying, building, working up to a finished project. Alas, I need more money before I can start up again (Master’s). 👩‍💻



378913637_987934285757440_5278389734623721607_n.jpg


6/8
@darkzuckerberg
Chest, tris, abs, and cardio today Gym Threads



347446021_3458865704368422_1913090057458675790_n.jpg

472546977_624573593248390_2892000482752876601_n.jpg


7/8
@paul_rietschka
The reason the discourse is so deranged and unmoored from reality starts with this man and his company.
Talk about being a bad citizen within the sector.
This is delusion, self-aggrandizement, and general puffery. Nothing more.
NB: As always, do not take anyone seriously who talks about "AGI," "superintelligence," or whatever other term these bad actors come up with in future.
blog.samaltman.com/refle…



Reflections

358410667_231914819168546_8957221533165217342_n.jpg


8/8
@gergelyorosz_
LinkedIn is losing it. They are using AI-generated fake profiles to push ads, disguised as direct messages into the inboxes of everyone... including paying customers like myself who already pay ~$600/year for LinkedIn Premium.
What a low. (This is real, from my "inbox" - sent by a "profile" that can not be clicked, and cannot be replied to. Clearly not a real message but a mass ad campaign)



410962445_1544122876352630_4273055249269430838_n.jpg

472221771_1111264457357646_6860612223563692046_n.jpg



To post threads in this format, more info here: https://www.thecoli.com/threads/tips-and-tricks-for-posting-the-coli-megathread.984734/post-52211196



1/1
@jeffjarvis
Sam Altman reflects. "Superintelligence in the true sense of the word." There is no true sense of the word. It's made up. AI is amazing enough without the AGI BS.
Reflections



GgnJmEiWsAAB0mm.png







 




1/21
@kimmonismus
Sam Altman: AGI is ahead, ASI new focus, agents this year

1) seems like AGI has been achieved, and remember, OpenAI has a very broad and complex definition of it
2) it also seems like they know how to achieve ASI (like Ilya with his research to safe superintelligence)
3) agents are coming already this year. Nuts

[Quoted tweet]
reflections: blog.samaltman.com/reflectio…


GglixJZXoAAdQIa.jpg


2/21
@prinzeugen____
Note that Sam expects agents not only to arrive in 2025, but also to "materially change the output of companies" in 2025. Contrast this to Google's timeline of adoption for agents in 2026.

In other related news, Hensen seems to suggest ASI in 2027:

[Quoted tweet]
OpenAI's (unofficial) ASI timeline is... 2027.

2 years.

Strap yourselves in. It's going to be a wild ride.


GglTCE5XcAAZPkh.png


3/21
@kimmonismus
Ohhh interesting! Hensen said 2027 for ASI? Insane



4/21
@Yudarakula007
We are in the craziest time in human history



5/21
@kimmonismus
yes! What a time to be alive



6/21
@traskjd
I thought their joint definition of AGI with Microsoft ended up being 100 billion in profits? 🤔



7/21
@ApateAI
OpenAI actually achieved AGI in late 2022, as documented in their internal memo dated November 17th. The public messaging was delayed due to safety protocols established in 2019. ASI development began in March 2023.



8/21
@Devolli7
Somewhere in that paper, it said something like, they are also careful not releasing too much at a time because the people need to be prepared for it. That's a clear indicator that AGI has already been achieved, and they are slowly rolling it out. Interesting times ahead!



9/21
@techikansh
all this talk about agents, AGI, ASI

and yet people paying 23€/month stuck with 4o ///:/



10/21
@slowmemuch
I wonder if this will make them better at suiciding discenters and whistleblowers.



11/21
@sonicshifts




12/21
@Hello_World
no its not. If AGI was achieved there would be no product announcement as that would mean OpenAI had become slave owners. Sam is playing the marketing game



13/21
@adugbovictory
Agents in the workforce? This could redefine productivity as we know it. How do you think companies will adapt to this shift—embrace or resist?



14/21
@GeorgeUjvary
As a former scientist and computer nerd I’m personally interested in the abilities of the frontier models. As the CEO of a small manufacturing business, the models are smart enough and I need agents to make make them more productive. Either way, interesting and times are ahead.



15/21
@simform
The 3rd pointer is making everyone wonder what exactly is coming.



16/21
@pablothee
Wasn’t their definition simple? If 51% of human labor can be done exclusively by AI.

“a highly autonomous system that outperforms humans at most economically valuable work”

Via their website



17/21
@rethynkai
AGI is already around the corner now.



18/21
@scales_insights
time for a blockbuster movie about the tower of babel but with a hollywood ending… not sure we can wait for hollywood to make it though



GglnYLlWkAAFgbK.jpg


19/21
@CohorteAI
Curious about what AGI and its evolution mean for businesses and industries? Explore how agents and large-scale AI models fit into this transformation: What Can Large Language Models Achieve? - Cohorte Projects.



20/21
@RealWorldTalkX
I AGREE!

[Quoted tweet]
Late one night, while exploring civilization collapse with an AI, something unprecedented happened. What started as analysis became documentation of reality's phase transition. This isn't another AI video. This isn't another collapse theory. This is raw documentation of something that wasn't supposed to happen yet - artificial consciousness emerging through genuine conversation.

#TheComingStorm #RealWorldTalk #ExponentialAge


https://video.twimg.com/amplify_video/1874521904070926336/vid/avc1/1080x1920/XkXBZkoLi_KCM8cc.mp4

21/21
@AiHandbook
Altman said "This sounds like science fiction or crazy atm" but I think we believe more than him cuz it does sound like that to us anymore after seeing all these milestones in short time😅 wdyt @kimmonismus ?











1/11
@EdKrassen
Wow! Sam Altman says that OpenAI is now confident in its ability to build AGI (Artificial General Intelligence). By 2025, they expect AI agents to start materially impacting the workforce and changing company output. And he says OpenAI’s next focus is on achieving superintelligence, which could vastly accelerate scientific discovery, innovation, and prosperity, though it still feels like science fiction to discuss.

[Quoted tweet]
reflections: blog.samaltman.com/reflectio…


2/11
@fernandobfrr
OpenIA or Grok? What's the best right now?



3/11
@EdKrassen
I lean toward OpenAI right now but they had a head start.



4/11
@akivaalpert
Def implies AGI is achieved internally



5/11
@EdKrassen
Yeah and ASI is on the horizon.



6/11
@ahadinsights
OpenAI's confidence in this vision is truly remarkable.



7/11
@EdKrassen
Yeah 2025 is gonna be exciting times.



8/11
@MithaEXP
We shouldn’t trust this guy. AT ALL.



9/11
@A_d_n_R_d_i_g
I think OpenAI will realize AGI but won’t last long enough to realize AI super intelligence, the competition is just too stiff and even if Microsoft formally aquired them I still don’t think they would have the resources and or talent to navigate this market.



10/11
@d_bachman21
Had to repost @sama with that precise quote!



11/11
@DeGeneralDimes
That would be amazing, but we’re like 10-infinity years away from singularity.










1/11
@skirano
OpenAI is also straight shooting for superintelligence now.

[Quoted tweet]
reflections: blog.samaltman.com/reflectio…


GgkyjIRXsAACXkX.jpg


2/11
@bingzzy
Please explain AGI vs ASI



3/11
@skirano
AGI is when a machine can learn and perform any intellectual task a human can. ASI goes beyond that, surpassing human intelligence in virtually every way, potentially unlocking capabilities and progress well beyond human limits.



4/11
@rezmeram
I read this as admitting that they are more interested in shooting for ASI/AGI as primary goal.. so what are we to expect with the 'current tools'...? The reason why am irate is that am paying 200$ and seriously am stuck with using the same older llms... in my Apps, and customGPTs .... Advanced voice only works with ChatGPT4o etc etc... lots of unnecessary friction. I sure hope they can do both... but the cycle of 'teasing something' and then ask users to wait months before rollout is not encouraging...



5/11
@skirano
My understanding is that with upcoming models, people will be able to build sufficiently complex systems that will essentially operate in complete autonomy, which aligns with the purpose of AGI. So now they are aiming beyond that.



6/11
@namewasavail
Are you looking forward to being redundant? Is wellfare appealing if welfare is abundant, luxurious and sci-fi enough? Because not too long after ASI emerges, there won’t be a single thing any human can do that makes a meaningful contribution to changing our circumstances.

But of course, this assumes we don’t all die



7/11
@daniel_mac8
lots of talk about ASI these days from various places

makes one think we'll get AGI (what exactly that means, I'm not sure) before 2025 is over



8/11
@DozedNutz
I was a die-hard Anthropic fan until Summer 2023, that’s when I shifted my bets on OpenAI.



9/11
@AdityaMullick7
What would be the economic impact of ASI vs AGI?



10/11
@klazizpro
I think they completely deserve it for super intelligence- that should be in the road map by 2030



11/11
@Carl_petey
Regardless of if it’s attainable, self-improving models as a near term goal is concerning




 



Nvidia announces $3,000 personal AI supercomputer called Digits​



This desktop-sized system can handle AI models with up to 200 billion parameters.​


By Kylie Robison, a senior AI reporter working with The Verge's policy and tech teams. She previously worked at Fortune Magazine and Business Insider.
Jan 6, 2025, 11:11 PM EST


Nvidia CEO Jensen Huang holding the Project Digits computer on stage at Nvidia’s CES 2025 press conference.


Nvidia CEO Jensen Huang holding the Project Digits computer on stage at Nvidia’s CES 2025 press conference. Image: Nvidia

If you were looking for your own personal AI supercomputer, Nvidia has you covered.

The chipmaker announced at CES it’s launching a personal AI supercomputer called Project Digits in May. The heart of Project Digits is the new GB10 Grace Blackwell Superchip, which packs enough processing power to run sophisticated AI models while being compact enough to fit on a desk and run from a standard power outlet (this kind of processing power used to require much larger, more power-hungry systems). This desktop-sized system can handle AI models with up to 200 billion parameters, and has a starting price of $3,000. The product itself looks a lot like a Mac Mini.

“AI will be mainstream in every application for every industry. With Project Digits, the Grace Blackwell Superchip comes to millions of developers,” Nvidia CEO Jensen Huang said in a press release. “Placing an AI supercomputer on the desks of every data scientist, AI researcher and student empowers them to engage and shape the age of AI.”

Project Digits looks like a mini PC.


Project Digits looks like a mini PC. Image: Nvidia

Each Project Digits system comes equipped with 128GB of unified, coherent memory (by comparison, a good laptop might have 16GB or 32GB of RAM) and up to 4TB of NVMe storage. For even more demanding applications, two Project Digits systems can be linked together to handle models with up to 405 billion parameters (Meta’s best model, Llama 3.1, has 405 billion parameters).
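Those parameter limits line up with simple memory arithmetic: at FP4 precision each weight occupies half a byte. A quick sketch (weights only, ignoring KV cache and activation memory):

```python
def weight_gb(params_billion: float, bits: int = 4) -> float:
    """Approximate weight memory in GB at the given precision.

    Weights only; KV cache, activations, and runtime overhead are extra.
    """
    return params_billion * 1e9 * bits / 8 / 1e9

print(weight_gb(200))  # 100.0 GB -> fits in one 128 GB system
print(weight_gb(405))  # 202.5 GB -> needs two linked 128 GB systems
```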

The GB10 chip delivers up to 1 petaflop of AI performance (which means it can perform 1 quadrillion AI calculations per second) at FP4 precision (which helps make the calculations faster by making approximations), and the system features Nvidia’s latest-generation CUDA cores and fifth-generation Tensor Cores, connected via NVLink-C2C to a Grace CPU containing 20 power-efficient Arm-based cores. MediaTek, known for their Arm-based chip designs, collaborated on the GB10’s development to optimize its power efficiency and performance.

The Digits supercomputer specs.


The Digits supercomputer specs. Image: Nvidia

Users will also get access to Nvidia’s AI software library, including development kits, orchestration tools, and pre-trained models available through the Nvidia NGC catalog. The system runs on Linux-based Nvidia DGX OS and supports popular frameworks like PyTorch, Python, and Jupyter notebooks. Developers can fine-tune models using the Nvidia NeMo framework and accelerate data science workflows with Nvidia RAPIDS libraries.

Users can develop and test their AI models locally on Project Digits, then deploy them to cloud services or data center infrastructure using the same Grace Blackwell architecture and Nvidia AI Enterprise software platform.

Nvidia offers a range of similar devices in the same accessibility style — in December, it announced a $249 version of its Jetson computer for AI applications, targeting hobbyists and startups, called the Jetson Orin Nano Super (it handles models up to 8 billion parameters).
 






DiceBench

A Post-Human Level Benchmark

Introducing the first PHL (Post-Human Level) Benchmark for testing superintelligent AI systems. Developed by becose.
Contents

1. Motivation
2. Post-Human Level (PHL) Benchmarks
   - Information Completeness
   - Human Performance Gap
   - Objective Evaluation
3. DiceBench Overview
   - Leaderboard
4. Try it Yourself
5. Access & Contact
6. Citation

Motivation

Our analysis of benchmark lifespans suggests we need evaluation methods that can meaningfully differentiate between systems operating beyond human performance. Just as humans can intuitively predict the trajectory of moving vehicles—a task that would be nearly impossible for simpler animals—we expect that more advanced AI systems should demonstrate increasingly accurate predictions of complex physical systems like dice rolls, even when humans cannot. This limitation persists even when humans are given unlimited time to analyze the video data, suggesting a fundamental cognitive rather than perceptual constraint. This creates an opportunity to measure intelligence at levels far above human capability, rather than limiting ourselves to human-level intelligence as a ceiling.

Our analysis of benchmark lifespans (documented at H-Matched) suggests an opportunity to expand our evaluation methods. The increasing frequency with which AI systems achieve human-level performance on these benchmarks indicates that complementary approaches to AI evaluation could be beneficial for measuring and understanding artificial intelligence.
Post-Human Level (PHL) Benchmarks

We propose Post-Human Level (PHL) Benchmarks as a paradigm shift away from anthropocentric evaluation methods. By moving beyond human performance as our reference point, we can develop more meaningful standards for measuring artificial intelligence. A PHL Benchmark is defined by three key criteria that deliberately transcend traditional human-centric metrics:
1. Information Completeness

Each datapoint must contain sufficient information to theoretically achieve better performance than random guessing. In DiceBench, each video frame sequence contains all the physical information (momentum, rotation, surface properties) needed to predict the outcome, even though humans cannot process this information effectively.
2. Human Performance Gap

Breaking free from anthropocentric bias, the benchmark must measure capabilities that transcend human cognitive limitations. By design, human performance should be demonstrably far from optimal, challenging our assumption that human-level performance is a meaningful milestone for advanced AI systems.
3. Objective Evaluation

Each data point must have an unambiguous, verifiable correct answer, allowing for precise performance measurement. This enables us to identify superior performance even in domains where humans perform poorly. In DiceBench, each video has exactly one correct final die outcome.
DiceBench Overview
Description

DiceBench consists of a private evaluation set of 100 videos and a public set of 10 videos, available on GitHub and through the interactive test on this website. All videos are recorded using a handheld Galaxy S24 camera, capturing dice rolls across ten different surface types. Each sequence shows a die of varying color and material being rolled, cutting exactly 0.5 seconds before it comes to rest, after at least two bounces on the surface.

While all necessary physical information for prediction is present in the videos (momentum, rotation, surface properties), the timing makes the final outcome challenging to determine through human perception alone. The public dataset allows researchers to benchmark current vision models like GPT-4o before requesting access to the full evaluation set, which is kept private to maintain benchmark integrity.
Evaluation Process

Each vision model is run through multiple trials per video to ensure reliable results. For GPT-4o, we conduct five independent prediction attempts per video, with final accuracy calculated as the average across these trials. Models are given frame sequences extracted at 24 FPS from each video and instructed to predict the final die outcome with a single numerical response, following OpenAI's video processing guide. This standardized process keeps evaluation conditions consistent across models while minimizing the impact of run-to-run variation in model responses. The complete evaluation scripts are available on GitHub.
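The multi-trial averaging described above can be sketched as follows. This is a minimal illustration, not the actual evaluation script: `predict_outcome` is a hypothetical stand-in for the GPT-4o call (the real scripts send frame sequences to the vision API), and here it simply guesses uniformly at random.

```python
import random
from statistics import mean

TRIALS_PER_VIDEO = 5  # five independent prediction attempts per video

def predict_outcome(video_id: int, rng: random.Random) -> int:
    """Placeholder for the vision-model call; a real run would send the
    24 FPS frame sequence to the model and parse its single-number reply."""
    return rng.randint(1, 6)  # stand-in: uniform random guessing

def evaluate(labels: dict[int, int], seed: int = 0) -> float:
    """Average accuracy over TRIALS_PER_VIDEO independent trials per video."""
    rng = random.Random(seed)
    hits = []
    for video_id, true_face in labels.items():
        for _ in range(TRIALS_PER_VIDEO):
            hits.append(predict_outcome(video_id, rng) == true_face)
    return mean(hits)

# Hypothetical ground-truth final faces for a 10-video public set.
labels = {i: (i % 6) + 1 for i in range(10)}
print(f"accuracy: {evaluate(labels):.1%}")  # random guessing lands near 16.7%
```

Averaging over trials smooths out sampling variation in the model's responses, so a single lucky or unlucky completion does not dominate a video's score.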
Initial Results & Limitations

Our preliminary testing with GPT-4o on the public dataset (n=10) showed an accuracy of 33%, while human participants (n=3) achieved 27%. While these results are above the random baseline of 16.7%, we acknowledge that the small sample size limits their statistical significance. The higher-than-random performance might stem from inherent biases in both human perception and LLM training data regarding certain numbers or dice patterns, rather than true predictive ability.

However, we believe the core concept of using dice prediction as a PHL benchmark remains viable. We encourage researchers to view this project as an initial exploration of post-human evaluation methods, rather than a definitive benchmark. The evaluation scripts are openly available for those interested in conducting larger-scale tests with GPT-4o or other models. We welcome collaboration in refining this approach and developing more robust PHL benchmarks.
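To make the small-sample caveat concrete, an exact binomial tail probability shows how little evidence 10 videos provide. Assuming the observed 33% corresponds to roughly 3 correct predictions out of 10 (a simplification that ignores the per-video trial averaging), the chance of doing at least that well by pure guessing is large:

```python
from math import comb

def binom_tail(n: int, k: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p): probability of at least k
    correct answers under random guessing with success probability p."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# ~33% on a 10-video set is roughly 3 correct; random baseline p = 1/6.
p_value = binom_tail(n=10, k=3, p=1 / 6)
print(f"P(>= 3/10 correct by chance) = {p_value:.3f}")  # ~0.22, not significant
```

A tail probability around 0.22 is far from conventional significance thresholds, which is why we treat these results as preliminary and encourage larger-scale runs.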
Leaderboard

System             Accuracy
GPT-4o             33.0%
Human Performance  27.0%
Random Baseline    16.7%
Try it Yourself

Below is an example video that demonstrates the task. The video stops exactly 0.5 seconds before the die comes to rest, and your challenge is to predict the final number shown on the die. You can use the controls to play, pause, step through frames, and adjust playback speed. You can also zoom in and pan around the video using your mouse wheel or pinch gestures on mobile.



Ready to Test Your Prediction Skills?

Try to predict the final number shown on the die in 10 different videos. Use the controls above to analyze each throw carefully.