The A.I. Bubble is Bursting

bnew

Veteran
Joined
Nov 1, 2015
Messages
51,004
Reputation
7,865
Daps
147,240



@00:40 when he says "there's a problem and large language models require data written by humans to train on," that's not true.



I've written a succinct write-up about this development and its significance. The article is titled "AI Can Write Near Human-Level College Textbooks".

In it, I explain how we discovered this realization and discuss its potential implications.


To post tweets in this format, more info here: https://www.thecoli.com/threads/tips-and-tricks-for-posting-the-coli-megathread.984734/post-52211196

Micky Mikey

Veteran
Supporter
Joined
Sep 27, 2013
Messages
15,194
Reputation
2,681
Daps
83,437
The current generation of models is stagnating. Once GPT-5 or 6 (with reasoning capabilities) comes out, the hype will start all over again.
 

Micky Mikey
Also I personally think a lot of these A.I. companies are holding out until after the election to release the next big thing.
 

bnew
The current generation of models are stagnating. Once ChatGPT5 or 6 (with reasoning capabilities) comes out the hype will start all over again.

Claude Sonnet 3.5 was recently released and has surpassed ChatGPT in some cases.


Llama 3.1 405B just got released today!



We’ve also updated our license to allow developers to use the outputs from Llama models — including 405B — to improve other models for the first time.

We’re excited about how this will enable new advancements in the field through synthetic data generation and model distillation workflows, capabilities that have never been achieved at this scale in open source.
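Model distillation, as referenced here, means training a small "student" model to match a large "teacher" model's output distribution rather than raw labels. A minimal sketch of the core loss, using toy next-token logits (the numbers and the temperature value are illustrative, not Meta's actual pipeline):

```python
import math

def softmax(logits, temperature=1.0):
    # Scale by temperature, then normalize into a probability distribution.
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    # KL(teacher || student) on temperature-softened distributions:
    # the student is pushed to match the teacher's soft targets.
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

# Toy next-token logits over a 4-token vocabulary.
teacher = [2.0, 1.0, 0.1, -1.0]
student = [1.5, 1.2, 0.0, -0.5]
loss = distillation_loss(teacher, student)
```

The license change matters precisely because this loop consumes the teacher's *outputs*: a 405B teacher can generate soft targets or synthetic text for an 8B/70B student.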




Llama 3.1 benchmarks side by side.

This is truly a SOTA model.

Beats GPT-4 on almost every single benchmark.

Continuously trained with a 128K context length.

Pre-trained on 15.6T tokens (405B).

The fine-tuning data includes publicly available instruction datasets, as well as over 25M synthetically generated examples.

Most SFT examples used synthetic data.

Trained on 16K H100 GPUs.

License allows outputs of the model to train other models.

It seems like a future version will integrate image, video, and speech capabilities using a compositional approach (not released yet).


With the introduction of Llama 3.1 405B, we now have an open-source model that beats the best closed-source one available today on selected benchmarks.

What a time.


This might be the biggest moment for Open-Source AI.

Meta just released Llama 3.1, including a 405-billion-parameter model, the most sophisticated open model ever released.

It already outperforms GPT-4o on several benchmarks.


Compared leaked Llama 3.1 benchmarks with other leading models, very excited for the release!

We can tier out models by price / 1M output tokens.

O($0.10): 4o-mini and <10B param models. I think 4o-mini will still be best but a strong local 8B will unlock lots of applications.
O($1): 70B class models, Haiku, 3.5-turbo. Distilled 70B looks like a category winner here! This is a nice price-point for lots of AI apps.
O($10): 405B, 4o, 3.5 Sonnet. Have to see how the post-training and harder benches go. Guess 3.5 sonnet is still the best, but 405B might give 4o real competition. This is just vibes, I like Sonnet's RLHF and hate the GPT RLHF.
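The O($·) tiers above are just price-per-output-token arithmetic. A quick sketch (the per-1M prices are the order-of-magnitude figures from the post, not official rate cards):

```python
# Order-of-magnitude prices per 1M output tokens, per the tiering above.
PRICE_PER_1M = {
    "4o-mini / <10B": 0.10,
    "70B class / Haiku / 3.5-turbo": 1.00,
    "405B / 4o / 3.5 Sonnet": 10.00,
}

def cost(tier, output_tokens):
    # Output cost scales linearly with tokens generated.
    return PRICE_PER_1M[tier] * output_tokens / 1_000_000

# Example: generating 50K output tokens/day for 30 days at each tier.
monthly = {tier: cost(tier, 50_000 * 30) for tier in PRICE_PER_1M}
```

The two-orders-of-magnitude spread between tiers is why a "good enough" distilled 70B can win a category even if the 405B is smarter.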

Other takeaways:
- most benchmarks are saturated and probably not informative. OpenAI only reports harder benchmarks now, other developers should too (eg MATH > GSM8K)
- 405B doesn't look much better than distilled 70B, but harder benches and vibe tests will be better measurements than these tests
- 8B/70B distilled models are substantially better than when trained from scratch. I've wondered if for a given compute budget, it is better to overtrain a X param model or to train a X' (where X' >> X) and distill to X, maybe we will find out
- a lot of people thought that the original 8B saturated the params after 15T tokens. this is good evidence that it did not. softmax with high token count may have been why it did not quantize well. curious if the Llama 4 series will train in FP8 or BF16 -- logistically, serving 400B on 1x8H100 node seems much easier than 2x8H100 and it's much simpler to do this if the model was pretrained quantization-aware
- Gemma models do surprisingly well on MMLU relative to their other scores. most of the innovation in Gemma was supposed to be post-training, so curious if the 27B will hold up vs new 8B-Instruct
- Mistral Nemo 12B and 3.1 8B look somewhat similar, but I'd guess most developers will stick to Llama's better tooling and smaller param count. tough timing
- I am fairly sure that 3.1 was not trained early fusion, and somebody's going to throw up a Llava finetune in 2-3 days.
- personal guess (using other info) is that 405B-Instruct will fall short of Sonnet / 4o. but man, what a good model to have open source, and the gap is closing
- llama3.1405 looks like a Pi joke

all models are base except 4o-class, took the best available score from different repos and averaged MMLU for Llama. all benchmarks are wrong but hopefully useful for an overall picture.
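The FP8-vs-BF16 serving point reduces to a memory budget: an 8×H100 node has roughly 640 GB of HBM, and weight storage scales with bytes per parameter. A back-of-the-envelope sketch (weights only; real serving also needs KV cache and activation memory, which this ignores):

```python
def weight_memory_gb(params_billion, bytes_per_param):
    # 1e9 params * bytes_per_param bytes, expressed in GB.
    return params_billion * bytes_per_param

H100_NODE_GB = 8 * 80  # one node: 8x H100 at 80 GB HBM each

bf16 = weight_memory_gb(405, 2)  # 810 GB: weights alone exceed one node
fp8  = weight_memory_gb(405, 1)  # 405 GB: fits on one node with headroom
```

This is the logistical argument in the bullet above: in BF16 the 405B weights alone force a 2×8H100 deployment, while FP8 halves that, and quantization-aware pretraining would make the FP8 path much less lossy.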



bnew
@51:09 he says the market will realize LLMs are not the future, but they're already realizing that in some respects: multi-modal models are emerging, soon to be replaced by world models, etc. "Large language model" largely referred to text-generating models; now models are doing speech-to-text, text-to-speech, speech-to-speech, text-to-image, text-to-video, etc.

It's crazy, though, to say generative large language models have no use case, since the biggest hindrance to replacing workers now isn't even accuracy but processing power and inference time. Tool use, or function calling, hasn't plateaued.
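Tool use / function calling, at its core, is the model emitting a structured call that the host program executes. A schematic loop with a stubbed-out model (the tool name, schema, and model stub are all illustrative, not any real vendor's API):

```python
import json

# Tools the host exposes to the model (name and return shape are made up).
TOOLS = {
    "get_weather": lambda city: {"city": city, "temp_c": 21},
}

def fake_model(prompt):
    # Stand-in for an LLM: a real model decides when to emit a tool call
    # and which arguments to pass, based on the prompt.
    return json.dumps({"tool": "get_weather", "args": {"city": "Paris"}})

def run_turn(prompt):
    msg = json.loads(fake_model(prompt))
    # The host, not the model, executes the tool; the result would normally
    # be fed back to the model to compose a final answer.
    return TOOLS[msg["tool"]](**msg["args"])

out = run_turn("What's the weather in Paris?")
```

The accuracy bottleneck here is the model reliably emitting valid, well-chosen calls; the latency bottleneck is the round-trips, which is the inference-time point above.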

Meta's investments in AI basically gave it a bigger future; crazy how they think it'll go out of business first. With more powerful and larger-context models, Meta will be able to deliver more products to their users much faster than ever before. They mostly referenced products by Google, Meta, Apple, Microsoft, and ChatGPT. There are at least a dozen companies working on a Sora text-to-video competitor.
 
Last edited:

Micky Mikey
Claude Sonnet 3.5 was recently released and has surpassed ChatGPT in some cases.


Llama 3.1 405B just got released today!




I mostly use Claude Sonnet 3.5 currently and it's impressive. And it's good to see an open-source model finally on par with GPT-4. But neither of these models is the gigantic leap forward needed to replace human labor or for widespread adoption by companies.
 

bnew

Ross Sandler, an analyst at Barclays, said on Alphabet's earnings call that it looks like AI may be going from an “underbuilt situation” last year to “potentially being overbuilt next year” if the rate of investment in AI keeps up.

“How are we thinking about the return on invested capital with this AI capex cycle?” he asked.

Pichai responded by saying the risk of missing out on the benefits of investing in AI outweighs the risk that they may be investing too much.


He's right, that's the only thing that matters at this point. :manny:
 