The A.I. Bubble is Bursting

bnew

Veteran
Joined
Nov 1, 2015
Messages
51,004
Reputation
7,865
Daps
147,240



@00:40 when he says "there's a problem and large language models require data written by humans to train on," that's not true.



I've written a succinct write-up about this development and its significance. The article is titled "AI Can Write Near Human-Level College Textbooks".

In it, I explain how we discovered this realization and discuss its potential implications.


To post tweets in this format, more info here: https://www.thecoli.com/threads/tips-and-tricks-for-posting-the-coli-megathread.984734/post-52211196

Micky Mikey

Veteran
Supporter
Joined
Sep 27, 2013
Messages
15,194
Reputation
2,681
Daps
83,437
The current generation of models is stagnating. Once GPT-5 or 6 (with reasoning capabilities) comes out, the hype will start all over again.
 

Micky Mikey
Also I personally think a lot of these A.I. companies are holding out until after the election to release the next big thing.
 

bnew
The current generation of models are stagnating. Once ChatGPT5 or 6 (with reasoning capabilities) comes out the hype will start all over again.

Claude Sonnet 3.5 was recently released and has surpassed ChatGPT in some cases.


Llama 3.1 405B just got released today!



We’ve also updated our license to allow developers to use the outputs from Llama models — including 405B — to improve other models for the first time.

We’re excited about how this will enable new advancements in the field through synthetic data generation and model distillation workflows, capabilities that have never been achieved at this scale in open source.
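Model distillation, as referenced here, means training a small "student" model to match a large "teacher" model's output distribution rather than raw labels. A minimal sketch of the core loss, using toy next-token logits (the numbers and the temperature value are illustrative, not Meta's actual pipeline):

```python
import math

def softmax(logits, temperature=1.0):
    # Scale by temperature, then normalize into a probability distribution.
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    # KL(teacher || student) on temperature-softened distributions:
    # the student is pushed to match the teacher's soft targets.
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

# Toy next-token logits over a 4-token vocabulary.
teacher = [2.0, 1.0, 0.1, -1.0]
student = [1.5, 1.2, 0.0, -0.5]
loss = distillation_loss(teacher, student)
```

The license change matters precisely because this loop consumes the teacher's *outputs*: a 405B teacher can generate soft targets or synthetic text for an 8B/70B student.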




Llama 3.1 benchmarks side by side.

This is truly a SOTA model.

Beats GPT-4 on almost every single benchmark.

Continuously trained with a 128K context length.

Pre-trained on 15.6T tokens (405B).

The fine-tuning data includes publicly available instruction datasets, as well as over 25M synthetically generated examples.

Most SFT examples used synthetic data.

Trained on 16K H100 GPUs.

License allows outputs of the model to train other models.

It seems like a future version will integrate image, video, and speech capabilities using a compositional approach (not released yet).


With the introduction of Llama 3.1 405B, we now have an open-source model that beats the best closed-source one available today on selected benchmarks.

What a time.


This might be the biggest moment for Open-Source AI.

Meta just released Llama 3.1, including a 405-billion-parameter model, the most sophisticated open model ever released.

It already outperforms GPT-4o on several benchmarks.


Compared leaked Llama 3.1 benchmarks with other leading models, very excited for the release!

We can tier out models by price / 1M output tokens.

O($0.10): 4o-mini and <10B param models. I think 4o-mini will still be best but a strong local 8B will unlock lots of applications.
O($1): 70B class models, Haiku, 3.5-turbo. Distilled 70B looks like a category winner here! This is a nice price-point for lots of AI apps.
O($10): 405B, 4o, 3.5 Sonnet. Have to see how the post-training and harder benches go. Guess 3.5 sonnet is still the best, but 405B might give 4o real competition. This is just vibes, I like Sonnet's RLHF and hate the GPT RLHF.
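The O($·) tiers above are just price-per-output-token arithmetic. A quick sketch (the per-1M prices are the order-of-magnitude figures from the post, not official rate cards):

```python
# Order-of-magnitude prices per 1M output tokens, per the tiering above.
PRICE_PER_1M = {
    "4o-mini / <10B": 0.10,
    "70B class / Haiku / 3.5-turbo": 1.00,
    "405B / 4o / 3.5 Sonnet": 10.00,
}

def cost(tier, output_tokens):
    # Output cost scales linearly with tokens generated.
    return PRICE_PER_1M[tier] * output_tokens / 1_000_000

# Example: generating 50K output tokens/day for 30 days at each tier.
monthly = {tier: cost(tier, 50_000 * 30) for tier in PRICE_PER_1M}
```

The two-orders-of-magnitude spread between tiers is why a "good enough" distilled 70B can win a category even if the 405B is smarter.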

Other takeaways:
- most benchmarks are saturated and probably not informative. OpenAI only reports harder benchmarks now, other developers should too (eg MATH > GSM8K)
- 405B doesn't look much better than distilled 70B, but harder benches and vibe tests will be better measurements than these tests
- 8B/70B distilled models are substantially better than when trained from scratch. I've wondered if for a given compute budget, it is better to overtrain a X param model or to train a X' (where X' >> X) and distill to X, maybe we will find out
- a lot of people thought that the original 8B saturated the params after 15T tokens. this is good evidence that it did not. softmax with high token count may have been why it did not quantize well. curious if the Llama 4 series will train in FP8 or BF16 -- logistically, serving 400B on 1x8H100 node seems much easier than 2x8H100 and it's much simpler to do this if the model was pretrained quantization-aware
- Gemma models do surprisingly well on MMLU relative to their other scores. most of the innovation in Gemma was supposed to be post-training, so curious if the 27B will hold up vs new 8B-Instruct
- Mistral Nemo 12B and 3.1 8B look somewhat similar, but I'd guess most developers will stick to Llama's better tooling and smaller param count. tough timing
- I am fairly sure that 3.1 was not trained early fusion, and somebody's going to throw up a Llava finetune in 2-3 days.
- personal guess (using other info) is that 405B-Instruct will fall short of Sonnet / 4o. but man, what a good model to have open source, and the gap is closing
- llama3.1405 looks like a Pi joke

all models are base except 4o-class, took the best available score from different repos and averaged MMLU for Llama. all benchmarks are wrong but hopefully useful for an overall picture.
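The FP8-vs-BF16 serving point reduces to a memory budget: an 8×H100 node has roughly 640 GB of HBM, and weight storage scales with bytes per parameter. A back-of-the-envelope sketch (weights only; real serving also needs KV cache and activation memory, which this ignores):

```python
def weight_memory_gb(params_billion, bytes_per_param):
    # 1e9 params * bytes_per_param bytes, expressed in GB.
    return params_billion * bytes_per_param

H100_NODE_GB = 8 * 80  # one node: 8x H100 at 80 GB HBM each

bf16 = weight_memory_gb(405, 2)  # 810 GB: weights alone exceed one node
fp8  = weight_memory_gb(405, 1)  # 405 GB: fits on one node with headroom
```

This is the logistical argument in the bullet above: in BF16 the 405B weights alone force a 2×8H100 deployment, while FP8 halves that, and quantization-aware pretraining would make the FP8 path much less lossy.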



bnew
@51:09 he says the market will realize LLMs are not the future, but they're already realizing that in some respects: multi-modal models are emerging, soon to be replaced by world models, etc. "Large language model" largely referred to text-generating models; now models are doing speech-to-text, text-to-speech, speech-to-speech, text-to-image, text-to-video, etc.

It's crazy, though, to say generative large language models have no use case, since the biggest hindrance to replacing workers now isn't even accuracy but processing power and inference time. Tool use, or function calling, hasn't plateaued.
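Tool use / function calling, at its core, is the model emitting a structured call that the host program executes. A schematic loop with a stubbed-out model (the tool name, schema, and model stub are all illustrative, not any real vendor's API):

```python
import json

# Tools the host exposes to the model (name and return shape are made up).
TOOLS = {
    "get_weather": lambda city: {"city": city, "temp_c": 21},
}

def fake_model(prompt):
    # Stand-in for an LLM: a real model decides when to emit a tool call
    # and which arguments to pass, based on the prompt.
    return json.dumps({"tool": "get_weather", "args": {"city": "Paris"}})

def run_turn(prompt):
    msg = json.loads(fake_model(prompt))
    # The host, not the model, executes the tool; the result would normally
    # be fed back to the model to compose a final answer.
    return TOOLS[msg["tool"]](**msg["args"])

out = run_turn("What's the weather in Paris?")
```

The accuracy bottleneck here is the model reliably emitting valid, well-chosen calls; the latency bottleneck is the round-trips, which is the inference-time point above.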

Meta's investments in AI basically gave it a bigger future; crazy how they think it'll go out of business first. With more powerful and larger-context models, Meta will be able to deliver more products to their users much faster than ever before. They mostly referenced products by Google, Meta, Apple, Microsoft, and ChatGPT. There are at least a dozen companies working on a Sora text-to-video competitor.
 
Last edited:

Micky Mikey
Claude Sonnet 3.5 was recently released and has surpassed ChatGPT in some cases.


Llama 3.1 405B just got released today!




I mostly use Claude Sonnet 3.5 currently and it's impressive. And it's good to see an open-source model finally on par with GPT-4. But neither of these models is the gigantic leap forward needed to replace human labor or for widespread adoption by companies.
 

bnew

Ross Sandler, an analyst at Barclays, said on Alphabet's earnings call that it looks like AI may be going from an “underbuilt situation” last year to “potentially being overbuilt next year” if the rate of investment in AI keeps up.

“How are we thinking about the return on invested capital with this AI capex cycle?” he asked.

Pichai responded by saying the risk of missing out on the benefits of investing in AI outweighs the risk that they may be investing too much.


He's right, that's the only thing that matters at this point. :manny:
 