bnew



1/11
@ArtificialAnlys
Cerebras continues to deliver output speed improvements, breaking the 2,000 tokens/s barrier on Llama 3.1 8B and 550 tokens/s on 70B

Since launching less than a month ago, @CerebrasSystems has continued to improve output speed inference performance on their custom chips.

We are now measuring 2,005 output tokens per second on @AIatMeta's Llama 3.1 8B and 566 output tokens per second on Llama 3.1 70B.

Faster output speed supports use-cases which require low-latency interactions including consumer applications (games, chatbots, etc) and new techniques of using the models such as agents and multi-query RAG.
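
For readers who want to sanity-check output-speed figures like these themselves, here is a rough Python sketch against any OpenAI-compatible streaming endpoint. This is not Artificial Analysis' methodology; the base URL, API key, and model id are placeholders, and counting streamed chunks only approximates tokens.

```python
# Rough sanity-check sketch (not Artificial Analysis' methodology): time a streamed
# completion and treat each streamed chunk as roughly one token.
import time
from openai import OpenAI

client = OpenAI(base_url="https://your-inference-endpoint/v1", api_key="YOUR_KEY")

stream = client.chat.completions.create(
    model="llama3.1-8b",  # placeholder model id
    messages=[{"role": "user", "content": "Summarize the history of GPUs in ~300 words."}],
    stream=True,
)

first_token_time = None
chunks = 0
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        if first_token_time is None:
            first_token_time = time.perf_counter()
        chunks += 1

elapsed = time.perf_counter() - first_token_time
print(f"~{chunks / elapsed:.0f} output tokens/s (chunk-count approximation)")
```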

Link to our comparison of Llama 3.1 70B and 8B providers below 👇



2/11
@ArtificialAnlys
Analysis of Llama 3.1 70B providers:
https://artificialanalysis.ai/models/llama-3-1-instruct-70b/providers

Analysis of Llama 3.1 8B providers:
https://artificialanalysis.ai/models/llama-3-1-instruct-8b/providers




3/11
@linqtoinc
Incredible work @CerebrasSystems team!



4/11
@JonathanRoseD
Llama 405B when? I assume not soon because of the hardware—a chip that could handle that would be an absolute BEAST



5/11
@alby13
Why don't they talk about running Llama 3.1 405B?



6/11
@JERBAGSCRYPTO
When IPO



7/11
@itsTimDent
We need qwen or qwen coder similar output speeds.



8/11
@kalyan5v
Cerebras Systems needs a lot of PR if it's taking on #NVDA



9/11
@pa_pfeiffer
How do you validate that the model behind this is actually Llama 3.1 8B at full precision (bfloat16) and not something quantized, pruned, or distilled?



10/11
@tristanbob
Amazing, congrats @CerebrasSystems!



11/11
@StartupHubAI


[Quoted tweet]
Top AI investor, researcher, and market analyst @nathanbenaich updated the @stateofaireport with #AI chip usage 🏒📈 @CerebrasSystems doubled YoY.

@graphcoreai
@SambaNovaAI
@GroqInc
@HabanaLabs

👇it’s how many AI startup chips are cited in AI research papers, a clever proxy




bnew




1/17
@ArtificialAnlys
There is a new leader in open weights intelligence! Qwen2.5 72B tops our independent evals amongst open weights models, even compared to the much larger Llama 3.1 405B

Qwen2.5 72B, released yesterday by @Alibaba_Qwen, has topped our Artificial Analysis Quality Index of evaluations. While its MMLU score is 1 percentage point below Llama 3.1 405B's, it has strengths in coding and math, where it challenges OpenAI's GPT-4o.

Further, given the model is much smaller than Llama 3.1 405B it should also run faster on the same hardware. It is a dense model and supports a 128k context window, the same as the Llama 3.1 series, and 8k output tokens, double the Llama 3.1 series' 4k.

@hyperbolic_labs and @DeepInfra have been quick to launch the model and are both offering the model at $0.4/M input & output tokens. This is ~10X cheaper than GPT-4o's price and the median price of Llama 3.1 405B across providers.

See below for links to our analysis 👇



2/17
@ArtificialAnlys
Qwen2.5 comparison to other models:
Comparison of AI Models across Quality, Performance, Price | Artificial Analysis

Comparison of providers hosting Qwen2.5:
Qwen2.5 72B: API Provider Performance Benchmarking & Price Analysis | Artificial Analysis




3/17
@zjasper666
Glad to provide open access to Qwen2.5 72B to conduct the evaluation🌪️



4/17
@ArtificialAnlys
Thanks @hyperbolic_labs for the support!



5/17
@elqniemi
This is amazing as it goes to show the boom isn't just going to be US dominance



6/17
@ChaoticDeeds
@sankalpsthakur the third stat here (math) is not relevant for annotation generation. IMO middle (knowledge) stat is relevant. Huge difference between 4o mini and 4o.. but let's still try 4o mini just because its cheap.
Any other model we should try as a plugin that will be cheaper without compromising humour quality?



7/17
@Yuchenj_UW
great work @Alibaba_Qwen team!



8/17
@AntDX316
Llama3.2, 3.5, or 4 will be released soon?



9/17
@LemonSturgis
@poe_platform, can you add this model?



10/17
@nikshepsvn
@togethercompute @FireworksAI_HQ can we get this hosted?



11/17
@roramora0
Prediction: the top 5 models at the end of 2025 will include 2-3 Chinese models



12/17
@DavidFSWD
is there a fast Guided LLM JSON solution?

lm-format-enforcer + vllm = slow, 10 seconds
outlines + vllm = slow also (locks my 3090 gpu?)
tabbyapi = broken? no docs, errors
python-llama-cpp: works, but not fast
.. haven't tried NuExtract-tiny...

Any fast guided LLM solutions?



13/17
@banishandreturn
Big. Now just imagine if they released a 405B



14/17
@TuanPham672604
Did it just mog 405B?



15/17
@gimwonjung16
Omg



16/17
@get_palet
It would be cool to see benchmark scores relative to energy use (watts) or FLOPs consumed. Especially now that we can scale test-time compute to yield better performance from smaller models i.e. benchmark models based on performance per watt or per FLOP.

[Quoted tweet]
Everyone's focused on MMLU score (y) by release date (x) but overlooking model size. Like how ARM prioritizes performance per watt over Moore’s Law in chip design. It should be benchmark units (ELO, MMLU, GSM8K, etc.) per inference FLOP in the y-axis with release date in the x.


17/17
@SimplyObjective
Does anyone actually use these models, or do they just run gamed tests? Qwen2.5 72b isn't good. It hallucinates like crazy across most popular domains of knowledge (e.g. movies, music, games, sports...), and it makes stupid mistakes in math, logic, etc. They gamed the tests.





bnew








1/11
@Alibaba_Qwen
Welcome to the party of Qwen2.5 foundation models! This time, we have the biggest release ever in the history of Qwen. In brief, we have:

Blog: Qwen2.5: A Party of Foundation Models!
Blog (LLM): Qwen2.5-LLM: Extending the boundary of LLMs
Blog (Coder): Qwen2.5-Coder: Code More, Learn More!
Blog (Math): Qwen2.5-Math: The world's leading open-sourced mathematical LLMs
HF Collection: Qwen2.5 - a Qwen Collection
ModelScope: ModelScope 魔搭社区
HF Demo: Qwen2.5 - a Hugging Face Space by Qwen

* Qwen2.5: 0.5B, 1.5B, 3B, 7B, 14B, 32B, and 72B
* Qwen2.5-Coder: 1.5B, 7B, and 32B on the way
* Qwen2.5-Math: 1.5B, 7B, and 72B.

All our open-source models, except for the 3B and 72B variants, are licensed under Apache 2.0. You can find the license files in the respective Hugging Face repositories. Furthermore, we have also open-sourced the **Qwen2-VL-72B**, which features performance enhancements compared to last month's release.

As usual, we not only open-source the bf16 checkpoints but also provide quantized model checkpoints, e.g. GPTQ, AWQ, and GGUF, so this time we have a total of over 100 model variants!

Notably, our flagship open-source LLM, Qwen2.5-72B-Instruct, achieves competitive performance against the proprietary models and outcompetes most open-source models in a number of benchmark evaluations!

We heard your requests for the much-welcomed 14B and 32B models, so we bring them to you. These two models even demonstrate competitive or superior performance against their predecessor, Qwen2-72B-Instruct!

We care about SLMs as well! The compact 3B model has grasped a wide range of knowledge and is now able to achieve 68 on MMLU, beating Qwen1.5-14B!

Besides the general language models, we still focus on upgrading our expert models. Still remember CodeQwen1.5 and have been waiting for CodeQwen2? This time we have new models called Qwen2.5-Coder with two variants of 1.5B and 7B parameters. Both demonstrate very competitive performance against much larger code LLMs or general LLMs!

Last month we released our first math model, Qwen2-Math, and this time we have built Qwen2.5-Math on the base language models of Qwen2.5 and continued our research in reasoning, including CoT and Tool-Integrated Reasoning. What's more, this model now supports both English and Chinese! Qwen2.5-Math is way better than Qwen2-Math, and it might be your best choice of math LLM!

Lastly, if you are satisfied with our Qwen2-VL-72B but find it hard to use, you have no worries now: it is OPEN-SOURCED!

Prepare to start a journey of innovation with our lineup of models! We hope you enjoy them!



2/11
@Alibaba_Qwen
Qwen2.5-72B-Instruct against the opensource models!



3/11
@Alibaba_Qwen
14B and 32B, and even a lightweight Turbo model, can outcompete GPT-4o mini!



4/11
@Alibaba_Qwen
Qwen2.5-3B can learn way more knowledge than you might expect!



5/11
@Alibaba_Qwen
Play with all our Qwen2.5 LLMs in a single HF Space!
Qwen2.5 - a Hugging Face Space by Qwen



6/11
@Alibaba_Qwen
The prince of code LLM, Qwen2.5-Coder!



7/11
@Alibaba_Qwen
Break the limit, Qwen2.5-Math!



8/11
@Sentdex
A day early it seems! Epic release.



9/11
@altryne
Whoaaah there! What a massive release!
Congrats to the team for pulling this off!

Will dig in and chat about it tomorrow on the show!

https://nitter.poast.org/i/spaces/1LyxBgkXXMpKN



10/11
@yacineMTB
okay go sleep now 😹



11/11
@Gopinath876
Impressive




1/2
Qwen 2.5 72B, aka a GPT-4 / Sonnet 3.5 competitive model, is now available for free on Hugging Chat!

GO try it out now! 🔥

[Quoted tweet]
Open Source AI/ML is on fire today! 🔥 Multilingual (29) Qwen 2.5 just dropped w/ 128K context too! The 72B rivals Llama 3.1 405B and beats Mistral Large 2 (123B) ⚡

> Trained on an extensive dataset containing up to 18 trillion tokens

> It surpasses its predecessor, Qwen2, with significantly higher scores on MMLU (85+), HumanEval (85+), and MATH (80+) benchmarks

> Excels in instruction following, generating lengthy texts (over 8K tokens), and understanding structured data like tables. It also shows significant progress in generating structured outputs, particularly JSON.

> Supports over 29 languages, including major global languages, and can handle up to 128K tokens, with a text generation capacity of 8K tokens.

They release specialised models as well:

1. Qwen2.5: 0.5B, 1.5B, 3B, 7B, 14B, 32B, and 72B

2. Qwen2.5-Coder: 1.5B, 7B, and 32B on the way

3. Qwen2.5-Math: 1.5B, 7B, and 72B.

Kudos to @Alibaba_Qwen team for shipping high quality model checkpoints! 🐐


2/2
HuggingChat



1/6
@_philschmid
GPT-4 for coding at home! Qwen 2.5 Coder 7B outperforms @OpenAI GPT-4 0613 and other open LLMs < 33B, including @BigCodeProject StarCoder, @MistralAI Codestral, and Deepseek, and is released under Apache 2.0. 🤯

Details:
🚀 Three model sizes: 1.5B, 7B, and 32B (coming soon) up to 128K tokens using YaRN
📚 Pre-trained on 5.5 trillion tokens, post-trained on tens of millions of examples (no details on # tokens)
⚖️ A 7:2:1 ratio of public code data, synthetic data, and text data outperformed other combinations, even those with a higher code proportion.
✅ Built scalable synthetic data generation using LLM scorers, checklist-based scoring, and a sandbox for code verification to filter out low-quality data.
🌐 Trained on 92+ programming languages and incorporated multilingual code instruction data
📏 To improve long context, created instruction pairs in FIM format using ASTs
🎯 Adopted a two-stage post-training process—starting with diverse, low-quality data (tens of millions) for broad learning, followed by high-quality data with rejection sampling for refinement (millions).
🧹 Performed decontamination on all datasets (pre & post) to ensure integrity using a 10-gram overlap method (see the sketch after the links below)
🏆 7B Outperforms other open Code LLMs < 40B, including Mistral Codestral, or Deepseek
🥇 7B matches OpenAI GPT-4 0613 on various benchmarks
🤗 Released under Apache 2.0 and available on @huggingface

Models: Qwen2.5-Coder - a Qwen Collection
Paper: Paper page - Qwen2.5-Coder Technical Report
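
The 10-gram overlap decontamination mentioned above is a common recipe; here is a hedged illustration of what such a check can look like. This is for illustration only, not the Qwen team's actual pipeline, and the sample strings are made up.

```python
# Hedged illustration of a 10-gram overlap decontamination check.
def ngrams(text: str, n: int = 10) -> set:
    toks = text.lower().split()
    return {tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)}

def is_contaminated(train_text: str, benchmark_grams: set) -> bool:
    # Flag a training sample if it shares any 10-gram with a benchmark sample.
    return bool(ngrams(train_text) & benchmark_grams)

benchmark_samples = ["def two_sum(nums, target): ..."]  # e.g. HumanEval-style snippets
benchmark_grams = set().union(*(ngrams(s) for s in benchmark_samples))

train_corpus = ["some unrelated training document ..."]
clean = [doc for doc in train_corpus if not is_contaminated(doc, benchmark_grams)]
print(len(clean), "documents kept")
```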



2/6
@andrey_cheptsov
Would also be great to have Claude included in the comparison



3/6
@0xSMW
Do you think the performance holds on real-world scenarios? My observation with the small open models is they struggle with longer prompts or context, making them more of a POC than something usable.



4/6
@joru1000
Thanks. There are open discussions about whether the model holds up to its benchmarks in real coding scenarios. The Qwen team is serious and usually delivers; however, for some reason (quantization or something else?), I do not get good results, and others report similar feedback



5/6
@chatgptdevx
Any API provider that supports QWen 2.5 Coder?



6/6
@yuhui_bear
In actual production environments, it performs better than any LLM below 20B (Aider LLM Leaderboards)




1/19
@_philschmid
We have GPT-4 for coding at home! I looked up @OpenAI GPT-4 0613 results for various benchmarks and compared them with @Alibaba_Qwen 2.5 7B coder. 👀

> 15 months after the release of GPT-4 0613, we have an open LLM under Apache 2.0 which performs just as well. 🤯

> GPT-4 pricing is $30/$60 per million tokens while a ~7-8B model is at $0.09/$0.09; that's a cost reduction of roughly 333-666x, or, if you run it on your own machine, it's “free”. 💰

Still Mindblown. Full post about Qwen 2.5 tomorrow. 🫡

2/19
@_philschmid


3/19
@nisten
the 8bit 1.5b is getting the usual questions right as well, while running locally on the phone.
Time to rethink the scaling laws lol

4/19
@S_yahrul123
Could you do a guide on how to run this for an LLM beginner?

5/19
@hallmark_nick
How does this compare with Sonnet-3.5?

6/19
@j6sp5r
It might be impossible to beat OpenAI at everything, but it's totally possible to beat OpenAI on specific problems.

OpenAI is trying to solve the general problem and they are good at it. But if we focus on specific problems, it's rather easy for us to surpass their performance in these areas. They cannot be better than us in all areas all the time. This is how we stand our ground.

7/19
@edalgomezn
What requirements does it ask for to run it locally?

8/19
@gpt_biz
"Sounds like Qwen 2.5 is really giving GPT-4 a run for its money, can't wait to see how it performs in real-world tasks!"

9/19
@nocodeaidev
Wow, that's incredible progress! Open source LLMs really are changing the game. Looking forward to your full post tomorrow! 🚀

10/19
@beniam1nzh
wait the full name of Qwen is Alibaba_Qwen?? thats my Chinese LLM model there

11/19
@undeservingfut
One of the nice side effects of the chip export ban is Chinese engineers are working hard on doing more with less and thus helping level the playing field for startups without a big GPU budget.

12/19
@1992Hikaro
From what I know about Chinese junk, at best this is some trick that only works on benchmarks, nothing else.

13/19
@garyfung
Compare with Claude 3.5 instead

14/19
@APIdeclare
This is so cool. Wonder if its possible to run with cursor or alternative?

15/19
@squizzster
Looks like BMB to me, Bench-Mark-Bias.
a ~7B model beats GPT-4 0613 at coding. (6m old only)
:=) !PLEASE BE RIGHT! :-)
I want you to be right.

16/19
@yar_vol
Annoying thing about AI is that now I am used to prompt O1 so all other models even my friend Sonnet 3.5 look so dumb..
Do you know of any reasonable OSS effort to reproduce o1?

17/19
@SimplyObjective
Must be true then. The benchmark gods have spoken.

18/19
@be_anon_always
We knew it and thats why we have been making local AI co-pilot Pyano - Price, Privacy and Personalised. Launching in five days.

19/19
@LOFI911
Tried the 7B version of this model in @LMStudioAI, and the code it generated first threw an error; when I followed up with an additional prompt, it just generated the very same code it did the first time. Not impressed, unless it's the LM Studio app's fault.

bnew






1/20
@ai_for_success
Qwen 2.5 seems to be incredibly good at coding, and it's not just based on one post; there are benchmarks to back it up.
Is anyone else using it?



2/20
@ai_for_success
Benchmark from Abacus AI team

[Quoted tweet]
Open source Qwen 2.5 beats o1 models on coding 🤯🤯

Qwen 2.5 scores higher than the o1 models on coding on Livebench AI

Qwen is just below Sonnet 3.5, and for an open-source model, that is awesome!!

o1 is good at some hard coding but terrible at code completion problems and hallucinates a lot...


3/20
@ikristoph
I have done a bunch of tests. It's 'good' because the benchmarks are simple (as in, the question is compact, as is the answer). It's less effective on tasks where the context is larger / multi-turn.

If you have an ongoing dialog, Sonnet is far better. If you are a skilled software engineer who can give good direction, o1 is unbeatable.

I've also had excellent results from Gemini when the context is very large (many project files)



4/20
@ai_for_success
Thanks for sharing the info man.



5/20
@brooksy4503
not yet, may try tonight



6/20
@ai_for_success
Do let us know.. 👍



7/20
@Shawnryan96
I am gonna put it to the test at work tomorrow and see what it’s got.



8/20
@ai_for_success
Cool.. do let us know



9/20
@RyanEls4
Interesting 🤔

First time I've heard of Qwen.

I will give it a try 👌🏻



10/20
@chinagaode
Honestly, I'm quite surprised by Qwen2.5's performance. My test results may differ from others'. The instruction-following and generation speed of Qwen2.5, especially the 72b version, can even exhibit slow, deliberate thinking. When generating responses to some specific prompts, I noticed it produces answers in a staggered manner. I've already integrated Qwen2.5 into many of my business operations.



11/20
@the_aiatlas
Qwen outperforms OpenAI‘s models at the benchmark. I am sure it is op



12/20
@ParsLuci2991
I just tried the 32b version and it couldn't even do the simple snake game. it's already very slow, it took almost half an hour :D. o1 did it in seconds in one go



13/20
@risphereeditor
I've been using Qwen 2.5 Coder 7B and I have to say that it's as good as GPT 4O Mini (HumanEval 88.8% and personal use). I think that ChatGPT Plus is still the better product for coders.



14/20
@victor_explore
Cursor needs to add this



15/20
@genesshk
Thank you for sharing your thoughts on Qwen 2.5. I have observed similar positive feedback regarding its coding capabilities and benchmarks. It appears to be a valuable tool for developers.



16/20
@zeng_wt
I didn't use Qwen for coding. But use it for everything else. It is really fast and the outcome of writing has no chatGPT vibe.😉



17/20
@Coolzippity
I only really have the VRAM for up to 14b



18/20
@ai_frontier_k
I've been experimenting with Qwen 2.5, it's indeed incredibly good at coding, the benchmarks speak for themselves. What are your favorite applications for it?



19/20
@frankflynn20016
So are we all agreed that the only viable use case for LLMs is code generation ?

Even with the ones not so great at code, it is quicker to fix imperfect code than it is to write perfect code raw.

Very limited user base that LLMs bring true value to.



20/20
@AI_Homelab
I tested Qwen 2.5 72b (not the code specific model). It performed incredibly poorly. I used q_8 ggufs in LM Studio. I also tested Mistral Large v2, deepseek 236b, Wizard 8*22b. In this domain each of them completely destroys it.





bnew



The Intelligence Age​


September 23, 2024

A vibrant, impressionistic landscape of a winding path that stretches towards the horizon, lined with colorful fields


In the next couple of decades, we will be able to do things that would have seemed like magic to our grandparents.

This phenomenon is not new, but it will be newly accelerated. People have become dramatically more capable over time; we can already accomplish things now that our predecessors would have believed to be impossible.

We are more capable not because of genetic change, but because we benefit from the infrastructure of society being way smarter and more capable than any one of us; in an important sense, society itself is a form of advanced intelligence. Our grandparents – and the generations that came before them – built and achieved great things. They contributed to the scaffolding of human progress that we all benefit from. AI will give people tools to solve hard problems and help us add new struts to that scaffolding that we couldn’t have figured out on our own. The story of progress will continue, and our children will be able to do things we can’t.

It won’t happen all at once, but we’ll soon be able to work with AI that helps us accomplish much more than we ever could without AI; eventually we can each have a personal AI team, full of virtual experts in different areas, working together to create almost anything we can imagine. Our children will have virtual tutors who can provide personalized instruction in any subject, in any language, and at whatever pace they need. We can imagine similar ideas for better healthcare, the ability to create any kind of software someone can imagine, and much more.

With these new abilities, we can have shared prosperity to a degree that seems unimaginable today; in the future, everyone’s lives can be better than anyone’s life is now. Prosperity alone doesn’t necessarily make people happy – there are plenty of miserable rich people – but it would meaningfully improve the lives of people around the world.

Here is one narrow way to look at human history: after thousands of years of compounding scientific discovery and technological progress, we have figured out how to melt sand, add some impurities, arrange it with astonishing precision at extraordinarily tiny scale into computer chips, run energy through it, and end up with systems capable of creating increasingly capable artificial intelligence.

This may turn out to be the most consequential fact about all of history so far. It is possible that we will have superintelligence in a few thousand days (!); it may take longer, but I’m confident we’ll get there.

How did we get to the doorstep of the next leap in prosperity?

In three words: deep learning worked.

In 15 words: deep learning worked, got predictably better with scale, and we dedicated increasing resources to it.

That’s really it; humanity discovered an algorithm that could really, truly learn any distribution of data (or really, the underlying “rules” that produce any distribution of data). To a shocking degree of precision, the more compute and data available, the better it gets at helping people solve hard problems. I find that no matter how much time I spend thinking about this, I can never really internalize how consequential it is.

There are a lot of details we still have to figure out, but it’s a mistake to get distracted by any particular challenge. Deep learning works, and we will solve the remaining problems. We can say a lot of things about what may happen next, but the main one is that AI is going to get better with scale, and that will lead to meaningful improvements to the lives of people around the world.

AI models will soon serve as autonomous personal assistants who carry out specific tasks on our behalf, like coordinating medical care. At some point further down the road, AI systems are going to get so good that they help us make better next-generation systems and make scientific progress across the board.

Technology brought us from the Stone Age to the Agricultural Age and then to the Industrial Age. From here, the path to the Intelligence Age is paved with compute, energy, and human will.

If we want to put AI into the hands of as many people as possible, we need to drive down the cost of compute and make it abundant (which requires lots of energy and chips). If we don’t build enough infrastructure, AI will be a very limited resource that wars get fought over and that becomes mostly a tool for rich people.

We need to act wisely but with conviction. The dawn of the Intelligence Age is a momentous development with very complex and extremely high-stakes challenges. It will not be an entirely positive story, but the upside is so tremendous that we owe it to ourselves, and the future, to figure out how to navigate the risks in front of us.

I believe the future is going to be so bright that no one can do it justice by trying to write about it now; a defining characteristic of the Intelligence Age will be massive prosperity.

Although it will happen incrementally, astounding triumphs – fixing the climate, establishing a space colony, and the discovery of all of physics – will eventually become commonplace. With nearly-limitless intelligence and abundant energy – the ability to generate great ideas, and the ability to make them happen – we can do quite a lot.

As we have seen with other technologies, there will also be downsides, and we need to start working now to maximize AI’s benefits while minimizing its harms. As one example, we expect that this technology can cause a significant change in labor markets (good and bad) in the coming years, but most jobs will change more slowly than most people think, and I have no fear that we’ll run out of things to do (even if they don’t look like “real jobs” to us today). People have an innate desire to create and to be useful to each other, and AI will allow us to amplify our own abilities like never before. As a society, we will be back in an expanding world, and we can again focus on playing positive-sum games.

Many of the jobs we do today would have looked like trifling wastes of time to people a few hundred years ago, but nobody is looking back at the past, wishing they were a lamplighter. If a lamplighter could see the world today, he would think the prosperity all around him was unimaginable. And if we could fast-forward a hundred years from today, the prosperity all around us would feel just as unimaginable.
 

bnew


1/11
@bindureddy
To CoT or Not CoT?

Great paper that shows chain-of-thought mainly helps math and symbolic reasoning

It only has marginal benefit on other tasks

On MMLU, directly generating the answer without CoT leads to almost identical accuracy as CoT unless the question or model's response contains an equals sign!

This paper proves what we are seeing with o1 models.

The extra cost and time during inference is not worth it in most situations but will help when it comes to hard reasoning prompts
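
As a rough illustration of the comparison the paper describes (not the authors' code), the setup boils down to scoring the same questions with a direct-answer prompt and a CoT prompt, then splitting results by whether the question contains an "=" sign. The `ask_model` function and the sample questions below are placeholders.

```python
# Hedged sketch of the direct-vs-CoT comparison; ask_model() is a stub standing in
# for whatever LLM API you use.
DIRECT_TMPL = "Answer with only the letter of the correct option.\n\n{q}"
COT_TMPL = "Think step by step, then end with 'Answer: <letter>'.\n\n{q}"

def ask_model(prompt: str) -> str:
    return "A"  # placeholder: replace with a real LLM call

def accuracy(questions, template) -> float:
    hits = sum(q["answer"] in ask_model(template.format(q=q["text"])) for q in questions)
    return hits / len(questions)

questions = [
    {"text": "12 * 7 = ?  A) 84  B) 74  C) 96  D) 72", "answer": "A"},
    {"text": "Which organ produces insulin?  A) Pancreas  B) Liver  C) Kidney  D) Spleen", "answer": "A"},
]
with_eq = [q for q in questions if "=" in q["text"]]
without_eq = [q for q in questions if "=" not in q["text"]]

for name, subset in (("with '='", with_eq), ("without '='", without_eq)):
    print(name, "direct:", accuracy(subset, DIRECT_TMPL), "CoT:", accuracy(subset, COT_TMPL))
```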



2/11
@gpt_biz
This paper is a great read for anyone interested in optimizing AI performance for complex tasks, definitely worth a look!



3/11
@mysticaltech
But remember, Q* is not just CoT. it's A* graph search plus Q learning RL, used with CoTs.



4/11
@MonikaGope_
Fascinating takeaway!



5/11
@mikejchaves
Interesting. CoT primarily enhances math and symbolic reasoning but has marginal effects elsewhere



6/11
@0xAyush1
had applied this CoT at @calmemail

99% of the time the outputs were perfect and without errors

but the problem came when the chat history size grew and it started outputting non-JSON and other weird formats.



7/11
@jtechbetter
I agree, CoT is not worth the extra cost and time for most tasks.



8/11
@adridder
CoT's utility seems task-dependent. Symbolic reasoning gains significantly while text tasks benefit marginally. Interesting tradeoff between transparency and performance to consider.



9/11
@RomeoLupascu
I'm sorry isn't CoT is CoG ... :smile:



10/11
@BensenHsu
The paper examines the effectiveness of a technique called "chain-of-thought" (CoT) for large language models (LLMs) in solving different types of reasoning tasks.

The results show that CoT provides substantial performance improvements primarily on tasks involving math or symbolic reasoning. On the MMLU dataset, CoT helps mainly on questions containing an "=" sign, which indicates the presence of equations and symbolic operations. CoT's main benefit comes from improving the execution of symbolic steps, but it still underperforms compared to using a dedicated symbolic solver.

full paper: TO COT OR NOT TO COT? CHAIN-OF-THOUGHT HELPS MAINLY ON MATH AND SYMBOLIC REASONING



11/11
@axel_pond
if you take a closer look at that figure it seems more appropriate to say that CoT covers the main weaknesses in LLMs and does not improve much where they are already doing well.





bnew


1/1
The best way to apply o1 models is through an LLM router that automatically routes your query based on task and difficulty level.

Here is the best LLM router that you can design

sonnet - writing, coding
gpt4 - web search, doc generation
gemini - video understanding
o1 - complex reasoning and math

In addition, you can ask the o1 models to solve coding problems when sonnet fails.

Coming soon to ChatLLM
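
A minimal sketch of the routing idea described above. The model ids and keyword rules here are illustrative assumptions, not ChatLLM's actual implementation; a production router would classify the task and difficulty with a small model rather than keywords.

```python
# Illustrative task router: map a query to a model, escalating to o1 when
# Sonnet fails on a coding problem, as the post suggests.
ROUTES = {
    "writing": "claude-3-5-sonnet",
    "coding": "claude-3-5-sonnet",
    "web_search": "gpt-4o",
    "doc_generation": "gpt-4o",
    "video_understanding": "gemini-1.5-pro",
    "complex_reasoning": "o1-preview",
}

def classify_task(query: str) -> str:
    """Naive keyword classifier; a real router would use a small LLM or trained classifier."""
    q = query.lower()
    if any(k in q for k in ("prove", "integral", "equation", "theorem")):
        return "complex_reasoning"
    if any(k in q for k in ("bug", "function", "refactor", "code")):
        return "coding"
    if "video" in q:
        return "video_understanding"
    if any(k in q for k in ("search", "latest", "news")):
        return "web_search"
    return "writing"

def route(query: str, coding_failed: bool = False) -> str:
    task = classify_task(query)
    if task == "coding" and coding_failed:
        return ROUTES["complex_reasoning"]
    return ROUTES[task]

print(route("Refactor this function to avoid the race condition"))  # claude-3-5-sonnet
print(route("Refactor this function", coding_failed=True))          # o1-preview
```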



bnew


1/1
Exciting work from @huggingface on BitNet finetuning. We will share our latest progress on BitNet including model (pre-training) scaling, MoE, inference on CPUs, and more. Very soon, stay tuned! #The_Era_of_1bit_LLMs #BitNet
[Quoted tweet]
🚀 Exciting news! We’ve finally cracked the code for BitNet @huggingface ! no pre-training needed! With just fine-tuning a Llama 3 8B, we've achieved great results, reaching a performance close to Llama 1 & 2 7B models on key downstream tasks!

Want to learn more? Check out the blogpost or keep reading for exclusive insights!

Blogpost: huggingface.co/blog/1_58_llm…





bnew



1/3
@veryvanya
the first 1-bit vision LM has arrived: IntelLabs/LlavaOLMoBitnet1B · Hugging Face

[Quoted tweet]
Intel presents LLaVaOLMoBitnet1B

Ternary LLM goes Multimodal!

discuss: huggingface.co/papers/2408.1…

Multimodal Large Language Models (MM-LLMs) have seen significant advancements in the last year, demonstrating impressive performance across tasks. However, to truly democratize AI, models must exhibit strong capabilities and be able to run efficiently on small compute footprints accessible by most. Part of this quest, we introduce LLaVaOLMoBitnet1B - the first Ternary Multimodal LLM capable of accepting Image(s)+Text inputs to produce coherent textual responses. The model is fully open-sourced along with training scripts to encourage further research in this space. This accompanying technical report highlights the training process, evaluation details, challenges associated with ternary models and future opportunities.


2/3
@ilumine_ai
what this can mean in the near term?



3/3
@veryvanya
how I see it:
in a few years, 10-year-old IoT potato compute will run multimodal models faster and at better quality than current SOTA closed models (in narrow fine-tuned use cases)
this continued research into 1-bit, coupled with decentralised training, is already a sneak peek of a crazy future















1/18
@mohamedmekkouri
🚀 Exciting news! We’ve finally cracked the code for BitNet @huggingface ! no pre-training needed! With just fine-tuning a Llama 3 8B, we've achieved great results, reaching a performance close to Llama 1 & 2 7B models on key downstream tasks!

Want to learn more? Check out the blogpost or keep reading for exclusive insights!

Blogpost: Fine-tuning LLMs to 1.58bit: extreme quantization made easy



2/18
@mohamedmekkouri
1/ BitNet principle in a nutshell: BitNet is an architecture introduced by @MSFTResearch, it replaces traditional linear layers in multi-head attention and feed-forward networks with specialized BitLinear layers using ternary or binary precision.

Kudos to @realHongyu_Wang, @ma_shuming, and the entire team for this amazing technique!

papers :
🔹 [2310.11453] BitNet: Scaling 1-bit Transformers for Large Language Models
🔹 [2402.17764] The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
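
For a concrete feel for the BitLinear idea from the papers above, here is a minimal PyTorch sketch using absmean ternary quantization and a straight-through estimator. It is a simplification for illustration only, not the reference implementation; the real BitNet layers also quantize activations and add normalization.

```python
# Minimal BitLinear sketch: ternary (absmean) weight quantization + STE.
import torch
import torch.nn as nn
import torch.nn.functional as F

def ternary_quantize(w: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
    scale = w.abs().mean().clamp(min=eps)            # absmean scale (gamma)
    return (w / scale).round().clamp(-1, 1) * scale  # values in {-gamma, 0, +gamma}

class BitLinear(nn.Linear):
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = self.weight
        # STE: forward pass uses quantized weights, backward pass acts as identity.
        w_q = w + (ternary_quantize(w) - w).detach()
        return F.linear(x, w_q, self.bias)

layer = BitLinear(128, 256, bias=False)
print(layer(torch.randn(4, 128)).shape)  # torch.Size([4, 256])
```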



3/18
@mohamedmekkouri
2/ The papers proposed pre-training models from scratch using this architecture in the context of Quantization Aware Training (QAT) with fake quantization layers. This approach aims to make the model aware of quantization during training. However, pre-training requires significant resources, which aren't affordable for everyone. Our vision is to make these models accessible and open, encouraging greater community involvement and effort.
That's the primary reason we began exploring fine-tuning techniques!



4/18
@mohamedmekkouri
3/ The first reasonable step was to start with pre-trained weights (Llama3 8B), apply quantization, and evaluate performance. We attempted this, but the model lost all prior information with quantization. There was no significant difference between starting with random weights and pre-trained weights.



5/18
@mohamedmekkouri
4/ We then experimented with various techniques to achieve successful fine-tuning (on the same Llama3 8b model), but I'll skip the ones that didn't work. Let's focus on the most promising technique we discovered: Warmup Quantization. To grasp this concept, you need to understand how quantization is implemented in BitNet:
(for a detailed code explanation, check out the blogpost)



6/18
@mohamedmekkouri
5/ In the image above, we use both quantized and non-quantized values to address the issue of non-differentiable quantization operations. This inspired us to introduce quantization gradually into the model. We created a variable called lambda, which increases from 0 to 1. A lambda value of 0 means no quantization, while a value of 1 represents full quantization. In our experiments we tried different warmup steps and different schedulers (linear, exponential, sigmoid).
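
A small sketch of that warmup-quantization idea: blend full-precision and quantized weights with a lambda that ramps from 0 to 1 over training. The scheduler shapes follow the tweet (linear / exponential / sigmoid), but the constants and exact parameterization are assumptions, not the values used in the blogpost.

```python
# Warmup-quantization sketch: lam=0 -> full precision, lam=1 -> fully quantized.
import math

def lambda_schedule(step: int, warmup_steps: int, kind: str = "linear", k: float = 10.0) -> float:
    t = min(step / warmup_steps, 1.0)
    if kind == "linear":
        return t
    if kind == "exponential":
        return 1.0 - math.exp(-k * t)
    if kind == "sigmoid":
        # Note: not exactly 0 at t=0; the blogpost's exact parameterization may differ.
        return 1.0 / (1.0 + math.exp(-k * (t - 0.5)))
    raise ValueError(kind)

def blended_weight(w, quantize, lam: float):
    """Assumes `w` is a torch.Tensor and `quantize` is e.g. a ternary quantizer.
    The STE term keeps gradients flowing through the quantized branch."""
    w_q = w + (quantize(w) - w).detach()
    return (1.0 - lam) * w + lam * w_q

for step in (0, 250, 500, 750, 1000):
    print(step, round(lambda_schedule(step, warmup_steps=1000, kind="sigmoid"), 3))
```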



7/18
@mohamedmekkouri
6/ Our experiments show that the linear scheduler generally performs better, whereas the sigmoid scheduler excels in specific setups with a particular slope parameter. Below are the results on downstream tasks, evaluated using LightEval on the nanotron format of the checkpoints after a 10B tokens fine-tuning :



8/18
@mohamedmekkouri
7/ We scaled our experiments to fine-tune on 100 billion tokens. Except for Hellaswag and Winogrande, the model performs comparably to Llama2 7B and Llama 1 7B on downstream tasks!



9/18
@mohamedmekkouri
8/ To learn more, check out the blogpost 🤗.

Remember, this is only the beginning of the Era of Extreme Quantization!



10/18
@iamRezaSayar
Super exciting work!🔥🔥thanks for sharing and looking forward for more on this topic!😍



11/18
@bnjmn_marie
Incredible work!



12/18
@sladurb
Congrats! BitNets are also being merged into llama.cpp and llamafile, so we need decently trained models and methods to distill existing LLMs into BitNets!



13/18
@iamRezaSayar
@nisten you might wanna see this👀



14/18
@abelsantillanr
@ollama Please!!!!



15/18
@GhostOfAnubis
I thought this technique had been forgotten. 😍



16/18
@CyberAIWizard
@LMStudioAI 👀



17/18
@Karenhalmi
That's fascinating, Mohamed! It's amazing how AI keeps evolving. I'd love to learn more about how BitNet works. Thanks for sharing this breakthrough!



18/18
@Jacoed













1/6
4 months since we released BitNet b1.58🔥🔥

After we compressed LLMs to 1.58 bits, 1-bit LLM inference is no longer memory-bound but compute-bound.

🚀🚀Today we introduce Q-Sparse, which can significantly speed up LLM computation.



2/6
Q-Sparse is trained with top-K sparsification and a straight-through estimator (STE) to prevent gradients from vanishing.

1️⃣Q-Sparse can achieve results comparable to those of baseline LLMs while being much more efficient at inference time;
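
A rough PyTorch sketch of that top-K + STE idea: keep only the K largest-magnitude activations per token, with an identity backward pass so gradients still reach the zeroed entries. Illustration only, not the paper's code.

```python
# Top-K activation sparsification with a straight-through estimator.
import torch

def topk_sparsify(x: torch.Tensor, k: int) -> torch.Tensor:
    idx = x.abs().topk(k, dim=-1).indices              # indices of the K largest magnitudes
    mask = torch.zeros_like(x).scatter_(-1, idx, 1.0)  # {0,1} keep-mask
    sparse = x * mask
    # STE: forward uses the sparse activations, backward behaves like identity.
    return x + (sparse - x).detach()

h = torch.randn(2, 8, requires_grad=True)
out = topk_sparsify(h, k=3)
print((out != 0).sum(dim=-1))  # 3 nonzero activations per row
```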



3/6
2️⃣We present an inference-optimal scaling law for sparsely-activated LLMs; as the total model size grows, the gap between sparsely-activated and dense models continuously narrows.



4/6
3️⃣Q-Sparse is effective in different settings, including training-from-scratch, continue-training of off-the-shelf LLMs, and finetuning;



5/6
4️⃣Q-Sparse works for both full-precision and 1-bit LLMs (e.g., BitNet b1.58). Particularly, the synergy of BitNet b1.58 and Q-Sparse (can be equipped with MoE) provides the cornerstone and a clear path to revolutionize the efficiency of future LLMs.



6/6
Link:[2407.10969] Q-Sparse: All Large Language Models can be Fully Sparsely-Activated





bnew



1/2
From DeepNet and BitNet to Q-Sparse, our mission is to develop a 10x, even 100x, more efficient architecture without any compromise on performance.

With test-time scaling, more efficient inference also means better performance (given the same inference budget).



2/2
Currently we still focus on the weights and activations. The KV cache has significant redundancy between layers… it's worth trying Q-Sparse for the KV cache, but it's difficult to achieve such a high level of compression compared to YOCO.





bnew















1/52
@itsPaulAi
Microsoft has just announced Copilot's ability to process data in Excel by generating Python code.

- Request a detailed analysis
- Copilot generates Python
- Excel executes the code to show the result

All without having to write a single formula. Just natural language.

Huge deal for data analysis and visualization.
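
Purely as an illustration, here is the kind of pandas snippet a natural-language request like "show monthly revenue by region with a trend chart" might produce. Inside Python in Excel the data would come from the worksheet rather than a hand-built DataFrame, and the column names here are assumptions; Copilot's actual output will differ.

```python
# Illustrative pandas analysis of the sort Copilot might generate from a plain-English request.
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({
    "date": pd.date_range("2024-01-01", periods=3, freq="MS").repeat(2),
    "region": ["East", "West"] * 3,
    "revenue": [120, 95, 150, 110, 170, 130],
})

monthly = (df.assign(month=df["date"].dt.to_period("M"))
             .groupby(["month", "region"])["revenue"].sum()
             .unstack("region"))

monthly.plot(kind="line", marker="o", title="Monthly revenue by region")
plt.ylabel("Revenue")
plt.tight_layout()
plt.show()
```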



2/52
@heyAlfaiz
Millions worldwide use Excel daily. Copilot's Python code generation makes data analysis easier for all user levels.



3/52
@itsPaulAi
Absolutely. The market and the impact it's going to have are absolutely gigantic.



4/52
@EddieSalinas_
Wow this is big! Thanks for sharing Paul!



5/52
@itsPaulAi
It is! Will further simplify data processing!



6/52
@ScientistBuzz
It's like Code interpreter



7/52
@itsPaulAi
Yes, kind of!



8/52
@mikets808
Any chance this will improve weather forecasting?



9/52
@itsPaulAi
Good question! This seems to make forecasting new data a lot easier. So maybe it'll be useful for this use case.



10/52
@KanikaBK
Thanks for the heads up!



11/52
@itsPaulAi
You're welcome, Kanika!



12/52
@2bitdane
Evil whispers claim the only reason they introduced python in excel was because that was how they could get copilot working in it...



13/52
@itsPaulAi
Seems likely 😂



14/52
@dr_cintas
Python code in Excel!?



15/52
@itsPaulAi
Yep! Was in beta but now they're really taking it to the next level by allowing you to generate the code with Copilot.



16/52
@FCamiade
Microsoft Copilot still in the race!



17/52
@itsPaulAi
Still very relevant for businesses!



18/52
@mhdfaran
Microsoft Copilot is so underrated



19/52
@itsPaulAi
Agree. For businesses already using Microsoft 365 this is a no brainer.



20/52
@critiqsai
This is so useful for Excel



21/52
@itsPaulAi
A big update indeed



22/52
@samuraipreneur
Data analysis is so much simpler for everyone now with the help of AI.

Great addition to Copilot this 👍🏼



23/52
@itsPaulAi
Yeah, this will greatly simplify workflows with data to process.



24/52
@inge_MBA_GWU_DC
I got a Microsoft 365 license through my university. Can I pay independently to add an [Excel] Copilot license not linked to my Office account?



25/52
@itsPaulAi
Not sure. I think your administrator should add Copilot for Microsoft 365 to your account.



26/52
@DataShaun
Excel will eat everything, eventually



27/52
@itsPaulAi
It already carries the entire economy 😂



28/52
@arny_trezzi
This stuff makes me think #PLTR is 100y ahead.



29/52
@FrankieS2727
🦾 🐍📊



30/52
@7alken2
will accept py again after they again change logo to something without yellow/blue;



31/52
@sachu1988
@rattibha please download this video



32/52
@rattibha
@sachu1988 Hi, Found the video, here's a video link.

🖼️ Rattibha

feedbacks are welcome.



33/52
@MBRI37221
The question is whether there are any guardrails in place; otherwise users could ask it to write a macro that performs basically any action on your device (e.g. wipe the disk).



34/52
@Ms_NicoleMurray
This is wonderful



35/52
@thestarktruth
That’s a pretty big deal!



36/52
@chrismilas
Wow that’s cool!



37/52
@lukejmorrison
No more Visual Basic Scripting (VBS)?

Good move @Microsoft



38/52
@LarryPanozzo
Sneaky trick Microsoft 👏🏼 It’ll be second-rate, but people won’t care because they don’t want to go learn the first-rate alternative and get used to a brand new tool



39/52
@arnill_dev
That's a game-changer! Making data handling a breeze.



40/52
@uttamkumaran
"The medium is the answer" - I recently started thinking through how financial modeling would be solved by LLMs. Lots of people think it's Python notebooks, but they have never worked in finance and don't realize that bankers only work in Excel. The medium has to be Excel.



41/52
@HighwayAcademy2
Can't wait to try this out. We recently added copilot to our subscription.
We have been waiting for detailed analysis. This can help one analyse youtube and revenue statistics in ways never imagined before.



42/52
@AscendantGus
But is logging in to Microsoft 365 still a nightmare?



43/52
@JafarNajafov
Yes, its gonna revolutionise the data analytics industry



44/52
@shushant_l
Business analysts would love this



45/52
@Excel_Hacks
Microsoft's announcement of Copilot's new capability to process data in Excel by generating Python code is a significant advancement in simplifying data analysis and automation. With Copilot generating Python code directly from natural language input, users can seamlessly execute complex data processing tasks without the need to write formulas. This integration between Copilot and Excel streamlines the data analysis process, allowing users to focus on insights rather than coding intricacies. The ability to harness the power of Python within Excel opens up new possibilities for efficient and intuitive data manipulation, marking a substantial leap forward in the realm of data analytics and automation.

The seamless integration of Copilot's Python code generation with Excel's execution capabilities represents a promising development for data analysts and professionals working with large datasets. By eliminating the need to manually write code, this innovation empowers users to effortlessly translate natural language instructions into actionable Python code, enhancing productivity and reducing the barriers to leveraging the full potential of data analysis. This advancement holds the potential to revolutionize the way data is processed within Excel, offering a more intuitive and efficient approach to data manipulation and analysis. Overall, this integration marks a significant step forward in simplifying data processing and empowering users to unlock the insights hidden within their datasets with greater ease and efficiency.



46/52
@nitin2050
@venusjain - our kids future



47/52
@mattoess
Pretty awesome.



48/52
@Haider_Khn_
Development in the office of Microsoft..



49/52
@FurzeNathan
The world runs on ‘export to csv’ for data analysis. 🧐



50/52
@JoeScot_7
It’s going to take so many jobs



51/52
@CasMilquetoast
Is this generally available?



52/52
@interzoid
Is it platform independent? Windows, Mac, etc.



