[Submitted on 11 Jun 2024 (v1), last revised 13 Jun 2024 (this version, v2)]

Accessing GPT-4 level Mathematical Olympiad Solutions via Monte Carlo Tree Self-refine with LLaMa-3 8B​

Di Zhang, Xiaoshui Huang, Dongzhan Zhou, Yuqiang Li, Wanli Ouyang
This paper introduces the MCT Self-Refine (MCTSr) algorithm, an innovative integration of Large Language Models (LLMs) with Monte Carlo Tree Search (MCTS), designed to enhance performance in complex mathematical reasoning tasks. Addressing the challenges of accuracy and reliability in LLMs, particularly in strategic and mathematical reasoning, MCTSr leverages systematic exploration and heuristic self-refine mechanisms to improve decision-making frameworks within LLMs. The algorithm constructs a Monte Carlo search tree through iterative processes of Selection, self-refine, self-evaluation, and Backpropagation, utilizing an improved Upper Confidence Bound (UCB) formula to optimize the exploration-exploitation balance. Extensive experiments demonstrate MCTSr's efficacy in solving Olympiad-level mathematical problems, significantly improving success rates across multiple datasets, including GSM8K, GSM Hard, and MATH, as well as Olympiad-level benchmarks such as Math Odyssey, AIME, and OlympiadBench. The study advances the application of LLMs in complex reasoning tasks and sets a foundation for future AI integration, enhancing decision-making accuracy and reliability in LLM-driven applications.
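
For reference (this is not quoted from the paper), the standard UCT selection rule that the paper's "improved UCB formula" builds on scores each candidate node by its average self-evaluated reward plus an exploration bonus:

```latex
% Standard UCT rule; the paper uses its own modified variant of this.
% \bar{Q}_j: average self-evaluated reward of candidate j
% n_j: visit count of j, N: visit count of its parent, c: exploration constant
\[
\mathrm{UCT}_j = \bar{Q}_j + c \sqrt{\frac{\ln N}{n_j}}
\]
```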

Submission history​

From: Di Zhang [view email]
[v1] Tue, 11 Jun 2024 16:01:07 UTC (106 KB)
[v2] Thu, 13 Jun 2024 07:19:06 UTC (106 KB)






1/1
Accessing GPT-4 level Mathematical Olympiad Solutions via Monte Carlo Tree Self-refine with LLaMa-3 8B: A Technical Report









1/6
It's finally here. Q* rings true. Tiny LLMs are as good at math as a frontier model.

By using the same techniques Google used to solve Go (MCTS and backprop), Llama-3 8B gets 96.7% on the math benchmark GSM8K!

That's better than GPT-4, Claude and Gemini, with 200x fewer parameters!

2/6
Source: https://arxiv.org/pdf/2406.07394

3/6
I'd imagine these are the techniques that code foundation-model trainers are using, but I wonder:

a) whether you're limited by the ability of the base open-source model and can only barely match a frontier model,
b) whether you can generate a large enough volume of synthetic code data at a reasonable cost, and
c) whether doing this on a 1T+ parameter model would be prohibitively expensive.

4/6
The (purported) technique isn’t tied to a particular model

5/6
Come on it's SLIGHTLY harder than that 😆

6/6
Shanghai AI lab made rumored Q* reality





1/1
A prompt-level formula to add Search to LLM

🧵📖 Read of the day, day 85: Accessing GPT-4 level Mathematical Olympiad Solutions Via Monte Carlo Tree Self-refine with Llama-3 8B: A technical report, by Zhang et al from Shanghai Artificial Intelligence Laboratory

https://arxiv.org/pdf/2406.07394

The authors of this paper introduce a Monte Carlo Tree Search-like method to enhance model generation. They call it Monte Carlo Tree Self-Refine, shortened to MCTSr.

Their method is based solely on prompting the model and does not modify its weights, yet it greatly enhances the results.

How?
1- Generate a root node from a naive answer or a dummy one
2- Use a value function Q to rank unexpanded answers, and greedily select the best
3- Refine the selected answer by generating feedback on it and then exploiting that feedback
4- Compute the Q value of the refined answer
5- Backpropagate the value to parent nodes
6- Identify candidate nodes for expansion and use the UCT formula to update all nodes before iterating again
7- Iterate until the maximum number of steps is reached

The value function Q works by prompting the model to score its own answer; the model is prompted several times and the scores are averaged. The backpropagation and UCT formulas can be found in the paper; a minimal sketch of the whole loop is shown below.
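
Putting the seven steps together, here is a minimal sketch of the loop in Python. It assumes a generic `llm(prompt)` callable that returns text for generation prompts and a parseable number for scoring prompts; the function names, prompt wording, and node-update rule are illustrative, not taken from the released code.

```python
import math

C = 1.4  # exploration constant (illustrative value)

def mctsr(problem, llm, max_rollouts=8, samples_per_eval=3):
    """Sketch of MCT Self-Refine: select, self-refine, self-evaluate, backpropagate."""
    root = {"answer": llm(f"Answer naively: {problem}"), "Q": 0.0,
            "visits": 0, "parent": None, "children": []}
    nodes = [root]

    def self_evaluate(answer):
        # Prompt the model several times to score its own answer, then average.
        scores = [float(llm(f"Score this answer to '{problem}' from 0 to 100:\n{answer}"))
                  for _ in range(samples_per_eval)]
        return sum(scores) / len(scores)

    def uct(node, total_visits):
        if node["visits"] == 0:
            return float("inf")
        return node["Q"] + C * math.sqrt(math.log(total_visits + 1) / node["visits"])

    for _ in range(max_rollouts):
        # Selection: greedily pick the best unexpanded node by UCT.
        total = sum(n["visits"] for n in nodes)
        node = max((n for n in nodes if not n["children"]),
                   key=lambda n: uct(n, total))

        # Self-refine: generate feedback on the answer, then rewrite it using that feedback.
        feedback = llm(f"Criticize this answer to '{problem}':\n{node['answer']}")
        refined = llm(f"Rewrite the answer using this feedback:\n{feedback}")

        # Self-evaluation: compute the Q value of the refined answer.
        child = {"answer": refined, "Q": self_evaluate(refined), "visits": 1,
                 "parent": node, "children": []}
        node["children"].append(child)
        nodes.append(child)

        # Backpropagation: push the new value up to the ancestors (update rule is illustrative).
        cur = node
        while cur is not None:
            cur["visits"] += 1
            cur["Q"] = max(cur["Q"], child["Q"])
            cur = cur["parent"]

    return max(nodes, key=lambda n: n["Q"])["answer"]
```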

The authors then evaluate MCTSr with 4 and 8 rollouts on Llama-3 8B and compare it to GPT-4, Claude 3 Opus and Gemini 1.5 Pro on mathematical problems.

They first find that such sampling greatly increases performance on both the GSM8K and MATH datasets, reaching frontier-model levels on GSM8K (still below them on MATH, but greatly improved).

The authors then evaluate the models on harder benchmarks. MCTSr improves model performance across all of them. They notice that on Math Odyssey, the 8-rollout MCTSr is on par with GPT-4!

Prompts can be found within the appendix.
Code is open-sourced at: GitHub - trotsky1997/MathBlackBox

Personal Thoughts: While this research is still at a preliminary stage, the results are quite impressive given that they are obtained purely through prompting. The fact that a mere 8B model can reach frontier levels of performance on benchmarks is nothing to laugh at. It also tells us there's still a lot to discover even with LLMs alone!


 

AI in practice

Aug 26, 2024

"Frontier action model" AI startup H loses three founders months after $220M round​


H (Screenshot Website)
Matthias Bastian
Online journalist Matthias is the co-founder and publisher of THE DECODER. He believes that artificial intelligence will fundamentally change the relationship between humans and computers.


French AI startup H, formerly known as Holistic, has seen three of its five co-founders depart just four months after its launch and a $220 million funding round.

Karl Tuyls, one of the departing founders, confirmed this to The Information. Along with Tuyls, who served as co-CEO, Chief Scientist Daan Wierstra and Multi-Agent Lead Julien Perolat have also left the company. All three had previously worked at Google's AI lab DeepMind.

Tuyls cited "operational and business disagreements" among the founders as the reason for their departure. He described the decision as a "very difficult decision for all involved." The two remaining co-founders are Charles Kantor, the other co-CEO of H, and Laurent Sifre, the Chief Technology Officer.

According to its website, H is developing "frontier action models" - AI models or agents that can perform tasks step-by-step and take actions, such as browsing the web or operating apps on your screen, without needing specific training just for that app. Some believe that such models will contribute to the next stage of AI development, but also that they're at least two to three years away from working reliably.

In May, H raised $220 million from investors including former Google CEO Eric Schmidt and LVMH CEO Bernard Arnault. The startup currently employs about 40 engineers and researchers.

AGI as a long-term goal​


CEO Charles Kantor, a former computational mathematics student at Stanford, said the company is working towards "full AGI" - referring to a type of superhuman AI that could be applied to a wide range of tasks and have significant economic impact.

In a LinkedIn post, H stated that following the decision to split, the co-founders "are in agreement that this will enable the company’s greatest success moving forward. H continues to have the full support of its investors and strategic partners."

After Mistral, H is the second French AI startup to be well funded by EU standards. However, unlike Mistral, it has yet to demonstrate its effectiveness. H has announced plans to launch models and products to market before the end of the year.

  • French AI startup H is losing three of its five co-founders due to "operational and business disagreements," just four months after launching and raising $220 million in funding.
  • The departing founders, Karl Tuyls, Daan Wierstra and Julien Perolat, had previously worked at Google's DeepMind AI lab. The remaining co-founders are Charles Kantor and Laurent Sifre.
  • H is working on "frontier action models," AI models or agents that can perform tasks step-by-step and act accordingly. The long-term goal is "full AGI". Despite the departure of its founders, H says it still has the support of its investors and plans to launch models and products before the end of the year.

Sources

Linkedin H Bloomberg The Information
 


AI in practice

Aug 28, 2024

Users claim Claude AI is getting dumber, Anthropic says it's not​




Anthropic / THE DECODER
Jonathan Kemper
Jonathan works as a technology journalist who focuses primarily on how easily AI can already be used today and how it can support daily life.

Users are reporting that Anthropic's Claude chatbot has become less capable recently, echoing similar complaints about ChatGPT last year. Anthropic says it hasn't made any changes, highlighting the challenges of maintaining consistent AI performance.

A Reddit post claiming "Claude absolutely got dumbed down recently" gained traction, with many users agreeing the chatbot's abilities have declined. The original poster said Claude now forgets tasks quickly and struggles with basic coding, prompting them to cancel their subscription.

Something is going on in the Web UI and I'm sick of being gaslit and told that it's not. Someone from Anthropic needs to investigate this because too many people are agreeing with me in the comments.

u/NextgenAITrading on Reddit
Image: Reddit/Screenshot by THE DECODER

Anthropic's Alex Albert responded, stating their investigation "does not show any widespread issues" and confirming they haven't altered the Claude 3.5 Sonnet model or inference pipeline.

We'd also like to confirm that we've made no changes to the 3.5 Sonnet model or inference pipeline. If you notice anything specific or replicable, please use the thumbs down button on Claude responses to let us know. That feedback is very helpful.

Alex Albert, Developer Relations at Anthropic

To increase transparency, Anthropic now publishes its system prompts for Claude models on its website. Similar complaints about Claude surfaced in April 2024, which Anthropic also denied. Some users later reported performance returning to normal.

Image: Reddit/Screenshot by THE DECODER

Recurring pattern of perceived AI degradation​


This pattern of users perceiving AI decline followed by company denials has occurred before, notably with ChatGPT in late 2023. Complaints about GPT-4 and GPT-4 Turbo persist today, even for the latest GPT-4o model.

Several factors may explain these perceived declines. Users often become accustomed to AI capabilities and develop unrealistic expectations over time. When ChatGPT launched in November 2022 using GPT-3.5, it initially impressed many. Now, GPT-3.5 appears outdated compared to GPT-4 and similar models.

Natural variability in AI outputs, temporary computing resource constraints, and occasional processing errors also play a role. In our daily use, even reliable prompts sometimes produce subpar results, though regenerating the response usually resolves the issue.

These factors can contribute to the perception of decreased performance even when no significant changes have been made to the underlying AI models. But even when the model is changed or updated, it's often not clear what exactly is going on.

OpenAI recently released an updated GPT-4o variant, saying only that users "tend to prefer" the new model, as it was unable to provide a concrete changelog. The company said it would like to provide more details about how model responses differ, but cannot due to a lack of advanced research into methods for granularly evaluating and communicating improvements in model behavior.

OpenAI has previously noted that AI behavior can be unpredictable, describing AI training, model tuning, and evaluation as an "artisanal, multi-person effort" rather than a "clean industrial process." Model updates can improve some areas while degrading others. This shows how difficult it is to maintain and communicate generative AI performance at scale.

Summary
  • Users are reporting that Anthropic's Claude chatbot has recently become less capable, with complaints that it quickly forgets tasks and struggles with basic coding, prompting some to cancel their subscriptions.
  • Anthropic has denied making any changes to the Claude 3.5 Sonnet model or inference pipeline, and is now publishing its system prompts for transparency. Similar complaints surfaced in April 2024 and with ChatGPT in late 2023.
  • Factors such as users becoming accustomed to AI capabilities, natural variability in results, temporary resource constraints, and processing errors can contribute to perceptions of performance degradation even without significant changes to the underlying models, highlighting the challenges of maintaining and communicating AI performance.

Sources

Reddit Reddit Reddit
 


AI in practice
Aug 22, 2024

New prompting method can help improve LLM reasoning skills​


Midjourney prompted by THE DECODER



Chinese researchers have created a technique that enables large language models (LLMs) to recognize and filter out irrelevant information in text-based tasks, leading to significant improvements in their logical reasoning abilities.

The research team from Guilin University of Electronic Technology and other institutions developed the GSMIR dataset, which consists of 500 elementary school math problems intentionally injected with irrelevant sentences. GSMIR is derived from the existing GSM8K dataset.

Tests on GSMIR showed that GPT-3.5-Turbo and GPT-3.5-Turbo-16k could identify irrelevant information in up to 74.9% of cases. However, even once this information had been detected, the models did not automatically exclude it before solving the task.

Recognizing and filtering irrelevant information - and only then responding​


To address this, the researchers developed the two-stage "Analysis to Filtration Prompting" (ATF) method. First, the model analyzes the task and identifies irrelevant information by examining each sub-sentence. It then filters out this information before starting the actual reasoning process.

The two-step ATF prompt process. First it analyzes, then it filters, and only then the model responds. | Image: Jiang et al.
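
As a rough illustration of that two-step flow (the prompt wording and function name are my own, not the paper's), the chain might look something like this in Python:

```python
def analysis_to_filtration(problem, llm):
    """Sketch of ATF: analyze for irrelevant sentences, filter them out, then reason."""
    # Stage 1 - Analysis: have the model flag irrelevant sub-sentences.
    analysis = llm(
        "Read the following problem sentence by sentence and list any sentences "
        f"that are irrelevant to solving it:\n{problem}"
    )
    # Stage 2 - Filtration: rewrite the problem with the flagged sentences removed.
    filtered = llm(
        "Rewrite the problem with the irrelevant sentences removed.\n"
        f"Problem: {problem}\nIrrelevant sentences: {analysis}"
    )
    # Only then run the usual reasoning prompt (here combined with chain-of-thought).
    return llm(f"{filtered}\nLet's think step by step.")
```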

Using ATF, the accuracy of LLMs in solving tasks with irrelevant information approached their performance on the original tasks without such distractions. The method worked with all tested prompting techniques.

The combination of ATF with "Chain-of-Thought Prompting" (COT) was particularly effective. For GPT-3.5-Turbo, accuracy increased from 50.2% without ATF to 74.9% with ATF – an improvement of nearly 25 percentage points.

Benchmark results comparing various prompting methods with and without ATF. The methods tested include standard, instructed, chain-of-thought (with and without examples), and least-to-most prompting. GSM8K-SLC represents the GSMIR data set without irrelevant information. The study presents two tables, although their differences are unclear. Most likely, the upper table shows results for GPT-3.5-Turbo-16k and the lower table shows results for GPT-3.5-Turbo, but the labeling is incorrect. Both tables show that ATF consistently improved accuracy across all prompting methods when solving data set tasks containing irrelevant information. | Image: Jiang et al.

The smallest improvement came when ATF was combined with Standard Prompting (SP), where accuracy increased by only 3.3 percentage points. The researchers suggest that this is because SP's accuracy on the original questions was already very low at 18.5%, with most errors likely due to calculation errors rather than irrelevant information.

Because the ATF method is specifically designed to reduce the impact of irrelevant information, but not to improve the general computational ability of LLMs, the effect of ATF in combination with SP was limited.

With other prompting techniques, such as COT, which better support LLMs in correctly solving reasoning tasks, ATF was able to improve performance more significantly because irrelevant information accounted for a larger proportion of errors.

The study has some limitations. Experiments were conducted only with GPT-3.5, and the researchers only examined tasks containing a single piece of irrelevant information. In real-world scenarios, problem descriptions may contain multiple confounding factors.

In approximately 15% of cases, irrelevant information was not recognized as such. More than half of these instances involved "weak irrelevant information" that did not impact the model's ability to arrive at the correct answer.

This suggests that ATF is most effective for "strong irrelevant information" that significantly interferes with the reasoning process. Only 2.2% of cases saw relevant information incorrectly classified as irrelevant.

Despite these limitations, the study shows that language models' logical reasoning abilities can be enhanced by filtering out irrelevant information through prompt engineering. While the ATF method could help LLMs better handle noisy real-world data, it does not address their fundamental weaknesses in logic.

Summary
  • Researchers at Guilin University of Electronic Technology have developed a technique that helps large language models (LLMs) identify and remove irrelevant information in text-based tasks, significantly improving their reasoning capabilities.
  • The two-step "Analysis to Filtration Prompting" (ATF) method first analyzes the task and identifies irrelevant information by examining each sub-sentence. It then filters out this information before the model begins its reasoning process. When combined with Chain-of-Thought Prompting (COT), the accuracy of GPT-3.5-Turbo improved by nearly 25 percentage points, from 50.2% to 74.9%.
  • The study has limitations. Only GPT-3.5 variants were tested, and the tasks each contained only one piece of irrelevant information. Real-world scenarios often involve multiple confounding factors.

Sources
Paper

Matthias Bastian


 

AI in practice

Aug 21, 2024

OpenAI's latest fine-tuning update allows GPT-4o to learn your business inside and out​


Midjourney prompted by THE DECODER


OpenAI now allows developers to fine-tune GPT-4o. This customization aims to enhance performance for specific use cases and reduce costs.

According to OpenAI, fine-tuning enables adjustments to response structure and tone, as well as the ability to follow complex domain-specific instructions. Significant improvements can be achieved with just a few dozen examples in the training dataset, the company says.

OpenAI showcases two applications of fine-tuning: Cosine's AI assistant Genie achieved a top score of 43.8 percent on the SWE-bench Verified Benchmark for software engineering. Distyl secured first place on the BIRD-SQL benchmark for text-to-SQL tasks, with their customized GPT-4o model reaching 71.83 percent accuracy.


Free training tokens available through September 23​


OpenAI emphasizes that developers retain full control over their tuned models, and according to the company, input data is only used to refine the developer's own model, not to train other models. However, OpenAI implements security measures to prevent model misuse and monitors tuned models for potential safety issues.

Fine-tuning is available to all developers with paid usage tiers. Costs are $25 per million tokens for training, $3.75 per million input tokens, and $15 per million output tokens for inference.

OpenAI also offers fine-tuning for GPT-4o mini. The company is offering two million free training tokens per day for GPT-4o mini until September 23, and one million free training tokens per day for GPT-4o until the same date.
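
For orientation, creating such a fine-tuning job looks roughly like the sketch below, assuming the current OpenAI Python SDK; the file name, snapshot identifier, and example format are illustrative rather than prescriptive.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Each line of train.jsonl is one chat-formatted training example, e.g.:
# {"messages": [{"role": "system", "content": "..."},
#               {"role": "user", "content": "..."},
#               {"role": "assistant", "content": "..."}]}
training_file = client.files.create(file=open("train.jsonl", "rb"), purpose="fine-tune")

job = client.fine_tuning.jobs.create(
    model="gpt-4o-2024-08-06",      # assumed GPT-4o snapshot name opened for fine-tuning
    training_file=training_file.id,
)
print(job.id, job.status)
```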


Fine-tuning can help, but is not a cure-all​


Fine-tuning improves AI model performance and tailors them to specific tasks. It can enhance a model's understanding of content and extend its knowledge and capabilities for particular applications. OpenAI's new fine-tuning options are part of a broader initiative to customize AI models for businesses.

An independent test by data analytics platform Supersimple showed that a fine-tuned AI model can significantly improve performance on specific tasks compared to a standard model, though it's not perfect. Moreover, the performance boost from fine-tuning GPT-4 was smaller than the improvement seen when upgrading from GPT-3 to GPT-3.5.

Summary
  • OpenAI now allows developers to fine-tune GPT-4o with their own data. This should allow the structure and tone of responses to be adjusted and complex domain-specific instructions to be followed.
  • In two real-world examples, fine-tuned GPT-4o models achieved top scores on software engineering benchmarks (43.8% on SWE-bench Verified) and text-to-SQL tasks (71.83% on BIRD-SQL).
  • Fine-tuning is available to all developers at paid usage levels. Limited free contingents are available until September 23rd. Control over custom models remains with the developer. However, OpenAI uses security measures to prevent misuse.

Sources
OpenAI

Matthias Bastian
 


AI in practice

Aug 21, 2024

Microsoft releases new Phi 3.5 open-source language and vision models​


Midjourney prompted by THE DECODER



Microsoft has released three new open-source AI models in its Phi series: mini-instruct, MoE-instruct, and vision-instruct. The models offer strong reasoning for their size and support multiple languages, but have limitations in factual knowledge and safety.

Designed for commercial and scientific use, the Phi series generally aims to create highly efficient AI models using high-quality training data, although Microsoft hasn't yet shared details about the training process for Phi-3.5.

For the vision model, the company says it used "newly created synthetic, 'textbook-like' data for the purpose of teaching math, coding, common-sense reasoning, general knowledge of the world," in addition to other high-quality and filtered data.

Microsoft says these new models are ideal for applications with limited resources, time-sensitive scenarios, and tasks requiring strong logical reasoning within an LLM's capabilities.

The Phi-3.5-mini-instruct model, with 3.8 billion parameters, is optimized for low-resource environments. Despite its small size, it performs well in benchmarks, especially for multilingual tasks.

The Phi 3.5 MoE-instruct model has 16 experts, each with 3.8 billion parameters, for a total of 60.8 billion. However, only 6.6 billion parameters are active when using two experts, which is enough to match larger models in language comprehension and math, and to outperform some in reasoning tasks.

Image: Microsoft

It's often close to GPT-4o-mini performance, but keep in mind that these are just benchmarks, and word on the street is that Phi models have shown subpar real-world performance.

The Phi-3.5-vision-instruct model, a multimodal system with 4.2 billion parameters, can process text and images. It's suitable for tasks such as image understanding, OCR, and diagram understanding. It outperforms similarly sized models in benchmarks, and competes with larger models in multi-image processing and video summarization.

Image: Microsoft

Phi's context window gets an upgrade​


All Phi 3.5 models support a context length of up to 128,000 tokens, making them useful for long-document summarization and multilingual context retrieval. They outperform Google's Gemma 2 models, which are limited to 8,000 tokens.

However, like all LLMs, they are likely to suffer from the "lost in the middle" problem when processing large documents. This also applies to image processing.

The small size of the models limits their factual knowledge, according to Microsoft, potentially leading to higher than average inaccuracies. Microsoft suggests pairing Phi-3.5 with a search method such as RAG to address this weakness.

Like other language models, Phi models can produce biased or offensive output. They reject unwanted content in English, even when prompted in other languages, but are more vulnerable to complex prompt injection techniques in multiple languages.

The Phi 3.5 models are available under the MIT license on Hugging Face and through Microsoft's Azure AI Studio. They require specialized GPU hardware like NVIDIA A100, A6000, or H100 to support flash attention.
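
For those who want to try the smallest model, here is a minimal sketch using the Hugging Face transformers library; it assumes the model ID microsoft/Phi-3.5-mini-instruct and a recent transformers version with chat-template support.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/Phi-3.5-mini-instruct"  # assumed Hugging Face model ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    # attn_implementation="flash_attention_2",  # optional: needs flash-attn and a supported GPU
)

messages = [{"role": "user", "content": "Summarize the Phi-3.5 family in two sentences."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```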

Summary
  • Microsoft has introduced three new open source AI models in its Phi 3.5 series: mini-instruct, MoE-instruct, and vision-instruct. The 3.5 models are designed for commercial and scientific use in multiple languages and have relatively high reasoning capabilities with the typical limitations of LLMs.
  • The smallest model, Phi-3.5-mini-instruct, has 3.8 billion parameters and is optimized for scenarios with limited computing power. The MoE-instruct model has 60.8 billion parameters, of which only 6.6 billion are active. The vision-instruct model can process text and images at GPT-4o level.
  • Due to their small size, the models have weaknesses in terms of factual knowledge and safety. Microsoft recommends combining them with a search system such as RAG to compensate for inaccuracies.

Sources
Hugging Face
 










1/10
AI NEWS: OpenAI's Project Strawberry is reportedly coming this fall.

Plus, more developments from Anthropic, Google DeepMind, Apple, Cerebras, and xAI.

Here's everything going on in AI right now:

2/10
In a new report via The Information, OpenAI researchers are finally preparing to launch a new AI model, code-named Strawberry (previously Q*), this fall

If it lives up to leaks, it could potentially advance OpenAI towards Stage 2 of its five-level roadmap to AGI

3/10
Anthropic announced the full release of its Artifacts feature for all Claude users, including mobile apps.

This means users can now create Artifacts directly on their phones.

Can't wait to see what people build with it!

4/10
Google just released three new experimental Gemini 1.5 models:

- A smaller 8B parameter Flash model
- An updated Pro model
- An improved Flash model

The new 1.5 Pro now ranks as #2, and the new 1.5 Flash ranks as #6 on the Chatbot Arena leaderboard!

5/10
Apple announced a September 9 event at which it is expected to debut the iPhone 16 with new generative AI features.

While it's reported that Apple Intelligence will be delayed to fix bugs, we still don't know by how much.

6/10
I'm hosting a live workshop this Friday with the founder of Lindy AI on how to build your own AI agent that responds to emails on your behalf

Join us if you want to see the demo live and ask any questions!

RSVP with the link in the comments of this post:

7/10
Cerebras introduced 'Cerebras Inference' -- a new tool for developers to access the startup's chips to run apps

The inference is reportedly ~20x faster than NVIDIA GPUs and ~2x faster than Groq, wild

8/10
xAI released new Grok features for premium subscribers on X.

New features include image generation suggestions, improved model selection in the iOS app, and more

The pace of xAI continues to impress!

9/10
As always, I’ll be sending out a more in-depth rundown on all the AI news, and why it actually matters in ~5 hours in my newsletter.

Join 650,000+ readers for free and never miss a thing in AI ever again: The Rundown AI

10/10
That's it for today's news in the world of AI.

I share what's happening in AI every day, follow me @rowancheung for more.

If you found this helpful, support me with a like/retweet on the first tweet of this thread:





1/1
What Cerebras is doing is really interesting - ASIC chips dedicated to inference. This is going to likely put significant pressure on OpenAI, and continue the race to the bottom, and I also wonder how you scale a company making specialist hardware using dinner-plate-sized chips.













1/11
Introducing Cerebras Inference
‣ Llama3.1-70B at 450 tokens/s – 20x faster than GPUs
‣ 60c per M tokens – a fifth the price of hyperscalers
‣ Full 16-bit precision for full model accuracy
‣ Generous rate limits for devs
Try now: Cerebras Inference

2/11
Cerebras Inference is the fastest Llama3.1 inference API by far: 1,800 tokens/s for 8B and 450 tokens/s for 70B. We are ~20x faster than NVIDIA GPUs and ~2x faster than Groq.

3/11
Going from 90 tokens/s to 1,800 tokens/s is like going from dialup to broadband. It makes AI instant:

4/11
Cerebras Inference is just 10c per million tokens for 8B and 60c per million tokens for 70B. Our price-performance is so strong, we practically broke the chart on Artificial Analysis.

5/11
Inference today runs at tens of tokens per second instead of thousands due to the memory bandwidth limitation of GPUs. Generating one token requires sending the entire model parameters from memory to compute. Doing this a thousand times a second requires >100TB/s, far greater than what’s possible with HBM.
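
A quick back-of-the-envelope check of that figure (our arithmetic, not Cerebras'): a 70B-parameter model at 16-bit precision is roughly 140 GB of weights, and streaming every weight once per token at 1,000 tokens/s would require about

```latex
\[
70 \times 10^{9}\ \text{params} \times 2\ \text{bytes} \approx 140\ \text{GB/token},
\qquad
140\ \text{GB/token} \times 1{,}000\ \text{tokens/s} = 140\ \text{TB/s},
\]
```

well above the few TB/s of bandwidth that HBM-based GPUs provide.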

6/11
Cerebras solves the memory bandwidth bottleneck by building the largest chip in the world and storing the entire model on-chip. A Wafer Scale Engine has 44GB of on-chip SRAM with 21 petabytes/s of memory bandwidth, which lets us run Llama3.1 at over 1,800 tokens/s.

7/11
Cerebras inference isn’t just super-fast, it has huge throughput. With ~200x more on-chip memory than small AI chips, we can support batch sizes from 1 to 100, making it highly cost efficient to deploy at scale.

8/11
Cerebras Inference does not trade accuracy for speed. All our Llama3.1 models use Meta’s original 16-bit weights, ensuring the highest accuracy.

9/11
Lastly, we didn’t just build a fast demo – we have capacity to serve hundreds of billions of tokens per day to developers and enterprises. We will be adding new models (eg. Llama3.1-405B) and ramp even greater capacity in the coming months.

10/11
Try Cerebras inference for yourself:
Try Chat: Cerebras Inference
Free API key: Cerebras Developer Platform
Read Blog: Introducing Cerebras Inference: AI at Instant Speed - Cerebras
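
Since the API is advertised as OpenAI-compatible, calling it from Python might look like the sketch below; the base URL and model identifier are assumptions on my part, so check the Cerebras docs before relying on them.

```python
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.cerebras.ai/v1",  # assumed OpenAI-compatible endpoint
    api_key=os.environ["CEREBRAS_API_KEY"],
)

resp = client.chat.completions.create(
    model="llama3.1-70b",  # assumed model identifier
    messages=[{"role": "user", "content": "In one sentence, why does on-chip SRAM speed up inference?"}],
)
print(resp.choices[0].message.content)
```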

11/11





1/1
'Cerebras claims its solution is 20x faster than Nvidia's Hopper chips at AI inference, at a fraction of the price. "The way you beat the 800lb gorilla is by bringing a vastly better product to market. We've taken meaningful customers from Nvidia."' Subscribe to read




1/1
Have been following Groq but missed Cerebras. Anyone in my network using them for inference?





1/2
Cerebras Inference has the industry’s best pricing for high-speed inference

- 10c per million tokens for Llama3.1- 8B
- 60c per million tokens for Llama3.1- 70B

Try it today: Cerebras Inference

2/2
in or out


 


1/2
Bland AI is one of the killer AI tools that has been flying under the radar.

Conversational AI is already being used for a ton of use cases at the enterprise level like sales, ops, customer support.

Imagine when the use cases expand to video games, apps, robotics, VR/AR, healthcare, etc.

We’re headed into a weird future where we’re going to probably talk to AI as much, if not more, than humans. And it’ll be impossible to tell the difference.

Also, try calling the AI agent on the website — it’s pretty uncanny.

2/2
Yes, we've even hosted workshops on it in The Rundown AI University










1/6
Today marks a major milestone for us. We've closed our Series A with $22M in funding. As we emerge from stealth, we wanted to formally introduce you to Bland, your newest AI employee.

Bland is a customizable phone calling agent that sounds just like a human. It can:

🗣️Talk in any language or voice
🤖Be designed for any use case
☎️Handle millions of calls simultaneously. 24/7.

Bland does all of this without hallucination - and that’s just the start. But we'll let it speak for itself...

Call Now: Bland AI | Automate Phone Calls with Conversational AI for Enterprises

2/6
We hope that AI does the Bland work. Our goal is to make human interactions more meaningful and purposeful :smile:

3/6
huh

4/6
😊

5/6
Probably catch a few more GPUs on fire...

6/6
DM!


 





1/5
Chatbot Arena update⚡!

The latest Gemini (Pro/Flash/Flash-9b) results are now live, with over 20K community votes!

Highlights:
- New Gemini-1.5-Flash (0827) makes a huge leap, climbing from #23 to #6 overall!
- New Gemini-1.5-Pro (0827) shows strong gains in coding, math over previous versions.
- The new, smaller Gemini-1.5 Flash-8b outperforms gemma-2-9b, matching llama-3-70b levels.

Big Congrats @GoogleDeepMind Gemini team on the incredible launch!

More plots in the followup posts👇

**Note: to better reflect community interests, older models nearing deprecation will soon be removed from the default leaderboard view.

2/5
Overall Leaderboard with #votes and CIs:

3/5
Coding Arena: new Gemini-1.5-Pro improves significantly over previous versions.

4/5
The new, smaller Gemini-1.5 Flash-8b outperforms gemma-2-9b, matching llama-3-70b levels.

5/5
Win-rate heatmap:

Check out full leaderboard at http://lmarena.ai/?leaderboard!







1/4
Open-source is the future.

I just tested it. It's really good and an improved version for generating code.

2/4
have you tried it?

3/4
not in coding right now but not that behind.

4/4
ohh no, typo mistake :(


 

bnew

Veteran
Joined
Nov 1, 2015
Messages
57,480
Reputation
8,519
Daps
160,214


1/2
Today, we're making Artifacts available for all Claude users. You can now also create and view Artifacts on the Claude iOS and Android apps.

Since launching in preview in June, tens of millions of Artifacts have been created. But where did it all begin?

Here's how we built it.

2/2
We look forward to seeing what you build and share next with Claude. Artifacts are now generally available


 












1/12
AI NEWS: Salesforce just introduced two autonomous AI agents for sales.

Plus, more developments from Amazon, Boston Dynamics, Synthflow, AI21 Labs, NVIDIA, Mistral, D-ID, and Krea AI.

Here's everything going on in AI right now:

2/12
Salesforce just introduced two fully autonomous, AI-powered sales agents:

> Einstein SDR Agent, which can engage with inbound leads 24/7
> Einstein Sales Coach Agent, which offers real-time sales suggestions during calls

We're headed into a new era of sales.

3/12
Amazon CEO Andy Jassy shared an update on Q, the company's AI assistant for software development

This quote is all you need to know:

"We estimate this has saved us the equivalent of 4,500 developer-years of work (yes, that number is crazy but, real)"

4/12
Boston Dynamics posted a new video of its Atlas robot doing push-ups, showcasing advancements in dynamic movement control

Can't wait to see the Atlas vs. Figure vs. Optimus cage match

5/12
Synthflow introduced 'Synthflow White Label,' a new white-label voice AI solution for agencies.

It lets users customize the company's no-code AI voice platform for their clients, with full customizability.

6/12
AI21 Labs unveiled Jamba 1.5, a multilingual open AI model family.

Most impressively, it offers a 256,000-token context length, 2.5x faster long-context handling within its size class, and permissive licensing for smaller orgs.

7/12
Nvidia and Mistral just released Mistral-NeMo-Minitron 8B, a small language model that can run on laptops and PCs

It outperforms Mistral-7B and Meta Llama 3.1 8B on the Open LLM Leaderboard

Small models are improving at an insane rate 👀

8/12
D-ID launched AI Video Translate, which produces instant AI-generated multilingual videos

The feature clones the speaker's voice and changes their lip movements to match the translated words in seconds

9/12
Krea AI added Flux 1, the new advanced text-to-image AI model, to its platform.

The integration comes with multiple styles, image variations, varying aspect ratios, and 3-minute free generations for non-subscribed users.

10/12
We're hiring at The Rundown!

We’re looking for another writer to join our team to help us write, edit, and distribute content on AI.

If you're interested, apply with the link in this thread:

11/12
As always, I’ll be sending out a more in-depth rundown on all the AI news, and why it actually matters in ~5 hours in my newsletter.

Join 650,000+ readers for free and never miss a thing in AI ever again: The Rundown AI

12/12
That's it for today's news in the world of AI.

I share what's happening in AI every day, follow me @rowancheung for more.

If you found this helpful, support me with a like/retweet on the first tweet of this thread:


 


1/1
The Future of AI is Smart Architectures, Not Bigger Models

LLM developers are reaching their limit and the returns are getting smaller:

• 10x more data
• 10x more parameters
• 10x more training time

Yet, the improvements are minimal. The jump from GPT-3.5 to GPT-4 was huge, but no other LLM has matched that level of progress since.

Surprising, right?

Here's the catch - we've probably used up most of the good data already. Is anyone sitting on a dataset 10x bigger than, and meaningfully different from, the entire Web? Not likely at all.

You can't just multiply the training time or parameters by 10.
The costs are shocking. We'll need entirely new architectures beyond the transformer, and that's still years away.

So, what's next?

It won’t be just LLMs. We're moving toward complex AI systems – agents, search during inference, and more.

If this is the case, "GPT-5" (if it’s even called that) might not just be an LLM. It could be a system that works well with these complex setups, similar to how MOE was created to manage inference costs.

The future isn’t about bigger language models. It’s about smarter AI architectures. This is where most of the economic benefits will come from going forward.


 


1/11
From @davidtsong on my team

Cost of 1M tokens has dropped from $180 to $0.75 in ~18 months (240X cheaper!)
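
The headline multiple checks out:

```latex
\[
\frac{\$180}{\$0.75} = 240
\]
```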

2/11
A nice frame of reference is overlaying this with the cost per bit over time for the Internet, or the cost of compute during the rise of the microchip.

Very clearly a supercycle where the marginal cost of a foundational resource is going to zero.

3/11
The commoditization of foundational models now has proof.

Today anyone can build a startup/business within a day - it's now more important to focus on real-world problems.

The key is to have a differentiated yet defensible use case (not easily replicable) for the long term.

4/11
😎

5/11
In reality there's room to make it 10,000x cheaper

6/11
You want your ventures to bet that this trend will continue for a long time to come

7/11
CA?

8/11
The chart would be better without GPT-4o mini. Sharing a name doesn’t make it a GPT 4 level model.

9/11
Artificial Intelligence is becoming abundant and affordable

10/11
yeehaw 🤠

11/11
@davidtsong 🔥


 


1/1
Current AI systems have limited reasoning abilities, but they can improve

Yes, current systems do reason.

Many people argue that what they do "isn't reasoning," but the difference between fake and "real" reasoning doesn't seem very important.

Their reasoning has limits, especially when multiple steps build on each other. Sometimes, outputs are given without any reasoning being applied, leading to hallucinations.
Even worse, if asked to explain a hallucinated answer, the system may create a made-up reasoning for it.

These limits on reasoning ability, failing to apply reasoning when needed, and generating false reasoning are major challenges for LLMs.

Despite this, there’s optimism. Instead of needing new methods, frameworks like COT, tree search, and society of minds could greatly improve reasoning in existing LLMs.


 


1/6
LLM Pruning and Distillation in Practice: The Minitron Approach

abs: [2408.11796] LLM Pruning and Distillation in Practice: The Minitron Approach
models:
nvidia/Mistral-NeMo-Minitron-8B-Base · Hugging Face
nvidia/Llama-3.1-Minitron-4B-Width-Base · Hugging Face
nvidia/Llama-3.1-Minitron-4B-Depth-Base · Hugging Face

Compressing Llama 3.1 8B and Mistral NeMo 12B to 4B and 8B, respectively, with teacher correction, weight pruning, and distillation (Minitron approach from NVIDIA).
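
For readers unfamiliar with the distillation half of that recipe, the sketch below shows a generic logit-distillation loss of the kind such pipelines use; it is the textbook knowledge-distillation objective, not NVIDIA's exact Minitron formulation.

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence between temperature-softened teacher and student token distributions."""
    s = F.log_softmax(student_logits / temperature, dim=-1)
    t = F.softmax(teacher_logits / temperature, dim=-1)
    # Scale by T^2 so gradient magnitudes stay comparable across temperatures.
    return F.kl_div(s, t, reduction="batchmean") * temperature ** 2
```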

2/6
This one is trending on @aimodelsfyi this morning

3/6
The paper presents a comprehensive report on compressing two large language models - Llama 3.1 8B and Mistral NeMo 12B - using pruning and distillation techniques. The goal is to create smaller models with comparable performance while reducing the cost of training.

1. The compressed models, Llama-3.1-Minitron-4B and MN-Minitron-8B, exhibit strong performance on various benchmarks compared to similarly-sized models.
2. The width-pruned Llama-3.1-Minitron-4B model outperforms the depth-pruned variant.
3. The MN-Minitron-8B model provides an average speedup of 1.2x over the teacher Mistral NeMo 12B model, while the Llama-3.1-Minitron-4B models provide 2.7x and 1.8x speedups for the depth and width-pruned variants, respectively.

full paper: LLM Pruning and Distillation in Practice: The Minitron Approach

4/6
AI Summary: The paper details the process of compressing Llama 3.1 8B and Mistral NeMo 12B models to 4B and 8B parameters through pruning and distillation techniques. It evaluates two pruning strategies: dep...
LLM Pruning and Distillation in Practice: The Minitron Approach

5/6
added here @NVIDIAAI : llm-course/nvidia-nim.md at main · andysingal/llm-course

6/6
LLM Pruning and Distillation in Practice: The Minitron Approach


 