bnew

Veteran
Joined
Nov 1, 2015
Messages
56,112
Reputation
8,239
Daps
157,806

AI in practice

Aug 27, 2024

OpenAI's Strawberry AI is reportedly the secret sauce behind next-gen Orion language model​


Midjourney prompted by THE DECODER



Matthias Bastian
Online journalist Matthias is the co-founder and publisher of THE DECODER. He believes that artificial intelligence will fundamentally change the relationship between humans and computers.

OpenAI is developing two new AI models that could significantly advance the field. "Strawberry" aims to solve complex math and programming problems better than current systems, while "Orion" aims to surpass GPT-4's capabilities with the help of Strawberry.

According to The Information, citing two people involved in the project, OpenAI might release a chatbot version of Strawberry as early as this fall, possibly as part of ChatGPT.

Strawberry is designed to tackle previously unseen math problems and optimize programming tasks. Its enhanced logic should allow it to solve language-related challenges more effectively when given sufficient time to "think."

Agent-based AI systems built on Strawberry


In internal demonstrations, Strawberry reportedly solved the New York Times word puzzle "Connections." The model could also serve as a foundation for more advanced AI systems capable of not just generating content, but taking action.

Reuters reported that OpenAI has already tested an AI internally that scored over 90 percent on the MATH benchmark, a collection of math mastery tasks. This is likely Strawberry, which has also been presented to national security officials, according to The Information.

Internal OpenAI documents describe plans to use Strawberry models for autonomous internet searches, enabling the AI to plan ahead and conduct in-depth research.

The Information notes that it's uncertain whether Strawberry will launch this year. If released, it would be a distilled version of the original model, delivering similar performance with less computational power – a technique OpenAI has also used for GPT-4 variants since the original model was released in March 2023.

OpenAI's approach reportedly resembles the "Self-Taught Reasoner" (STaR) method introduced by Stanford researchers, which aims to improve AI systems' reasoning abilities.

Former OpenAI chief scientist Ilya Sutskever, who has since founded his own startup focused on safe superintelligence, is said to have provided the idea and basis for Strawberry.
 

AI in practice

Aug 26, 2024

German AI startup Aleph Alpha unveils new AI stack "Pharia AI" and new language models​


Aleph Alpha



Maximilian Schreiner
Max is managing editor at THE DECODER. As a trained philosopher, he deals with consciousness, AI, and the question of whether machines can really think or just pretend to.

German AI specialist Aleph Alpha has introduced Pharia AI, a comprehensive software stack designed to help enterprises and government agencies develop and operate AI applications with confidence and future-proofing in mind.

According to founder Jonas Andrulis, the goal is to provide customers with a complete solution for developing AI applications from the initial concept to production deployment.

Pharia AI is made up of several components:
- Pharia Catch assists subject matter experts in structuring and storing their knowledge for AI development.
- Pharia Studio guides developers through the process of creating application-specific AI systems from this knowledge and pre-trained models.
- Pharia OS handles the operation and scaling of these systems, including access control and monitoring.
- The Pharia Assistant provides a simple interface for employees to utilize the AI functions.

Aleph Alpha stresses that the stack should offer customers sovereignty and future-proofing. The systems can be operated flexibly in the cloud or on-premises and trained with customers' own data. Additionally, customers should always have access to the latest AI innovations, whether open source models or Aleph Alpha's own developments.

These innovations include a method that allows language models to be more efficiently adapted to new languages and specialist areas without compromising performance in the source languages. Such innovations stem from Aleph Alpha's collaboration with researchers, such as those at the Technical University of Darmstadt. The company is also working on a way for customers to refine the behavior of the language models themselves and offers Explainable AI functions. All Pharia AI features are set to roll out in the coming months, with some already being used by select customers.

Alongside the core stack, the company also provides industry-specific solutions for areas such as the public sector, banking, and insurance. One example is Creance AI, a joint venture with PwC that helps financial institutions automatically check contracts against regulatory requirements.

Aleph Alpha sees partners as a key success factor for the dissemination of its technology. Platinum partners like IT service provider Materna and PwC support customers in the implementation of AI projects based on Pharia AI.

New Pharia-1-LLM language models published​


Alongside the Pharia AI stack, Aleph Alpha has launched the Pharia-1 language model family, which comprises models with 7 billion parameters, and has published the base models together with training code and detailed information on data and capabilities. Pharia-1-LLM-7B-control can be flexibly adapted to user preferences, while the behavior of Pharia-1-LLM-7B-control-aligned has been optimized for dealing with sensitive topics.

Both models are trained in seven languages, with special optimization for English, German, French, and Spanish. They are tailored to short, concise answers and, according to Aleph Alpha, are on par with the latest open source language models in the 7 to 8 billion parameter range. Aleph Alpha says they have been fully trained in accordance with applicable EU and national laws, making them suitable for corporate use.

Chief Research Officer Yasser Jadidi states that model size is not the decisive factor, and efficiency and domain-specific optimization are more important. However, the company is not ruling out the possibility of offering larger models in the future. The Pharia-1 models are available on Hugging Face.
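
For readers who want to try the models, a minimal loading sketch with Hugging Face transformers might look like this. Note that the repository id below is a guess based on the model names in the announcement, so check Aleph Alpha's Hugging Face organization for the exact id and usage notes:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical repo id, inferred from the model names above; verify on Hugging Face.
# The Pharia architecture may require trust_remote_code=True in current transformers.
model_id = "Aleph-Alpha/Pharia-1-LLM-7B-control"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto", trust_remote_code=True
)

prompt = "Summarize the key obligations in this contract clause: ..."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```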
 




[Submitted on 11 Jun 2024 (v1), last revised 13 Jun 2024 (this version, v2)]

Accessing GPT-4 level Mathematical Olympiad Solutions via Monte Carlo Tree Self-refine with LLaMa-3 8B​

Di Zhang, Xiaoshui Huang, Dongzhan Zhou, Yuqiang Li, Wanli Ouyang
This paper introduces the MCT Self-Refine (MCTSr) algorithm, an innovative integration of Large Language Models (LLMs) with Monte Carlo Tree Search (MCTS), designed to enhance performance in complex mathematical reasoning tasks. Addressing the challenges of accuracy and reliability in LLMs, particularly in strategic and mathematical reasoning, MCTSr leverages systematic exploration and heuristic self-refine mechanisms to improve decision-making frameworks within LLMs. The algorithm constructs a Monte Carlo search tree through iterative processes of Selection, self-refine, self-evaluation, and Backpropagation, utilizing an improved Upper Confidence Bound (UCB) formula to optimize the exploration-exploitation balance. Extensive experiments demonstrate MCTSr's efficacy in solving Olympiad-level mathematical problems, significantly improving success rates across multiple datasets, including GSM8K, GSM Hard, MATH, and Olympiad-level benchmarks, including Math Odyssey, AIME, and OlympiadBench. The study advances the application of LLMs in complex reasoning tasks and sets a foundation for future AI integration, enhancing decision-making accuracy and reliability in LLM-driven applications.

Submission history​

From: Di Zhang [view email]
[v1] Tue, 11 Jun 2024 16:01:07 UTC (106 KB)
[v2] Thu, 13 Jun 2024 07:19:06 UTC (106 KB)






1/1
Accessing GPT-4 level Mathematical Olympiad Solutions via Monte Carlo Tree Self-refine with LLaMa-3 8B: A Technical Report









1/6
It's finally here. Q* rings true. Tiny LLMs are as good at math as a frontier model.

By using the same techniques Google used to solve Go (MCTS and backprop), Llama-3 8B gets 96.7% on the math benchmark GSM8K!

That’s better than GPT-4, Claude and Gemini, with 200x fewer parameters!

2/6
Source: https://arxiv.org/pdf/2406.07394

3/6
I'd imagine these are the techniques code foundation model trainers are using, but I wonder

a) whether you're limited by the ability of the base open-source model and might only get it to be as good as a frontier model, but barely.
b) whether you can generate enough volume of synthetic code data with reasonable $$ spend.
c) whether doing this on a 1T+ param model would be prohibitively expensive

4/6
The (purported) technique isn’t tied to a particular model

5/6
Come on it's SLIGHTLY harder than that 😆

6/6
Shanghai AI lab made rumored Q* reality





1/1
A prompt-level formula to add Search to LLM

🧵📖 Read of the day, day 85: Accessing GPT-4 level Mathematical Olympiad Solutions Via Monte Carlo Tree Self-refine with Llama-3 8B: A technical report, by Zhang et al from Shanghai Artificial Intelligence Laboratory

https://arxiv.org/pdf/2406.07394

The authors of this paper introduce a Monte Carlo Tree Search-like method to enhance model generation. They call it Monte Carlo Tree Self-Refine, shortened as MCTSr.

Their method is based solely on prompting the model and does not modify its weights, yet it greatly enhances the results.

How?
1- Generate a root node from a naive answer or a dummy one
2- Use a value function Q to rank the answers that have not been expanded, and greedily select the best one
3- Optimize the answer by generating feedback and then exploiting it
4- Compute the Q value of the new answer
5- Update the value of the parent nodes
6- Identify candidate nodes for expansion, and use the UCT formula to update all nodes before iterating again
7- Iterate until the maximum number of steps is reached

The value function Q is computed by prompting the model to score its own answer; the model is prompted several times and its scores are averaged. The backpropagation and UCT formulas can be found in the paper.
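
Read as pseudocode, the loop above can be sketched in a few dozen lines of Python. The `llm` stub stands in for whatever model is being prompted, and the prompts, reward scale, and exploration constant are illustrative assumptions rather than the paper's exact choices:

```python
import math

def llm(prompt: str) -> str:
    # Stub: plug in any chat-completion call (the paper uses Llama-3 8B).
    raise NotImplementedError

def self_evaluate(problem: str, answer: str, samples: int = 3) -> float:
    # Step 4: prompt the model several times to score its own answer, then average.
    scores = []
    for _ in range(samples):
        reply = llm(f"Problem: {problem}\nAnswer: {answer}\n"
                    "Rate this answer from -100 to 100. Reply with a number only.")
        try:
            scores.append(float(reply.strip()))
        except ValueError:
            scores.append(0.0)
    return sum(scores) / len(scores)

def self_refine(problem: str, answer: str) -> str:
    # Step 3: generate feedback, then exploit it to produce an improved answer.
    feedback = llm(f"Problem: {problem}\nAnswer: {answer}\n"
                   "List everything that is wrong or missing in this answer.")
    return llm(f"Problem: {problem}\nAnswer: {answer}\nFeedback: {feedback}\n"
               "Rewrite the answer, fixing the issues above.")

def mctsr(problem: str, rollouts: int = 8, c: float = 1.4) -> str:
    # Step 1: root node from a naive first answer.
    nodes = [{"answer": llm(f"Solve: {problem}"), "q": 0.0, "visits": 1}]
    nodes[0]["q"] = self_evaluate(problem, nodes[0]["answer"])
    for _ in range(rollouts):
        total_visits = sum(n["visits"] for n in nodes)
        # Steps 2 and 6: pick the most promising node with a UCB/UCT-style score.
        parent = max(nodes, key=lambda n: n["q"] +
                     c * math.sqrt(math.log(total_visits) / n["visits"]))
        # Steps 3 and 4: refine the selected answer and score the result.
        child_answer = self_refine(problem, parent["answer"])
        child_q = self_evaluate(problem, child_answer)
        nodes.append({"answer": child_answer, "q": child_q, "visits": 1})
        # Step 5: backpropagate a simple running update to the parent.
        parent["visits"] += 1
        parent["q"] = max(parent["q"], 0.5 * (parent["q"] + child_q))
    return max(nodes, key=lambda n: n["q"])["answer"]
```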

The authors then evaluate MCTSr with 4 and 8 rollouts on Llama-3 8B and compare it to GPT-4, Claude 3 Opus, and Gemini-1.5 Pro on mathematical problems.

They first find that such sampling greatly increases performance on both the GSM8K and MATH datasets, reaching frontier-model levels of performance on GSM8K (still below them on MATH, but greatly improved).

The authors then evaluate the models on harder benchmarks. MCTSr improves model performance across all of them. They notice that on Math Odyssey, the 8-rollout MCTSr is on the level of GPT-4!

Prompts can be found within the appendix.
Code is open-sourced at: GitHub - trotsky1997/MathBlackBox

Personal Thoughts: While this research is still at a preliminary stage, the results are quite impressive for something achieved only by prompting. The fact that a mere 8B model can reach frontier levels of performance on benchmarks is nothing to laugh at. It still tells us there's a lot left to discover even solely with LLMs!




AI in practice
Aug 22, 2024

New prompting method can help improve LLM reasoning skills​


Midjourney prompted by THE DECODER



Chinese researchers have created a technique that enables large language models (LLMs) to recognize and filter out irrelevant information in text-based tasks, leading to significant improvements in their logical reasoning abilities.

The research team from Guilin University of Electronic Technology and other institutions developed the GSMIR dataset, which consists of 500 elementary school math problems intentionally injected with irrelevant sentences. GSMIR is derived from the existing GSM8K dataset.

Tests on GSMIR showed that GPT-3.5-Turbo and GPT-3.5-Turbo-16k could identify irrelevant information in up to 74.9% of cases. However, even after detecting this information, the models were unable to automatically exclude it before solving a task.

Recognizing and filtering irrelevant information - and only then responding​


To address this, the researchers developed the two-stage "Analysis to Filtration Prompting" (ATF) method. First, the model analyzes the task and identifies irrelevant information by examining each sub-sentence. It then filters out this information before starting the actual reasoning process.
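
A rough sketch of how those two stages could be chained with an ordinary chat API is shown below. The prompt wording is a paraphrase of the idea, not the paper's exact templates, and any chat model can stand in for the GPT-3.5 variants used in the study:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def ask(prompt: str, model: str = "gpt-3.5-turbo") -> str:
    resp = client.chat.completions.create(
        model=model, messages=[{"role": "user", "content": prompt}]
    )
    return resp.choices[0].message.content

def atf_answer(problem: str) -> str:
    # Stage 1 (Analysis): examine the problem sentence by sentence
    # and flag information that is irrelevant to the question being asked.
    analysis = ask(
        f"Problem: {problem}\n"
        "Go through the problem sentence by sentence and list any sentences "
        "that are irrelevant to answering the question."
    )
    # Stage 2 (Filtration): remove the flagged sentences before reasoning.
    filtered = ask(
        f"Problem: {problem}\nIrrelevant sentences: {analysis}\n"
        "Rewrite the problem with the irrelevant sentences removed."
    )
    # Only now solve the cleaned-up problem, e.g. with chain-of-thought prompting.
    return ask(f"{filtered}\nLet's think step by step.")
```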

The two-step ATF prompt process. First it analyzes, then it filters, and only then the model responds. | Image: Jiang et al.

Using ATF, the accuracy of LLMs in solving tasks with irrelevant information approached their performance on the original tasks without such distractions. The method worked with all tested prompting techniques.

The combination of ATF with "Chain-of-Thought Prompting" (COT) was particularly effective. For GPT-3.5-Turbo, accuracy increased from 50.2% without ATF to 74.9% with ATF – an improvement of nearly 25 percentage points.

Benchmark results comparing various prompting methods with and without ATF. The methods tested include standard, instructed, chain-of-thought (with and without examples), and least-to-most prompting. GSM8K-SLC represents the GSMIR data set without irrelevant information. The study presents two tables, although their differences are unclear. Most likely, the upper table shows results for GPT-3.5-Turbo-16k and the lower table shows results for GPT-3.5-Turbo, but the labeling is incorrect. Both tables show that ATF consistently improved accuracy across all prompting methods when solving data set tasks containing irrelevant information. | Image: Jiang et al.

The smallest improvement came when ATF was combined with Standard Prompting (SP), where accuracy increased by only 3.3 percentage points. The researchers suggest that this is because SP's accuracy on the original questions was already very low at 18.5%, with most errors likely due to calculation errors rather than irrelevant information.

Because the ATF method is specifically designed to reduce the impact of irrelevant information, but not to improve the general computational ability of LLMs, the effect of ATF in combination with SP was limited.

With other prompting techniques, such as COT, which better support LLMs in correctly solving reasoning tasks, ATF was able to improve performance more significantly because irrelevant information accounted for a larger proportion of errors.

The study has some limitations. Experiments were conducted only with GPT-3.5, and the researchers only examined tasks containing a single piece of irrelevant information. In real-world scenarios, problem descriptions may contain multiple confounding factors.

In approximately 15% of cases, irrelevant information was not recognized as such. More than half of these instances involved "weak irrelevant information" that did not impact the model's ability to arrive at the correct answer.

This suggests that ATF is most effective for "strong irrelevant information" that significantly interferes with the reasoning process. Only 2.2% of cases saw relevant information incorrectly classified as irrelevant.

Despite these limitations, the study shows that language models' logical reasoning abilities can be enhanced by filtering out irrelevant information through prompt engineering. While the ATF method could help LLMs better handle noisy real-world data, it does not address their fundamental weaknesses in logic.

Summary
  • Researchers at Guilin University of Electronic Technology have developed a technique that helps large language models (LLMs) identify and remove irrelevant information in text-based tasks, significantly improving their reasoning capabilities.
  • The two-step "Analysis to Filtration Prompting" (ATF) method first analyzes the task and identifies irrelevant information by examining each sub-sentence. It then filters out this information before the model begins its reasoning process. When combined with Chain-of-Thought Prompting (COT), the accuracy of GPT-3.5-Turbo improved by nearly 25 percentage points, from 50.2% to 74.9%.
  • The study has limitations. Only GPT-3.5 variants were tested, and the tasks each contained only one piece of irrelevant information. Real-world scenarios often involve multiple confounding factors.

Sources
Paper

Matthias Bastian


 

bnew

Veteran
Joined
Nov 1, 2015
Messages
56,112
Reputation
8,239
Daps
157,806
AI in practice

Aug 21, 2024

OpenAI's latest fine-tuning update allows GPT-4o to learn your business inside and out​


Midjourney prompted by THE DECODER


OpenAI now allows developers to fine-tune GPT-4o. This customization aims to enhance performance for specific use cases and reduce costs.

According to OpenAI, fine-tuning enables adjustments to response structure and tone, as well as the ability to follow complex domain-specific instructions. Significant improvements can be achieved with just a few dozen examples in the training dataset, the company says.
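
For orientation, starting such a fine-tuning job through the OpenAI Python SDK looks roughly like the sketch below. The training file name is a placeholder, and the GPT-4o snapshot identifier is an assumption; use whichever snapshot your account lists as fine-tunable:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Upload a JSONL file of chat-formatted examples; per OpenAI,
# a few dozen examples can already shift response structure and tone.
training_file = client.files.create(
    file=open("training_examples.jsonl", "rb"),  # placeholder file name
    purpose="fine-tune",
)

# Create the fine-tuning job. The snapshot name below is an assumption;
# check the fine-tuning docs for the currently supported GPT-4o snapshots.
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-4o-2024-08-06",
)
print(job.id, job.status)
```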

OpenAI showcases two applications of fine-tuning: Cosine's AI assistant Genie achieved a top score of 43.8 percent on the SWE-bench Verified Benchmark for software engineering. Distyl secured first place on the BIRD-SQL benchmark for text-to-SQL tasks, with their customized GPT-4o model reaching 71.83 percent accuracy.


Free training tokens available through September 23​


OpenAI emphasizes that developers retain full control over their tuned models, and according to the company, input data is only used to refine the developer's own model, not to train other models. However, OpenAI implements security measures to prevent model misuse and monitors tuned models for potential safety issues.

Fine-tuning is available to all developers with paid usage tiers. Costs are $25 per million tokens for training, $3.75 per million input tokens, and $15 per million output tokens for inference.

OpenAI also offers fine-tuning for GPT-4o mini. The company is offering two million free training tokens per day for GPT-4o mini until September 23, and one million free training tokens per day for GPT-4o until the same date.


Fine-tuning can help, but is not a cure-all​


Fine-tuning improves AI model performance and tailors them to specific tasks. It can enhance a model's understanding of content and extend its knowledge and capabilities for particular applications. OpenAI's new fine-tuning options are part of a broader initiative to customize AI models for businesses.

An independent test by data analytics platform Supersimple showed that a fine-tuned AI model can significantly improve performance on specific tasks compared to a standard model, though it's not perfect. Moreover, the performance boost from fine-tuning GPT-4 was smaller than the improvement seen when upgrading from GPT-3 to GPT-3.5.

Summary
  • OpenAI now allows developers to fine-tune GPT-4o with their own data. This should allow the structure and tone of responses to be adjusted and complex domain-specific instructions to be followed.
  • In two real-world examples, fine-tuned GPT-4o models achieved top scores on a software engineering benchmark (43.8% on SWE-bench Verified) and on text-to-SQL tasks (71.83% on BIRD-SQL).
  • Fine-tuning is available to all developers on paid usage tiers. Limited free token allowances are available until September 23. Control over custom models remains with the developer, though OpenAI uses security measures to prevent misuse.

Sources
OpenAI

Matthias Bastian
 


AI in practice

Aug 21, 2024

Microsoft releases new Phi 3.5 open-source language and vision models​


Midjourney prompted by THE DECODER



Microsoft has released three new open-source AI models in its Phi series: mini-instruct, MoE-instruct, and vision-instruct. The models offer strong reasoning capabilities and support multiple languages, but have limitations in factual knowledge and safety.

Designed for commercial and scientific use, the Phi series generally aims to create highly efficient AI models using high-quality training data, although Microsoft hasn't yet shared details about the training process for Phi-3.5.

For the vision model, the company says it used "newly created synthetic, 'textbook-like' data for the purpose of teaching math, coding, common-sense reasoning, general knowledge of the world," in addition to other high-quality and filtered data.

Microsoft says these new models are ideal for applications with limited resources, time-sensitive scenarios, and tasks requiring strong logical reasoning within an LLM's capabilities.

The Phi-3.5-mini-instruct model, with 3.8 billion parameters, is optimized for low-resource environments. Despite its small size, it performs well in benchmarks, especially for multilingual tasks.

The Phi 3.5 MoE-instruct model has 16 experts, each with 3.8 billion parameters, for a total of 60.8 billion. However, only 6.6 billion parameters are active when using two experts, which is enough to match larger models in language comprehension and math, and to outperform some in reasoning tasks.

Image: Microsoft

It's often close to GPT-4o-mini performance, but keep in mind that these are just benchmarks, and word on the street is that Phi models have shown subpar real-world performance.

The Phi-3.5-vision-instruct model, a multimodal system with 4.2 billion parameters, can process text and images. It's suitable for tasks such as image understanding, OCR, and diagram understanding. It outperforms similarly sized models in benchmarks, and competes with larger models in multi-image processing and video summarization.

Image: Microsoft

Phi's context window gets an upgrade​


All Phi 3.5 models support a context length of up to 128,000 tokens, making them useful for long document summaries and multilingual context retrieval. They outperform Google's Gemma 2 models, which are limited to 8,000 tokens.

However, like all LLMs, they are likely to suffer from the "lost in the middle" problem when processing large documents. This also applies to image processing.

The small size of the models limits their factual knowledge, according to Microsoft, potentially leading to higher than average inaccuracies. Microsoft suggests pairing Phi-3.5 with a search method such as RAG to address this weakness.

Like other language models, Phi models can produce biased or offensive output. They reject unwanted content in English, even when prompted in other languages, but are more vulnerable to complex prompt injection techniques in multiple languages.

The Phi 3.5 models are available under the MIT license on Hugging Face and through Microsoft's Azure AI Studio. They require specialized GPU hardware like NVIDIA A100, A6000, or H100 to support flash attention.
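
As a quick way to try one of them, the mini-instruct model can be run with the Hugging Face transformers pipeline roughly as follows. The repository id follows Microsoft's Phi-3.5 model cards; adjust the dtype and device settings to your hardware:

```python
import torch
from transformers import pipeline

# Minimal sketch: run Phi-3.5-mini-instruct through the text-generation pipeline.
# On GPUs without flash attention support, a different dtype or CPU execution
# still works, just more slowly.
generator = pipeline(
    "text-generation",
    model="microsoft/Phi-3.5-mini-instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

messages = [
    {"role": "user", "content": "Explain the difference between dense and mixture-of-experts models in two sentences."},
]
output = generator(messages, max_new_tokens=128)
print(output[0]["generated_text"][-1]["content"])
```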

Summary
  • Microsoft has introduced three new open source AI models in its Phi 3.5 series: mini-instruct, MoE-instruct, and vision-instruct. The 3.5 models are designed for commercial and scientific use in multiple languages and have relatively high reasoning capabilities with the typical limitations of LLMs.
  • The smallest model, Phi-3.5-mini-instruct, has 3.8 billion parameters and is optimized for scenarios with limited computing power. The MoE-instruct model has 60.8 billion parameters, of which only 6.6 billion are active. The vision-instruct model can process text and images at GPT-4o level.
  • Due to their small size, the models have weaknesses in terms of factual knowledge and safety. Microsoft recommends combining them with a search system such as RAG to compensate for inaccuracies.

Sources
Hugging Face
 










1/10
AI NEWS: OpenAI's Project Strawberry is reportedly coming this fall.

Plus, more developments from Anthropic, Google DeepMind, Apple, Cerebras, and xAI.

Here's everything going on in AI right now:

2/10
According to a new report from The Information, OpenAI researchers are finally preparing to launch a new AI model, code-named Strawberry (previously Q*), this fall

If it lives up to leaks, it could potentially advance OpenAI towards Stage 2 of its five-level roadmap to AGI

3/10
Anthropic announced the full release of its Artifacts feature for all Claude users, including mobile apps.

This means users can now create Artifacts directly on their phones.

Can't wait to see what people build with it!

4/10
Google just released three new experimental Gemini 1.5 models:

- A smaller 8B parameter Flash model
- An updated Pro model
- An improved Flash model

The new 1.5 Pro now ranks as #2, and the new 1.5 Flash ranks as #6 on the Chatbot Arena leaderboard!

5/10
Apple announced a September 9 event at which it is expected to debut the iPhone 16 with new generative AI features.

While it's reported that Apple Intelligence will be delayed to fix bugs, we still don't know by how much.

6/10
I'm hosting a live workshop this Friday with the founder of Lindy AI on how to build your own AI agent that responds to emails on your behalf

Join us if you want to see the demo live and ask any questions!

RSVP with the link in the comments of this post:

7/10
Cerebras introduced 'Cerebras Inference' -- a new tool for developers to access the startup's chips to run apps

The inference is reportedly ~20x faster than NVIDIA GPUs and ~2x faster than Groq, wild

8/10
xAI released new Grok features for premium subscribers on X.

New features include image generation suggestions, improved model selection in the iOS app, and more

The pace of xAI continues to impress!

9/10
As always, I’ll be sending out a more in-depth rundown on all the AI news, and why it actually matters in ~5 hours in my newsletter.

Join 650,000+ readers for free and never miss a thing in AI ever again: The Rundown AI

10/10
That's it for today's news in the world of AI.

I share what's happening in AI every day, follow me @rowancheung for more.

If you found this helpful, support me with a like/retweet on the first tweet of this thread:





1/1
What Cerebras is doing is really interesting - ASICs dedicated to inference. This is likely going to put significant pressure on OpenAI and continue the race to the bottom, and I also wonder how you scale a company making specialist hardware using dinner-plate-sized chips.













1/11
Introducing Cerebras Inference
‣ Llama3.1-70B at 450 tokens/s – 20x faster than GPUs
‣ 60c per M tokens – a fifth the price of hyperscalers
‣ Full 16-bit precision for full model accuracy
‣ Generous rate limits for devs
Try now: Cerebras Inference

2/11
Cerebras Inference is the fastest Llama3.1 inference API by far: 1,800 tokens/s for 8B and 450 tokens/s for 70B. We are ~20x faster than NVIDIA GPUs and ~2x faster than Groq.

3/11
Going from 90 tokens/s to 1,800 tokens/s is like going from dialup to broadband. It makes AI instant:

4/11
Cerebras Inference is just 10c per million tokens for 8B and 60c per million tokens for 70B. Our price-performance is so strong, we practically broke the chart on Artificial Analysis.

5/11
Inference today runs at tens of tokens per second instead of thousands due to the memory bandwidth limitation of GPUs. Generating one token requires sending the entire model parameters from memory to compute. Doing this a thousand times a second requires >100TB/s, far greater than what’s possible with HBM.
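
The arithmetic behind that figure is easy to check. A back-of-the-envelope sketch, assuming 16-bit weights and ignoring KV-cache traffic:

```python
# Every generated token has to stream all model weights from memory once,
# so required bandwidth = parameters * bytes per parameter * tokens per second.
params = 70e9          # Llama 3.1 70B
bytes_per_param = 2    # 16-bit weights
tokens_per_second = 1000

required_bandwidth_tb = params * bytes_per_param * tokens_per_second / 1e12
print(f"{required_bandwidth_tb:.0f} TB/s")  # ~140 TB/s, well beyond current HBM
```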

6/11
Cerebras solves the memory bandwidth bottleneck by building the largest chip in the world and storing the entire model on-chip. A Wafer Scale Engine has 44GB of on-chip SRAM with 21 petabytes/s of memory bandwidth, which lets us run Llama3.1 at over 1,800 tokens/s.

7/11
Cerebras inference isn’t just super-fast, it has huge throughput. With ~200x more on-chip memory than small AI chips, we can support batch sizes from 1 to 100, making it highly cost efficient to deploy at scale.

8/11
Cerebras Inference does not trade accuracy for speed. All our Llama3.1 models use Meta’s original 16-bit weights, ensuring the highest accuracy.

9/11
Lastly, we didn’t just build a fast demo – we have capacity to serve hundreds of billions of tokens per day to developers and enterprises. We will be adding new models (e.g. Llama3.1-405B) and ramping up to even greater capacity in the coming months.

10/11
Try Cerebras inference for yourself:
Try Chat: Cerebras Inference
Free API key: Cerebras Developer Platform
Read Blog: Introducing Cerebras Inference: AI at Instant Speed - Cerebras

11/11





1/1
'Cerebras claims its solution is 20x faster than Nvidia's Hopper chips at AI inference, at a fraction of the price. "The way you beat the 800lb gorilla is by bringing a vastly better product to market. We've taken meaningful customers from Nvidia."'




1/1
Have been following Groq but missed Cerebras. Anyone in my network using them for inference?





1/2
Cerebras Inference has the industry’s best pricing for high-speed inference

- 10c per million tokens for Llama3.1-8B
- 60c per million tokens for Llama3.1-70B

Try it today: Cerebras Inference

2/2
in or out


 





1/5
Chatbot Arena update⚡!

The latest Gemini (Pro/Flash/Flash-9b) results are now live, with over 20K community votes!

Highlights:
- New Gemini-1.5-Flash (0827) makes a huge leap, climbing from #23 to #6 overall!
- New Gemini-1.5-Pro (0827) shows strong gains in coding, math over previous versions.
- The new, smaller Gemini-1.5 Flash-8b outperforms gemma-2-9b, matching llama-3-70b levels.

Big Congrats @GoogleDeepMind Gemini team on the incredible launch!

More plots in the followup posts👇

**Note: to better reflect community interests, older models nearing deprecation will soon be removed from the default leaderboard view.

2/5
Overall Leaderboard with #votes and CIs:

3/5
Coding Arena: new Gemini-1.5-Pro improves significantly over previous versions.

4/5
The new, smaller Gemini-1.5 Flash-8b outperforms gemma-2-9b, matching llama-3-70b levels.

5/5
Win-rate heatmap:

Check out full leaderboard at http://lmarena.ai/?leaderboard!







1/4
Open-source is the future.

I just tested it. It's really good and an improved version for generating code.

2/4
have you tried it?

3/4
not in coding right now but not that behind.

4/4
ohh no, typo mistake :(


 












1/12
AI NEWS: Salesforce just introduced two autonomous AI agents for sales.

Plus, more developments from Amazon, Boston Dynamics, Synthflow, AI21 Labs, NVIDIA, Mistral, D-ID, and Krea AI.

Here's everything going on in AI right now:

2/12
Salesforce just introduced two fully autonomous, AI-powered sales agents:

> Einstein SDR Agent can engage with inbound leads 24/7
> Einstein Sales Coach Agent can offer real-time sales suggestions during calls

We're headed into a new era of sales.

3/12
Amazon CEO Andy Jassy shared an update on Q, the company's AI assistant for software development

This quote is all you need to know:

"We estimate this has saved us the equivalent of 4,500 developer-years of work (yes, that number is crazy but, real)"

4/12
Boston Dynamics posted a new video of its Atlas robot doing push-ups, showcasing advancements in dynamic movement control

Can't wait to see the Atlas vs. Figure vs. Optimus cage match

5/12
Synthflow introduced 'Synthflow White Label,' a new white-label voice AI solution for agencies.

It allows users to offer the company's no-code AI voice platform to clients with full customizability.

6/12
AI21 Labs unveiled Jamba 1.5, a multilingual open AI model family.

Most impressively, it has a 256,000-token context length, 2.5x faster long-context processing in its size class, and permissive licensing for smaller orgs.

7/12
Nvidia and Mistral just released Mistral-NeMo-Minitron 8B, a small language model that can run on laptops and PCs

It outperforms Mistral-7B and Meta Llama 3.1 8B on the Open LLM Leaderboard

Small models are improving at an insane rate 👀

8/12
D-ID launched AI Video Translate, for instant AI-generated multilingual videos

The feature clones the speaker's voice and changes their lip movements to match the translated words in seconds

9/12
Krea AI added Flux 1, the new advanced text-to-image AI model, to its platform.

The integration comes with multiple styles, image variations, varying aspect ratios, and 3-minute free generations for non-subscribed users.

10/12
We're hiring at The Rundown!

We’re looking for another writer to join our team to help us write, edit, and distribute content on AI.

If you're interested, apply with the link in this thread:

11/12
As always, I’ll be sending out a more in-depth rundown on all the AI news, and why it actually matters in ~5 hours in my newsletter.

Join 650,000+ readers for free and never miss a thing in AI ever again: The Rundown AI

12/12
That's it for today's news in the world of AI.

I share what's happening in AI every day, follow me @rowancheung for more.

If you found this helpful, support me with a like/retweet on the first tweet of this thread:


 


1/1
The Future of AI is Smart Architectures, Not Bigger Models

LLM developers are reaching their limit and the returns are getting smaller:

• 10x more data
• 10x more parameters
• 10x more training time

Yet, the improvements are minimal. The jump from GPT-3.5 to GPT-4 was huge, but no other LLM has matched that level of progress since.

Surprising, right?

Here's the catch - we've probably used up most of the good data already. Is anyone sitting on a dataset 10x bigger than, and meaningfully different from, the entire Web? Not likely at all.

You can't multiply the training time or parameters by 10.
The costs are shocking. We’ll need entirely new architectures beyond the transformer, and that's still years away.

So, what's next?

It won’t be just LLMs. We're moving toward complex AI systems – agents, search during inference, and more.

If this is the case, "GPT-5" (if it’s even called that) might not just be an LLM. It could be a system that works well with these complex setups, similar to how MoE was created to manage inference costs.

The future isn’t about bigger language models. It’s about smarter AI architectures. This is where most of the economic benefits will come from going forward.


 


1/11
From @davidtsong on my team

Cost of 1M tokens has dropped from $180 to $0.75 in ~18 months (240X cheaper!)

2/11
A nice frame of reference is overlaying this with the cost per bit over time for the Internet. Or the cost of compute during the rise of the microchip.

Very clearly a supercycle where marginal costs of a foundational resource are going to zero.

3/11
The commoditization of foundational models now has proof.

Today anyone can build a startup/business within a day - more important to now focus on real world problems.

Key is to have a differentiated yet defensible use case (not easily replicable) - long term.

4/11
😎

5/11
In reality there’s room to make it 10,000x cheaper

6/11
You want your ventures to bet that this trend will continue for a long time to come

7/11
CA?

8/11
The chart would be better without GPT-4o mini. Sharing a name doesn’t make it a GPT 4 level model.

9/11
Artificial Intelligence is becoming abundant and affordable

10/11
yeehaw 🤠

11/11
@davidtsong 🔥


 


1/6
LLM Pruning and Distillation in Practice: The Minitron Approach

abs: [2408.11796] LLM Pruning and Distillation in Practice: The Minitron Approach
models:
nvidia/Mistral-NeMo-Minitron-8B-Base · Hugging Face
nvidia/Llama-3.1-Minitron-4B-Width-Base · Hugging Face
nvidia/Llama-3.1-Minitron-4B-Depth-Base · Hugging Face

Compressing Llama 3.1 8B and Mistral NeMo 12B to 4B and 8B, respectively, with teacher correction, weight pruning, and distillation (Minitron approach from NVIDIA).
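
For anyone who wants to poke at the released checkpoints, a minimal loading sketch with Hugging Face transformers is below. The repo id comes from the links above; these are base (non-instruct) models, so prompt them as plain text and adjust dtype/device to your hardware:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "nvidia/Mistral-NeMo-Minitron-8B-Base"  # repo id from the links above
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Base model: plain text completion, no chat template.
inputs = tokenizer("Pruning and distillation reduce model size by", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```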

2/6
This one is trending on @aimodelsfyi this morning

3/6
The paper presents a comprehensive report on compressing two large language models - Llama 3.1 8B and Mistral NeMo 12B - using pruning and distillation techniques. The goal is to create smaller models with comparable performance while reducing the cost of training.

1. The compressed models, Llama-3.1-Minitron-4B and MN-Minitron-8B, exhibit strong performance on various benchmarks compared to similarly-sized models.
2. The width-pruned Llama-3.1-Minitron-4B model outperforms the depth-pruned variant.
3. The MN-Minitron-8B model provides an average speedup of 1.2x over the teacher Mistral NeMo 12B model, while the Llama-3.1-Minitron-4B models provide 2.7x and 1.8x speedups for the depth and width-pruned variants, respectively.

full paper: LLM Pruning and Distillation in Practice: The Minitron Approach

4/6
AI Summary: The paper details the process of compressing Llama 3.1 8B and Mistral NeMo 12B models to 4B and 8B parameters through pruning and distillation techniques. It evaluates two pruning strategies: dep...
LLM Pruning and Distillation in Practice: The Minitron Approach

5/6
added here @NVIDIAAI : llm-course/nvidia-nim.md at main · andysingal/llm-course

6/6
LLM Pruning and Distillation in Practice: The Minitron Approach


 


Grok-2 gets a speed bump after developers rewrite code in three days​


Carl Franzen@carlfranzen

August 23, 2024 2:05 PM

Blue and yellow robots race through a pink desert in an AI drawing style illustration

Credit: VentureBeat made with Midjourney



Elon Musk’s xAI has made waves in the last week with the release of its Grok-2 large language model (LLM) chatbot — available through an $8 USD monthly subscription on the social network X.

Now both versions of Grok-2 — Grok-2 and Grok-2 mini, the latter designed to be less powerful but faster — have increased the speed at which they can analyze information and output responses, after two developers at xAI completely rewrote the inference code stack in the last three days.

As xAI developer Igor Babuschkin posted this afternoon on the social network X under his handle @ibab:

“Grok 2 mini is now 2x faster than it was yesterday. In the last three days @lm_zheng and @MalekiSaeed rewrote our inference stack from scratch using SGLang. This has also allowed us to serve the big Grok 2 model, which requires multi-host inference, at a reasonable speed. Both models didn’t just get faster, but also slightly more accurate. Stay tuned for further speed improvements!”

Grok 2 mini is now 2x faster than it was yesterday. In the last three days @lm_zheng and @MalekiSaeed rewrote our inference stack from scratch using SGLang (GitHub - sgl-project/sglang: SGLang is a fast serving framework for large language models and vision language models.). This has also allowed us to serve the big Grok 2 model, which requires multi-host inference, at a… pic.twitter.com/G9iXTV8o0z

— ibab (@ibab) August 23, 2024


The two developers responsible are Lianmin Zheng and Saeed Maleki, according to Babuschkin’s post.

To rewrite the inference for Grok-2, they relied on SGLang, an open-source (Apache 2.0 licensed) highly efficient system for executing complex language model programs, achieving up to 6.4 times higher throughput than existing systems.

SGLang was developed by researchers from Stanford University, the University of California, Berkeley, Texas A&M University and Shanghai Jiao Tong University and integrates a frontend language with a backend runtime to simplify the programming of language model applications.

The system is versatile, supporting many models, including Llama, Mistral, and LLaVA, and is compatible with open-weight and API-based models like OpenAI’s GPT-4. SGLang’s ability to optimize execution through automatic cache reuse and parallelism within a single program makes it a powerful tool for developers working with large-scale language models.
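
To give a flavor of that frontend/runtime split, a minimal SGLang program looks roughly like the sketch below, following the project's public examples. The endpoint URL is a placeholder for a locally launched SGLang server:

```python
import sglang as sgl

# The @sgl.function decorator describes the generation flow; the runtime
# executes it with automatic KV-cache reuse across shared prompt prefixes.
@sgl.function
def explain(s, topic):
    s += sgl.user(f"Explain {topic} in two sentences.")
    s += sgl.assistant(sgl.gen("answer", max_tokens=128))

# Placeholder endpoint: point this at a running SGLang server.
sgl.set_default_backend(sgl.RuntimeEndpoint("http://localhost:30000"))

state = explain.run(topic="multi-host inference")
print(state["answer"])
```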

Grok-2 and Grok-2-Mini Performance Highlights​


Additionally, in the latest update to the third-party LMSYS Chatbot Arena leaderboard that rates AI model performance, the main Grok-2 has secured the #2 spot with an impressive Arena Score of 1293, based on 6,686 votes.



This effectively puts Grok-2 in the number two spot (fittingly) for the most powerful AI models in the world, tied with Google’s Gemini-1.5 Pro model, and just behind OpenAI’s latest version of ChatGPT-4o.

Grok-2-mini, which has also benefited from the recent enhancements, has climbed to the #5 position, boasting an Arena Score of 1268 from 7266 votes, just behind GPT-4o mini and Claude 3.5 Sonnet.

Both models are proprietary to xAI, reflecting the company’s commitment to advancing AI technology.

Grok-2 has distinguished itself, particularly in mathematical tasks, where it ranks #1. The model also holds strong positions across various other categories, including Hard Prompts, Coding, and Instruction-following, where it consistently ranks near the top.

This performance places Grok-2 ahead of other prominent models like OpenAI’s GPT-4o (May 2024), which now ranks #4.

Future Developments​


According to a response by Babuschkin on X, the main advantage of using Grok-2-mini over the full Grok-2 model is its enhanced speed.

Yes, that’s the main reason for now. We will make it even faster than it is right now.

— ibab (@ibab) August 23, 2024


However, Babuschkin pledged that xAI would further improve the processing speed of Grok-2-mini, which could make it an even more attractive option for users seeking high performance with lower computational overhead.

The addition of Grok-2 and Grok-2-mini to the Chatbot Arena leaderboard and their subsequent performance have garnered significant attention within the AI community.

The models’ success is a testament to xAI’s ongoing innovation and its commitment to pushing the boundaries of what AI can achieve.

As xAI continues to refine its models, the AI landscape can expect further enhancements in both speed and accuracy, keeping Grok-2 and Grok-2-mini at the forefront of AI development.
 







1/1
In this work, we introduce a novel uncertainty-aware 3D-Gaussian Splatting training paradigm to effectively use aerial imagery to enhance the novel view synthesis of road views.
Training naively with aerial and ground images, which exhibit large view disparity, poses a significant convergence challenge for 3D-GS, and does not demonstrate remarkable improvements in performance on road views. To enhance the novel view synthesis of road views and to effectively use the aerial information, this work designs an uncertainty-aware training method that allows aerial images to assist in the synthesis of areas where ground images have poor learning outcomes instead of weighting all pixels equally in 3D-GS training like prior work did.

Paper: Drone-assisted Road Gaussian Splatting with Cross-view Uncertainty
Link: [2408.15242] Drone-assisted Road Gaussian Splatting with Cross-view Uncertainty
Project: UC-GS

#AI #AI美女 #LLMs #deeplearning #machinelearning #3D #GenerativeAI


 


1/1
Towards Realistic Example-based Modeling via 3D Gaussian Stitching

discuss: Paper page - Towards Realistic Example-based Modeling via 3D Gaussian Stitching

Using parts of existing models to rebuild new models, commonly termed as example-based modeling, is a classical methodology in the realm of computer graphics. Previous works mostly focus on shape composition, making them very hard to use for realistic composition of 3D objects captured from real-world scenes. This leads to combining multiple NeRFs into a single 3D scene to achieve seamless appearance blending. However, the current SeamlessNeRF method struggles to achieve interactive editing and harmonious stitching for real-world scenes due to its gradient-based strategy and grid-based representation. To this end, we present an example-based modeling method that combines multiple Gaussian fields in a point-based representation using sample-guided synthesis. Specifically, as for composition, we create a GUI to segment and transform multiple fields in real time, easily obtaining a semantically meaningful composition of models represented by 3D Gaussian Splatting (3DGS). For texture blending, due to the discrete and irregular nature of 3DGS, straightforwardly applying gradient propagation as in SeamlessNeRF is not supported. Thus, a novel sampling-based cloning method is proposed to harmonize the blending while preserving the original rich texture and content. Our workflow consists of three steps: 1) real-time segmentation and transformation of a Gaussian model using a well-tailored GUI, 2) KNN analysis to identify boundary points in the intersecting area between the source and target models, and 3) two-phase optimization of the target model using sampling-based cloning and gradient constraints. Extensive experimental results validate that our approach significantly outperforms previous works in terms of realistic synthesis, demonstrating its practicality.


 