1/12
New @GoogleDeepMind paper
We trained Foundational Large Autorater Models (FLAMe) on extensive human evaluations, achieving the best RewardBench perf. among generative models trained solely on permissive data, surpassing both GPT-4 & 4o.
Paper: [2407.10817] Foundational Autoraters: Taming Large Language Models for Better Automatic Evaluation
2/12
Human evaluations often lack standardization & adequate documentation, limiting their reusability. To address this, we curated the FLAMe collection: a diverse set of standardized human evaluations under permissive licenses, incl. 100+ quality assessment tasks & 5M+ human judgments.
3/12
Our collection covers diverse task types, from assessing summary quality to evaluating how well models follow instructions. It focuses on key evaluation pillars: general response quality, instruction-following, factuality, mathematical reasoning, coding, & safety.
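To make "standardized" concrete, here is a minimal, hypothetical sketch of how a single pairwise human judgment could be rendered as a text-to-text autorater example; the field names and prompt wording below are illustrative assumptions, not the paper's actual schema.

```python
# Hypothetical sketch: one human pairwise preference rendered as a
# text-to-text training example for an autorater. Field names and prompt
# wording are illustrative, not the FLAMe collection's actual schema.
from dataclasses import dataclass


@dataclass
class PairwiseJudgment:
    instruction: str   # prompt shown to both models
    response_a: str
    response_b: str
    preferred: str     # "A" or "B", as labeled by a human rater


def to_text_to_text(ex: PairwiseJudgment) -> tuple[str, str]:
    """Render the judgment as an (input_text, target_text) training pair."""
    input_text = (
        "Task: Which response better follows the instruction?\n"
        f"Instruction: {ex.instruction}\n"
        f"Response A: {ex.response_a}\n"
        f"Response B: {ex.response_b}\n"
        "Answer with A or B."
    )
    return input_text, ex.preferred
```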
4/12
Training instruction-tuned LLMs on our FLAMe collection significantly improves generalization to a wide variety of held-out tasks. Overall, our FLAMe model variants outperform popular proprietary LLM-as-a-Judge models like GPT-4 on 8 out of 12 autorater evaluation benchmarks.
5/12
FLAMe variants are among the most powerful generative models on RewardBench. Notably, FLAMe-RM-24B achieves an overall score of 87.8%, the best performance among generative models trained only on permissively licensed data, surpassing both GPT-4-0125 (85.9) & GPT-4o (84.7).
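For context, the RewardBench number is essentially a pairwise accuracy: how often the autorater prefers the human-chosen response over the rejected one (averaged over task categories). A minimal sketch of that metric, where `judge` is a placeholder callable standing in for any autorater, not the FLAMe API:

```python
# Minimal sketch of RewardBench-style pairwise accuracy. `judge` is an
# assumed callable returning "A" or "B"; it stands in for any autorater
# and is not an actual FLAMe or RewardBench API.
from typing import Callable, Iterable


def pairwise_accuracy(
    pairs: Iterable[tuple[str, str, str]],   # (prompt, chosen, rejected)
    judge: Callable[[str, str, str], str],   # returns "A" or "B"
) -> float:
    correct = total = 0
    for prompt, chosen, rejected in pairs:
        # The human-preferred response is presented as "A" here.
        correct += int(judge(prompt, chosen, rejected) == "A")
        total += 1
    return correct / max(total, 1)
```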
6/12
FLAMe variants outperform LLM-as-a-Judge models in most of the common use cases on LLM-AggreFact (factuality/attribution evaluation). FLAMe-24B achieves the highest overall performance of 81.1, while the next-best model GPT-4-0125 scores 80.6.
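LLM-AggreFact poses factuality as a binary decision: is a claim fully supported by its grounding document? A hedged sketch of how such a check might be phrased for an autorater (the prompt template is an assumption, not the benchmark's or the paper's actual wording):

```python
# Illustrative only: a grounding/attribution check posed as a yes/no
# autorater query. The prompt template is assumed, not taken from the
# paper or from LLM-AggreFact.
def attribution_input(document: str, claim: str) -> str:
    return (
        "Task: Is the claim fully supported by the document?\n"
        f"Document: {document}\n"
        f"Claim: {claim}\n"
        "Answer with Yes or No."
    )
```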
7/12
Finally, our analysis reveals that FLAMe is significantly less biased than popular LLM-as-a-Judge models like GPT-4 on the CoBBLEr autorater bias benchmark and adept at identifying high-quality responses for code generation.
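One standard probe in this space (in the spirit of CoBBLEr's order-bias test) is to swap the order in which two responses are shown and check that the judge still picks the same underlying response. A minimal sketch, with `judge` again a placeholder callable:

```python
# Hypothetical order-bias probe: an unbiased judge should prefer the same
# underlying response regardless of presentation order. `judge` is an
# assumed callable, not a real FLAMe or CoBBLEr interface.
from typing import Callable


def order_consistent(prompt: str, resp_1: str, resp_2: str,
                     judge: Callable[[str, str, str], str]) -> bool:
    first = judge(prompt, resp_1, resp_2)    # resp_1 shown as "A"
    swapped = judge(prompt, resp_2, resp_1)  # resp_1 now shown as "B"
    winner_first = resp_1 if first == "A" else resp_2
    winner_swapped = resp_2 if swapped == "A" else resp_1
    return winner_first == winner_swapped
```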
8/12
w/ wonderful collaborators @kalpeshk2011 (co-lead, equal contribution), Sal, @ctar, @manaalfar, & @yunhsuansung.
We hope FLAMe will spur more fundamental research into reusable human evaluations, & the development of effective & efficient LLM autoraters.
9/12
Great work, congrats!
10/12
Thanks, Zhiyang!
11/12
Great work! I also explored autoraters for code generation a year ago:
[2304.14317] ICE-Score: Instructing Large Language Models to Evaluate Code
12/12
Cool work, thanks for sharing!
[2407.10817] Foundational Autoraters: Taming Large Language Models for Better Automatic Evaluation
[Submitted on 15 Jul 2024]
Tu Vu, Kalpesh Krishna, Salaheddin Alzubi, Chris Tar, Manaal Faruqui, Yun-Hsuan Sung

Abstract:
As large language models (LLMs) advance, it becomes more challenging to reliably evaluate their output due to the high costs of human evaluation. To make progress towards better LLM autoraters, we introduce FLAMe, a family of Foundational Large Autorater Models. FLAMe is trained on our large and diverse collection of 100+ quality assessment tasks comprising 5M+ human judgments, curated and standardized using publicly released human evaluations from previous research. FLAMe significantly improves generalization to a wide variety of held-out tasks, outperforming LLMs trained on proprietary data like GPT-4 and Claude-3 on many tasks. We show that FLAMe can also serve as a powerful starting point for further downstream fine-tuning, using reward modeling evaluation as a case study (FLAMe-RM). Notably, on RewardBench, our FLAMe-RM-24B model (with an accuracy of 87.8%) is the top-performing generative model trained exclusively on permissively licensed data, outperforming both GPT-4-0125 (85.9%) and GPT-4o (84.7%). Additionally, we explore a more computationally efficient approach using a novel tail-patch fine-tuning strategy to optimize our FLAMe multitask mixture for reward modeling evaluation (FLAMe-Opt-RM), offering competitive RewardBench performance while requiring approximately 25x less training datapoints. Overall, our FLAMe variants outperform all popular proprietary LLM-as-a-Judge models we consider across 8 out of 12 autorater evaluation benchmarks, encompassing 53 quality assessment tasks, including RewardBench and LLM-AggreFact. Finally, our analysis reveals that FLAMe is significantly less biased than these LLM-as-a-Judge models on the CoBBLEr autorater bias benchmark, while effectively identifying high-quality responses for code generation.
Comments: 31 pages, 5 figures, 7 tables
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as: arXiv:2407.10817 (v1)
https://arxiv.org/pdf/2407.10817