Meta's newest 'open' AI model release is its biggest yet. The company claims the model, Llama 3.1 405B, is competitive with the best commercial releases.
Meta releases its biggest ‘open’ AI model yet
Kyle Wiggers
8:00 AM PDT • July 23, 2024
Image Credits: TOBIAS SCHWARZ/AFP / Getty Images
Meta’s latest open source AI model is its biggest yet.
Today, Meta said it is releasing Llama 3.1 405B, a model containing 405 billion parameters. Parameters are the internal values a model learns during training and roughly track its problem-solving ability; models with more parameters generally perform better than those with fewer.
At 405 billion parameters, Llama 3.1 405B isn’t the absolute largest open source model out there, but it’s the biggest in recent years. Trained using 16,000 Nvidia H100 GPUs, it also benefits from newer training and development techniques that Meta claims make it competitive with leading proprietary models like OpenAI’s GPT-4o and Anthropic’s Claude 3.5 Sonnet (with a few caveats).
As with Meta’s previous models, Llama 3.1 405B is available to download or use on cloud platforms like AWS, Azure and Google Cloud. It’s also being used on WhatsApp and Meta.ai, where it’s powering a chatbot experience for U.S.-based users.
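For developers who want to run the open weights themselves rather than use the hosted chatbot, a common route is Hugging Face’s transformers library. The snippet below is a minimal sketch, not official Meta instructions: the repository ID, prompt and generation settings are assumptions, it presumes you have accepted the model license, and it targets the smaller 8B variant, since the 405B model needs a multi-GPU cluster or a cloud endpoint.

# Minimal sketch: loading a Llama 3.1 checkpoint with Hugging Face transformers.
# The repo ID below is an assumption; adjust to whichever variant and license you have access to.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-3.1-8B-Instruct"  # assumed repository name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")

messages = [{"role": "user", "content": "Summarize this quarterly report in three bullet points."}]
input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
output_ids = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))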
New and improved
Like other open and closed source generative AI models, Llama 3.1 405B can perform a range of different tasks, from coding and answering basic math questions to summarizing documents in eight languages (English, German, French, Italian, Portuguese, Hindi, Spanish and Thai). It’s text-only, meaning that it can’t, for example, answer questions about an image, but most text-based workloads — think analyzing files like PDFs and spreadsheets — are within its purview.
Meta wants to make it known that it is experimenting with multimodality. In a paper published today, researchers at the company write that they’re actively developing Llama models that can recognize images and videos, and understand (and generate) speech. Still, these models aren’t yet ready for public release.
To train Llama 3.1 405B, Meta used a dataset of 15 trillion tokens dating up to 2024 (tokens are the sub-word chunks of text that models can internalize more easily than whole words; a single word usually breaks into one or a few tokens). It’s not a new training set per se, since Meta used the base set to train earlier Llama models, but the company claims it refined its data curation pipelines and adopted “more rigorous” quality assurance and data filtering approaches in developing this model.
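To make the word-versus-token distinction concrete, the short sketch below runs a sentence through an open tokenizer and compares the counts. The repository ID is an assumption; any Llama-family tokenizer illustrates the same point, and exact token counts vary by tokenizer.

# Sketch: how a tokenizer splits text into the sub-word tokens models actually train on.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")  # assumed repo ID
text = "Summarizing multilingual spreadsheets is surprisingly straightforward."
tokens = tokenizer.tokenize(text)
print(len(text.split()), "words ->", len(tokens), "tokens")
print(tokens)  # long words like "Summarizing" typically split into several sub-word pieces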
The company also used synthetic data (data generated by other AI models) to fine-tune Llama 3.1 405B. Most major AI vendors, including OpenAI and Anthropic, are exploring applications of synthetic data to scale up their AI training, but some experts believe that synthetic data should be a last resort due to its potential to exacerbate model bias.
For its part, Meta insists that it “carefully balance[d]” Llama 3.1 405B’s training data, but declined to reveal exactly where the data came from (outside of webpages and public web files). Many generative AI vendors see training data as a competitive advantage and so keep it and any information pertaining to it close to the chest. But training data details are also a potential source of IP-related lawsuits, another disincentive for companies to reveal much.
Image Credits: Meta
In the aforementioned paper, Meta researchers wrote that compared to earlier Llama models, Llama 3.1 405B was trained on an increased mix of non-English data (to improve its performance on non-English languages), more “mathematical data” and code (to improve the model’s mathematical reasoning skills), and recent web data (to bolster its knowledge of current events).
Recent reporting by Reuters revealed that Meta at one point used copyrighted e-books for AI training despite its own lawyers’ warnings. The company controversially trains its AI on Instagram and Facebook posts, photos and captions, and makes it difficult for users to opt out. What’s more, Meta, along with OpenAI, is the subject of an ongoing lawsuit brought by authors, including comedian Sarah Silverman, over the companies’ alleged unauthorized use of copyrighted data for model training.
“The training data, in many ways, is sort of like the secret recipe and the sauce that goes into building these models,” Ragavan Srinivasan, VP of AI program management at Meta, told TechCrunch in an interview. “And so from our perspective, we’ve invested a lot in this. And it is going to be one of these things where we will continue to refine it.”
Bigger context and tools
Llama 3.1 405B has a larger context window than previous Llama models: 128,000 tokens, enough to hold roughly 100,000 words of input, about the length of a full-length book. A model’s context, or context window, refers to the input data (e.g. text) that the model considers before generating output (e.g. additional text).
One of the advantages of models with larger contexts is that they can summarize longer text snippets and files. When powering chatbots, such models are also less likely to forget topics that were recently discussed.
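In practice, an application decides whether a document fits the window by counting tokens before sending a prompt. The sketch below assumes a Hugging Face tokenizer and a hypothetical input file; only the 128,000-token figure comes from Meta’s announcement, and the output headroom is an arbitrary choice.

# Sketch: checking whether a document fits in a 128K-token context window before prompting.
from transformers import AutoTokenizer

CONTEXT_WINDOW = 128_000      # Llama 3.1's advertised context length
RESERVED_FOR_OUTPUT = 2_000   # arbitrary headroom left for the model's reply

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")  # assumed repo ID

def fits_in_context(document: str) -> bool:
    """Return True if the document's token count leaves room for a response."""
    return len(tokenizer.encode(document)) <= CONTEXT_WINDOW - RESERVED_FOR_OUTPUT

with open("annual_report.txt") as f:  # hypothetical file
    print(fits_in_context(f.read()))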
Two other new, smaller models Meta unveiled today, Llama 3.1 8B and Llama 3.1 70B (updated versions of the company’s Llama 3 8B and Llama 3 70B models released in April), also have 128,000-token context windows. The previous models’ contexts topped out at 8,000 tokens, which makes this upgrade fairly substantial, assuming the new Llama models can effectively reason across all that context.
Image Credits: Meta
All of the Llama 3.1 models can use third-party tools, apps and APIs to complete tasks, like rival models from Anthropic and OpenAI. Out of the box, they’re trained to tap Brave Search to answer questions about recent events, the Wolfram Alpha API for math- and science-related queries, and a Python interpreter for validating code. In addition, Meta claims the Llama 3.1 models can use certain tools they haven’t seen before — to an extent.
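Tool use of this kind is orchestrated by the application wrapped around the model: the model emits a structured tool request, the application executes the real API call and feeds the result back, and the model then composes its final answer. The loop below is a generic, simplified sketch; the generate and parse_tool_call callables and the placeholder tool functions are hypothetical stand-ins, not Meta’s published tool-calling format.

# Generic sketch of a tool-calling loop around a chat model.
# generate(), parse_tool_call() and the tool bodies are hypothetical placeholders.

def brave_search(query: str) -> str:        # placeholder: would call the Brave Search API
    return f"Search results for: {query}"

def wolfram_alpha(expression: str) -> str:  # placeholder: would call the Wolfram Alpha API
    return f"Computed result for: {expression}"

TOOLS = {"brave_search": brave_search, "wolfram_alpha": wolfram_alpha}

def answer(question: str, generate, parse_tool_call) -> str:
    messages = [{"role": "user", "content": question}]
    reply = generate(messages)                 # model may answer directly or request a tool
    call = parse_tool_call(reply)              # e.g. {"name": "brave_search", "arguments": {...}}
    if call and call["name"] in TOOLS:
        result = TOOLS[call["name"]](**call["arguments"])
        messages += [{"role": "assistant", "content": reply},
                     {"role": "tool", "content": result}]
        reply = generate(messages)             # model composes a final answer from the tool output
    return reply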