bnew




1/3
For the last two years, my team and I have been publicly working on laying the foundations of early-fusion, multi-modal (MM) token-in token-out approaches, from the original CM3 paper to MM-scaling laws to CM3Leon to half a dozen or so more papers all around this space, with a couple more coming out soon.

2/3
While it's true we're behind, we're much closer to OpenAI than when GPT-4 launched. We've built recipes that scale, architectures aligned with multi-modality, science on how to train these models, and, most importantly, the strongest team outside of OpenAI in this research space.

3/3
I firmly believe in ~2 months, there will be enough knowledge in the open-source for folks to start pre-training their own gpt4o-like models. We're working hard to make this happen.


 

Micky Mikey

The new update is nice. Still waiting on that giant leap in intelligence and reasoning capabilities. That will be the real game changer.
 

bnew


GPT-4o first reactions: ‘essentially AGI’​

Carl Franzen @carlfranzen

May 13, 2024 3:35 PM

[Image: AI vector art of a diverse crowd gathered outside a convention center with a sign reading "GPT-4o". Credit: VentureBeat, made with Midjourney V6]




It’s only been a few hours since OpenAI took the wrapper off its newest AI large language model (LLM) known as GPT-4o (for Omni), but already the initial reactions to the event and the technology are rolling in.

It’s safe to say that at this early stage, the reaction is mixed. While some came away from OpenAI’s short (26 min-long) demo presentation wanting more, the company has since released a plethora of video demos and more information about the new foundation model — which it says is faster than its prior leading GPT-4 model, more affordable for third-party developers, and, perhaps most important of all, more emotional: better at detecting and mimicking human expressions, principally through audio.

It’s also free to use through ChatGPT for all users, even non-subscribers — though paying subscribers are getting ahold of it first (the update is rolling out over the coming weeks). It’s also only starting with text and vision capabilities, with audio and video coming in the next few weeks.

GPT-4o has been trained from the ground up to treat text, audio, and visual data equally and transform it all into tokens instead of relying on turning everything into text as before, allowing for the speed increase and cost decrease.

“OpenAI is eating Character AI’s lunch, with almost 100% overlap in form factor and huge distribution channels,” wrote Nvidia senior research manager and AI influencer Jim Fan on X. “It’s a pivot towards more emotional AI with strong personality, which OpenAI seemed to actively suppress in the past.”



“GPT-4o isn’t the big leap. This is,” stated University of Pennsylvania Wharton School professor and AI influencer Ethan Mollick.



AI influencer and startup advisor Allie K. Miller was excited about the new desktop ChatGPT app for macOS (later Windows) which runs on GPT-4o. As she wrote on X:

“HOLY MOLY THIS IS THE WINNING FEATURE. It’s basically a coworker on screen share with you 24/7, with no fatigue. I can imagine people working for hours straight with this on.”



AI developer Benjamin De Kraker wrote that he believed it was essentially artificial general intelligence (AGI), an AI that outperforms most humans at most economically valuable tasks, which has been OpenAI’s entire quest and raison d’être from the start.

“Alright I’m gonna say it… This is essentially AGI. This will be seen as magic to masses. What else do you call it when a virtual “person” can listen, talk, see, and reason almost indistinguishably from an average human?”



Developer Siqi Chen was similarly impressed, citing GPT-4o’s newfound capability to render 3D objects from text prompts, writing on X that “this will prove to be in retrospect by far the most underrated openai event ever.”



On the flip side, journalist and author James Vincent stated that while the marketing for GPT-4o as a voice assistant was “canny,” it was ultimately “leaning into the masquerade of intelligence” as “voice…doesn’t necessarily indicate leaps forward in capability.”



Similarly, Gartner VP Chirag Dekate of the reputable market research and consulting firm’s quantum technologies, AI infrastructures and supercomputing department, told VentureBeat in a phone call that he found the GPT-4o unveiling event and tech demos “a bit underwhelming, because it reminded me of the Gemini demos that I’ve already seen almost three months ago from Google.”

He said he believed a growing “capability gap” was emerging between OpenAI and more longstanding technology companies such as Google, Meta, and even OpenAI ally Microsoft, which is also training its own LLMs and foundation models. Those companies have more raw data to train new models on, more distribution channels to push them out, and their own cloud infrastructure and hardware (the Tensor Processing Unit, or TPU, in the case of Google) to optimize AI training and inference.

“Open AI will struggle to activate the same sort of virtuous cycles,” with its AI products, Dekate told VentureBeat.

Self-described luddite (or anti-technology) influencer “Artisanal Holdout” posted on X the most scathing response I saw (unsurprisingly):

“Yikes—OpenAI balked on GPT-5 and instead released GPT-4o over a year after the initial launch of GPT-4. Guess they aren’t confident enough in their tiny baby steps of development. How embarrassing for both OpenAI and AI bros alike.”



Meanwhile, Greg Isenberg, CEO of holding company Late Checkout, wrote on X the opposite opinion that “The pace of change is unbelievable.”



AI educator Min Choi also applauded the release, saying it would “completely change the AI assistant game.”



Out for less than a day so far, and with many capabilities still to reach the public, GPT-4o is still a very young product. But given the already impassioned responses, it’s clear OpenAI has already struck a nerve.

VentureBeat has also received access via my personal account and will be testing the new model in the coming days. Stay tuned for our reaction once we have had more time to try it.
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
56,192
Reputation
8,249
Daps
157,871

OpenAI president shares first image generated by GPT-4o​

Carl Franzen @carlfranzen

May 15, 2024 3:56 PM

[Image: A person wearing a black T-shirt with an OpenAI logo using a cloth to wipe a blackboard with chalk text on it. Credit: Greg Brockman/X]




OpenAI’s president Greg Brockman has posted from his X account what appears to be the first public image generated using the company’s brand new GPT-4o model.

As you’ll see in the image below, it is quite convincingly photorealistic, showing a person wearing a black T-shirt with an OpenAI logo writing chalk text on a blackboard that reads “Transfer between Modalities. Suppose we directly model P (text, pixels, sound) with one big autoregressive transformer. What are the pros and cons?”



1/1

A GPT-4o generated image — so much to explore with GPT-4o's image generation capabilities alone. Team is working hard to bring those to the world.


The new GPT-4o model, which debuted on Monday, improves upon the prior GPT-4 family of models (GPT-4, GPT-4 Vision, and GPT-4 Turbo) by being faster, cheaper, and retaining more information from inputs such as audio and vision.

It is able to do so because OpenAI took a different approach from its prior GPT-4 class LLMs. While those chained multiple different models together and converted other media such as audio and visuals to text and back, the new GPT-4o was trained on multimedia tokens from the get-go, allowing it to directly analyze and interpret vision and audio without first converting it into text.
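To make the difference concrete, here is a minimal, hypothetical sketch of the two designs. None of these function or class names are OpenAI APIs; they are placeholders meant only to illustrate why converting audio to text and back discards information that a single token-in, token-out model can keep.

Code:
# Hypothetical sketch: chained pipeline vs. native multimodal model.
# All names below are illustrative placeholders, not real OpenAI interfaces.

def chained_voice_assistant(audio_in, asr, text_llm, tts):
    """Pre-GPT-4o style: audio -> text -> text -> audio.
    Tone, emotion, and speaker identity are lost at each text conversion."""
    transcript = asr.transcribe(audio_in)        # speech-to-text model
    reply_text = text_llm.generate(transcript)   # text-only LLM
    return tts.synthesize(reply_text)            # text-to-speech model

def native_multimodal_assistant(audio_in, mm_model):
    """GPT-4o-style single model, as described above: one token stream in and out,
    so non-textual cues can survive end to end."""
    audio_tokens = mm_model.tokenize_audio(audio_in)
    output_tokens = mm_model.generate(audio_tokens)   # may emit text or audio tokens
    return mm_model.detokenize(output_tokens)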

Based on the above image, the new approach is a noticeable improvement over OpenAI’s last image generation model DALL-E 3 which debuted in September 2023. I ran a similar prompt through DALL-E 3 in ChatGPT and here is the result.

[Image: DALL-E 3 output for a similar prompt]

As you can see, the image shared by Brockman created with GPT-4o improves significantly in quality, photorealism, and accuracy of text generation.

However, GPT-4o’s native image generation capabilities are not yet publicly available, as Brockman alluded to in his X post: “Team is working hard to bring those to the world.”
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
56,192
Reputation
8,249
Daps
157,871

Hugging Face is sharing $10 million worth of compute to help beat the big AI companies​


ZeroGPU gives everyone the chance to create AI apps without the burden of GPU costs.​

By Kylie Robison, a senior AI reporter working with The Verge's policy and tech teams. She previously worked at Fortune Magazine and Business Insider.

May 16, 2024, 9:00 AM EDT


[Image: Photo illustration of Clément Delangue of Hugging Face in front of the Hugging Face logo. Credit: The Verge / Getty Images]

Hugging Face, one of the biggest names in machine learning, is committing $10 million in free shared GPUs to help developers create new AI technologies. The goal is to help small developers, academics, and startups counter the centralization of AI advancements.

“We are lucky to be in a position where we can invest in the community,” Hugging Face CEO Clem Delangue told The Verge. Delangue said the investment is possible because Hugging Face is “profitable, or close to profitable” and recently raised $235 million in funding, valuing the company at $4.5 billion.

Delangue is concerned about AI startups’ ability to compete with the tech giants. Most significant advancements in artificial intelligence — like GPT-4, the algorithms behind Google Search, and Tesla’s Full Self-Driving system — remain hidden within the confines of major tech companies. Not only are these corporations financially incentivized to keep their models proprietary, but with billions of dollars at their disposal for computational resources, they can compound those gains and race ahead of competitors, making it impossible for startups to keep up.


Hugging Face aims to make state-of-the-art AI technologies accessible to everyone, not just the tech giants. I spoke with Delangue during Google I/O, the tech giant’s flagship conference, where Google executives unveiled numerous AI features for their proprietary products and even a family of open-source models called Gemma. For Delangue, the proprietary approach is not the future he envisions.

“If you go the open source route, you go towards a world where most companies, most organizations, most nonprofits, policymakers, regulators, can actually do AI too. So, a much more decentralized way without too much concentration of power which, in my opinion, is a better world,” Delangue said.

How it works​

Access to compute poses a significant challenge to constructing large language models, often favoring companies like OpenAI and Anthropic, which secure deals with cloud providers for substantial computing resources. Hugging Face aims to level the playing field by donating these shared GPUs to the community through a new program called ZeroGPU.

The shared GPUs are accessible to multiple users or applications concurrently, eliminating the need for each user or application to have a dedicated GPU. ZeroGPU will be available via Hugging Face’s Spaces, a hosting platform for publishing apps, which has over 300,000 AI demos created so far on CPU or paid GPU, according to the company.


Access to the shared GPUs is determined by usage, so if a portion of the GPU capacity is not actively utilized, that capacity becomes available for use by someone else. This makes them cost-effective, energy-efficient, and ideal for community-wide utilization. ZeroGPU uses Nvidia A100 GPU devices to power this operation — which offer about half the computation speed of the popular and more expensive H100s.
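For developers wondering what this looks like in practice, below is a minimal sketch of a Gradio Space requesting ZeroGPU hardware on demand. It assumes the `spaces` helper package and `@spaces.GPU` decorator that Hugging Face provides for this program; the model ID is just a small example, so check the ZeroGPU documentation for the exact current API.

Code:
# Minimal sketch of a ZeroGPU-backed Space (assumes Hugging Face's `spaces`
# helper package and its @spaces.GPU decorator; model ID is an arbitrary example).
import gradio as gr
import spaces
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2", torch_dtype=torch.float16)
model.to("cuda")   # under ZeroGPU this is intercepted; the GPU is attached lazily

@spaces.GPU        # a shared A100 slice is held only while this function runs
def generate(prompt: str) -> str:
    inputs = tok(prompt, return_tensors="pt").to("cuda")
    out = model.generate(**inputs, max_new_tokens=64)
    return tok.decode(out[0], skip_special_tokens=True)

gr.Interface(fn=generate, inputs="text", outputs="text").launch()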

“It’s very difficult to get enough GPUs from the main cloud providers, and the way to get them—which is creating a high barrier to entry—is to commit on very big numbers for long periods of times,” Delangue said.

Typically, a company would commit to a cloud provider like Amazon Web Services for one or more years to secure GPU resources. This arrangement disadvantages small companies, indie developers, and academics who build on a small scale and can’t predict if their projects will gain traction. Regardless of usage, they still have to pay for the GPUs.

“It’s also a prediction nightmare to know how many GPUs and what kind of budget you need,” Delangue said.

Open-source AI is catching up​

With AI rapidly advancing behind closed doors, the goal of Hugging Face is to allow people to build more AI tech in the open.

“If you end up with a few organizations who are dominating too much, then it’s going to be harder to fight it later on,” Delangue said.

Andrew Reed, a machine learning engineer at Hugging Face, even spun up an app that visualizes the progress of proprietary and open-source LLMs over time as scored by the LMSYS Chatbot Arena, which shows the gap between the two inching closer together.

Over 35,000 variations of Meta’s open-source AI model Llama have been shared on Hugging Face since Meta’s first version a year ago, ranging from “quantized and merged models to specialized models in biology and Mandarin,” according to the company.

“AI should not be held in the hands of the few. With this commitment to open-source developers, we’re excited to see what everyone will cook up next in the spirit of collaboration and transparency,” Delangue said in a press release.
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
56,192
Reputation
8,249
Daps
157,871

May 16, 2024

OpenAI and Reddit Partnership​

We’re bringing Reddit’s content to ChatGPT and our products.


Editor’s Note: This post was originally published by Reddit.

Keeping the internet open is crucial, and part of being open means Reddit content needs to be accessible to those fostering human learning and researching ways to build community, belonging, and empowerment online. Reddit is a uniquely large and vibrant community that has long been an important space for conversation on the internet. Additionally, using LLMs, ML, and AI allows Reddit to improve the user experience for everyone.

In line with this, Reddit and OpenAI today announced a partnership to benefit both the Reddit and OpenAI user communities in a number of ways:

  • OpenAI will bring enhanced Reddit content to ChatGPT and new products, helping users discover and engage with Reddit communities. To do so, OpenAI will access Reddit’s Data API, which provides real-time, structured, and unique content from Reddit. This will enable OpenAI’s AI tools to better understand and showcase Reddit content, especially on recent topics.
  • This partnership will also enable Reddit to bring new AI-powered features to redditors and mods. Reddit will be building on OpenAI’s platform of AI models to bring its powerful vision to life.
  • Lastly, OpenAI will become a Reddit advertising partner.

"We are thrilled to partner with Reddit to enhance ChatGPT with uniquely timely and relevant information, and to explore the possibilities to enrich the Reddit experience with AI-powered features.”

Brad Lightcap, OpenAI COO

“Reddit has become one of the internet’s largest open archives of authentic, relevant, and always up to date human conversations about anything and everything. Including it in ChatGPT upholds our belief in a connected internet, helps people find more of what they’re looking for, and helps new audiences find community on Reddit.”

Steve Huffman, Reddit Co-Founder and CEO

OpenAI Disclosure: Sam Altman is a shareholder in Reddit. This partnership was led by OpenAI’s COO and approved by its independent Board of Directors.
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
56,192
Reputation
8,249
Daps
157,871

1/1
Newly published work from FAIR, Chameleon: Mixed-Modal Early-Fusion Foundation Models.

This research presents a family of early-fusion token-based mixed-modal models capable of understanding & generating images & text in any arbitrary sequence.

Paper [2405.09818] Chameleon: Mixed-Modal Early-Fusion Foundation Models






[Submitted on 16 May 2024]

Chameleon: Mixed-Modal Early-Fusion Foundation Models

Chameleon Team
We present Chameleon, a family of early-fusion token-based mixed-modal models capable of understanding and generating images and text in any arbitrary sequence. We outline a stable training approach from inception, an alignment recipe, and an architectural parameterization tailored for the early-fusion, token-based, mixed-modal setting. The models are evaluated on a comprehensive range of tasks, including visual question answering, image captioning, text generation, image generation, and long-form mixed modal generation. Chameleon demonstrates broad and general capabilities, including state-of-the-art performance in image captioning tasks, outperforms Llama-2 in text-only tasks while being competitive with models such as Mixtral 8x7B and Gemini-Pro, and performs non-trivial image generation, all in a single model. It also matches or exceeds the performance of much larger models, including Gemini Pro and GPT-4V, according to human judgments on a new long-form mixed-modal generation evaluation, where either the prompt or outputs contain mixed sequences of both images and text. Chameleon marks a significant step forward in a unified modeling of full multimodal documents.
Subjects: Computation and Language (cs.CL)
Cite as: arXiv:2405.09818 [cs.CL]
(or arXiv:2405.09818v1 [cs.CL] for this version)

Submission history

From: Armen Aghajanyan [view email]
[v1] Thu, 16 May 2024 05:23:41 UTC (26,721 KB)

 

bnew


OpenAI Reportedly Dissolves Its Existential AI Risk Team​


A former lead scientist at OpenAI says he's struggled to secure resources to research existential AI risk, as the startup reportedly dissolves his team.​

By Maxwell Zeff

[Photo: Kent Nishimura (Getty Images)]

OpenAI’s Superalignment team, charged with controlling the existential danger of a superhuman AI system, has reportedly been disbanded, according to Wired on Friday. The news comes just days after the team’s founders, Ilya Sutskever and Jan Leike, simultaneously quit the company.

Wired reports that OpenAI’s Superalignment team, first launched in July 2023 to prevent superhuman AI systems of the future from going rogue, is no more. The report states that the group’s work will be absorbed into OpenAI’s other research efforts. Research on the risks associated with more powerful AI models will now be led by OpenAI cofounder John Schulman, according to Wired. Sutskever and Leike were some of OpenAI’s top scientists focused on AI risks.

Leike posted a long thread on X Friday vaguely explaining why he left OpenAI. He says he’s been fighting with OpenAI leadership about core values for some time, but reached a breaking point this week. Leike noted the Superalignment team has been “sailing against the wind,” struggling to get enough compute for crucial research. He thinks that OpenAI needs to be more focused on security, safety, and alignment.



“Currently, we don’t have a solution for steering or controlling a potentially superintelligent AI, and preventing it from going rogue,” said the Superalignment team in an OpenAI blog post when it launched in July. “But humans won’t be able to reliably supervise AI systems much smarter than us, and so our current alignment techniques will not scale to superintelligence. We need new scientific and technical breakthroughs.”

It’s now unclear if the same attention will be put into those technical breakthroughs. Undoubtedly, there are other teams at OpenAI focused on safety. Schulman’s team, which is reportedly absorbing Superalignment’s responsibilities, is currently responsible for fine-tuning AI models after training. However, Superalignment focused specifically on the most severe outcomes of a rogue AI. As Gizmodo noted yesterday, several of OpenAI’s most outspoken AI safety advocates have resigned or been fired in the last few months.

Earlier this year, the group released a notable research paper about controlling large AI models with smaller AI models—considered a first step towards controlling superintelligent AI systems. It’s unclear who will make the next steps on these projects at OpenAI.

OpenAI did not immediately respond to Gizmodo’s request for comment.



Sam Altman’s AI startup kicked off this week by unveiling GPT-4 Omni, the company’s latest frontier model which featured ultra-low latency responses that sounded more human than ever. Many OpenAI staffers remarked on how its latest AI model was closer than ever to something from science fiction, specifically the movie Her.
 

bnew



1/2
Today we have published our updated Gemini 1.5 Model Technical Report. As @JeffDean highlights, we have made significant progress in Gemini 1.5 Pro across all key benchmarks; TL;DR: 1.5 Pro > 1.0 Ultra, 1.5 Flash (our fastest model) ~= 1.0 Ultra.

As a math undergrad, our drastic results in mathematics are particularly exciting to me!

In section 7 of the tech report, we present new results on a math-specialised variant of Gemini 1.5 Pro which performs strongly on competition-level math problems, including a breakthrough performance of 91.1% on Hendrycks' MATH benchmark without tool-use (examples below).

Gemini 1.5 is widely available, try it out for free here Google AI Studio | Google AI for Developers | Google for Developers & read the full tech report here: https://goo.gle/GeminiV1-5

2/2
Here are some examples of the model solving problems from the Asian Pacific Mathematical Olympiad (APMO) that have stumped prior models. The top example is cool because it is a proof (rather than a calculation). The solutions are to the point and "beautiful".

This clearly shows





 

bnew




1/3
it is interesting that GPT-4o's ELO is lower, at 1287, than its initial 1310 score.
On coding, it regressed even more in absolute points, from 1369 to 1307.

2/3
i wonder how much of the differential was because people were really trend-bubbling the "im-a-good-gpt2" thing, and trying to spot and validate it; and now it's more normalized to regular expectations.

3/3
someone pointed out that in fact in coding, GPT-4o did worse than GPT-4Turbo in medium/hard problems, but did better on easy problems on LiveCodeBench Leaderboard
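For a sense of scale, a 23-point Elo drop is small in head-to-head terms. Assuming the standard logistic Elo expected-score formula (a simplification of how arena-style leaderboards such as LMSYS's are computed), the gap translates roughly as follows.

Code:
# Rough sketch: what an Elo gap implies for expected head-to-head win rate,
# assuming the standard logistic Elo formula (a simplification of how
# arena-style leaderboards compute ratings).
def expected_score(rating_a: float, rating_b: float) -> float:
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

print(expected_score(1310, 1287))  # ~0.533: a 23-point edge wins about 53% of matchups
print(expected_score(1369, 1307))  # ~0.588: the larger coding-score gap, about 59%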


 

bnew



1/1
Spoke to @geoffreyhinton about OpenAI co-founder @ilyasut's intuition for scaling laws.

"Ilya was always preaching that you just make it bigger and it'll work better.

And I always thought that was a bit of a cop-out, that you're going to have to have new ideas too.

It turns out Ilya was basically right."

Link to full interview in .




 

bnew










1/9
How do models like GPT-4o and Meta’s Chameleon generate images?

Answer: They don’t, they generate tokens.

A short thread on multimodal tokenizers:

2/9
Text-only LLMs operate on a discrete, learned vocabulary of tokens that's fixed throughout training.
Every training example asks the neural net to predict the next text token id given previous text token ids.

3/9
End-to-End Multimodal LLMs like GPT-4o and Meta’s Chameleon incorporate ‘image tokens’ as additional tokens into the vocabulary.

For example, Chameleon has 65k text tokens and 8k image tokens for a total of ~73k tokens in the vocabulary.

But wait? How can you encode every

4/9
You don’t! Each image token isn’t a collection of pixels - it’s a vector. For example:

Text Token with id 104627 might be: _SolidGoldMagikarp

While the image token with index 1729 will be an embedding: [1.232, -.21, … 0.12]

The LLM's task is to learn patterns in sequences of

5/9
An embedding vector?

Image vocabularies in multimodal language models are actually hidden states of a neural network called a vector quantized variational auto-encoder or VQ-VAE

These neural networks are trained to compress images into a small set of codes (tokens).

6/9
When you tokenize the input of a multimodal LLM, any image runs through the encoder of the VQ-VAE to generate a set of latent codes. You then look up the vectors representing these codes and use them as the input to the LLM's transformer decoder.

During generation, the multimodal
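A minimal sketch of the encode-quantize-lookup flow described in tweets 5 and 6, using the rough vocabulary sizes from tweet 3 (65,536 text tokens plus an 8,192-entry image codebook). The codebook and sizes here are toy stand-ins, not Chameleon's actual VQ-VAE.

Code:
import torch
import torch.nn as nn

# Toy sketch of multimodal tokenization: a VQ-VAE-style encoder output is mapped
# to discrete codebook indices, which are offset past the text vocabulary so text
# ids and image ids never collide. All sizes are illustrative only.
TEXT_VOCAB = 65_536          # text token ids: 0 .. 65535
IMAGE_CODES = 8_192          # image codebook ids: 0 .. 8191
EMBED_DIM = 256

codebook = nn.Embedding(IMAGE_CODES, EMBED_DIM)    # stands in for the VQ-VAE codebook

def quantize(latents: torch.Tensor) -> torch.Tensor:
    """Map each latent vector (e.g. one per image patch) to its nearest codebook entry."""
    dists = torch.cdist(latents, codebook.weight)   # (num_patches, IMAGE_CODES)
    return dists.argmin(dim=-1)                     # discrete image token ids

latents = torch.randn(1024, EMBED_DIM)              # pretend encoder output: 1024 patch latents
image_ids = quantize(latents)

# Interleave with text ids by shifting image ids past the text vocabulary.
text_ids = torch.tensor([101, 2057, 4821])          # arbitrary example text token ids
sequence = torch.cat([text_ids, image_ids + TEXT_VOCAB])
print(sequence.shape, int(sequence.max()) < TEXT_VOCAB + IMAGE_CODES)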

7/9
If you like thinking about stuff like this, then you'll enjoy engineering at @nomic_ai.

We help humans understand and build with unstructured data through latent space powered tools.

That involves training and serving multimodal models every day to our customers!

Many of

8/9
Diagram credit:
Chameleon: https://arxiv.org/pdf/2405.09818
VQ-VAE: https://arxiv.org/pdf/2203.13131

9/9
commercially available models only allow you to generate out text due to the difficulty of moderating images.

but yes, yes they do. you'll probably never be given it because they will instead give you the raw image back


 

bnew


DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

GitHub - deepseek-ai/DeepSeek-V2

1. Introduction


Today, we’re introducing DeepSeek-V2, a strong Mixture-of-Experts (MoE) language model characterized by economical training and efficient inference. It comprises 236B total parameters, of which 21B are activated for each token. Compared with DeepSeek 67B, DeepSeek-V2 achieves stronger performance, and meanwhile saves 42.5% of training costs, reduces the KV cache by 93.3%, and boosts the maximum generation throughput to 5.76 times.
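The "21B activated out of 236B total" property comes from sparse expert routing: a router sends each token through only a few expert FFNs. Below is a generic top-k MoE layer sketch to illustrate the idea; it is not DeepSeek-V2's actual DeepSeekMoE layer, which additionally uses shared and fine-grained experts.

Code:
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    """Generic top-k mixture-of-experts layer: each token runs through only `k`
    of the `num_experts` FFNs, so activated parameters per token stay small.
    Schematic stand-in, not DeepSeek-V2's exact DeepSeekMoE design."""
    def __init__(self, d_model=512, d_ff=1024, num_experts=8, k=2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, num_experts, bias=False)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        ])

    def forward(self, x):                        # x: (num_tokens, d_model)
        scores = self.router(x)                  # (num_tokens, num_experts)
        weights, idx = scores.topk(self.k, dim=-1)
        weights = F.softmax(weights, dim=-1)     # renormalize over the chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e         # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

layer = TopKMoE()
print(layer(torch.randn(4, 512)).shape)          # torch.Size([4, 512])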





We pretrained DeepSeek-V2 on a diverse and high-quality corpus comprising 8.1 trillion tokens. This comprehensive pretraining was followed by a process of Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) to fully unleash the model's capabilities. The evaluation results validate the effectiveness of our approach as DeepSeek-V2 achieves remarkable performance on both standard benchmarks and open-ended generation evaluation.

2. News


2024.05.16: We released the DeepSeek-V2-Lite.

2024.05.06: We released the DeepSeek-V2.

3. Model Downloads


Model | #Total Params | #Activated Params | Context Length | Download
DeepSeek-V2-Lite | 16B | 2.4B | 32k | 🤗 HuggingFace
DeepSeek-V2-Lite-Chat (SFT) | 16B | 2.4B | 32k | 🤗 HuggingFace
DeepSeek-V2 | 236B | 21B | 128k | 🤗 HuggingFace
DeepSeek-V2-Chat (RL) | 236B | 21B | 128k | 🤗 HuggingFace

Due to the constraints of HuggingFace, the open-source code currently runs slower on GPUs than our internal codebase. To facilitate the efficient execution of our model, we offer a dedicated vllm solution that optimizes performance for running our model effectively.
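As a rough sketch, serving the model through vLLM typically looks like the snippet below. Argument values are examples only, and at the time of this README DeepSeek-V2 support may require the dedicated vllm solution mentioned above rather than a stock install, so treat this as a starting point and check the model card for recommended settings.

Code:
# Rough sketch of serving DeepSeek-V2 with vLLM (argument values are examples;
# the repo's dedicated vllm solution / model card has the recommended settings).
from vllm import LLM, SamplingParams

llm = LLM(
    model="deepseek-ai/DeepSeek-V2-Lite-Chat",  # the 16B variant needs far less hardware
    trust_remote_code=True,                     # custom modeling code ships with the repo
    max_model_len=8192,
)

params = SamplingParams(temperature=0.7, max_tokens=256)
outputs = llm.generate(["Explain Mixture-of-Experts in two sentences."], params)
print(outputs[0].outputs[0].text)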

4. Evaluation Results


Base Model


Standard Benchmark (Models larger than 67B)


Benchmark | Domain | LLaMA3 70B | Mixtral 8x22B | DeepSeek-V1 (Dense-67B) | DeepSeek-V2 (MoE-236B)
MMLU | English | 78.9 | 77.6 | 71.3 | 78.5
BBH | English | 81.0 | 78.9 | 68.7 | 78.9
C-Eval | Chinese | 67.5 | 58.6 | 66.1 | 81.7
CMMLU | Chinese | 69.3 | 60.0 | 70.8 | 84.0
HumanEval | Code | 48.2 | 53.1 | 45.1 | 48.8
MBPP | Code | 68.6 | 64.2 | 57.4 | 66.6
GSM8K | Math | 83.0 | 80.3 | 63.4 | 79.2
Math | Math | 42.2 | 42.5 | 18.7 | 43.6

Standard Benchmark (Models smaller than 16B)


Benchmark | Domain | DeepSeek 7B (Dense) | DeepSeekMoE 16B | DeepSeek-V2-Lite (MoE-16B)
Architecture | - | MHA+Dense | MHA+MoE | MLA+MoE
MMLU | English | 48.2 | 45.0 | 58.3
BBH | English | 39.5 | 38.9 | 44.1
C-Eval | Chinese | 45.0 | 40.6 | 60.3
CMMLU | Chinese | 47.2 | 42.5 | 64.3
HumanEval | Code | 26.2 | 26.8 | 29.9
MBPP | Code | 39.0 | 39.2 | 43.2
GSM8K | Math | 17.4 | 18.8 | 41.1
Math | Math | 3.3 | 4.3 | 17.1

For more evaluation details, such as few-shot settings and prompts, please check our paper.

Context Window





Evaluation results on the Needle In A Haystack (NIAH) tests. DeepSeek-V2 performs well across all context window lengths up to 128K.

Chat Model


Standard Benchmark (Models larger than 67B)


Benchmark | Domain | QWen1.5 72B Chat | Mixtral 8x22B | LLaMA3 70B Instruct | DeepSeek-V1 Chat (SFT) | DeepSeek-V2 Chat (SFT) | DeepSeek-V2 Chat (RL)
MMLU | English | 76.2 | 77.8 | 80.3 | 71.1 | 78.4 | 77.8
BBH | English | 65.9 | 78.4 | 80.1 | 71.7 | 81.3 | 79.7
C-Eval | Chinese | 82.2 | 60.0 | 67.9 | 65.2 | 80.9 | 78.0
CMMLU | Chinese | 82.9 | 61.0 | 70.7 | 67.8 | 82.4 | 81.6
HumanEval | Code | 68.9 | 75.0 | 76.2 | 73.8 | 76.8 | 81.1
MBPP | Code | 52.2 | 64.4 | 69.8 | 61.4 | 70.4 | 72.0
LiveCodeBench (0901-0401) | Code | 18.8 | 25.0 | 30.5 | 18.3 | 28.7 | 32.5
GSM8K | Math | 81.9 | 87.9 | 93.2 | 84.1 | 90.8 | 92.2
Math | Math | 40.6 | 49.8 | 48.5 | 32.6 | 52.7 | 53.9

Standard Benchmark (Models smaller than 16B)


Benchmark | Domain | DeepSeek 7B Chat (SFT) | DeepSeekMoE 16B Chat (SFT) | DeepSeek-V2-Lite 16B Chat (SFT)
MMLU | English | 49.7 | 47.2 | 55.7
BBH | English | 43.1 | 42.2 | 48.1
C-Eval | Chinese | 44.7 | 40.0 | 60.1
CMMLU | Chinese | 51.2 | 49.3 | 62.5
HumanEval | Code | 45.1 | 45.7 | 57.3
MBPP | Code | 39.0 | 46.2 | 45.8
GSM8K | Math | 62.6 | 62.2 | 72.0
Math | Math | 14.7 | 15.2 | 27.9

English Open Ended Generation Evaluation


We evaluate our model on AlpacaEval 2.0 and MTBench, showing the competitive performance of DeepSeek-V2-Chat-RL on English conversation generation.








[Submitted on 7 May 2024 (v1), last revised 16 May 2024 (this version, v3)]

DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

DeepSeek-AI
We present DeepSeek-V2, a strong Mixture-of-Experts (MoE) language model characterized by economical training and efficient inference. It comprises 236B total parameters, of which 21B are activated for each token, and supports a context length of 128K tokens. DeepSeek-V2 adopts innovative architectures including Multi-head Latent Attention (MLA) and DeepSeekMoE. MLA guarantees efficient inference through significantly compressing the Key-Value (KV) cache into a latent vector, while DeepSeekMoE enables training strong models at an economical cost through sparse computation. Compared with DeepSeek 67B, DeepSeek-V2 achieves significantly stronger performance, and meanwhile saves 42.5% of training costs, reduces the KV cache by 93.3%, and boosts the maximum generation throughput to 5.76 times. We pretrain DeepSeek-V2 on a high-quality and multi-source corpus consisting of 8.1T tokens, and further perform Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) to fully unlock its potential. Evaluation results show that, even with only 21B activated parameters, DeepSeek-V2 and its chat versions still achieve top-tier performance among open-source models.
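The KV-cache saving comes from caching a small latent per token instead of full per-head keys and values. Below is a schematic sketch of that compression idea; the dimensions are invented for illustration, and the decoupled rotary-embedding path from the paper is omitted.

Code:
import torch
import torch.nn as nn

# Schematic sketch of Multi-head Latent Attention's caching trick: compress the
# hidden state into a small latent and cache only that, reconstructing full keys
# and values at attention time. Dimensions are illustrative, not the paper's.
d_model, d_latent, n_heads, d_head = 1024, 128, 8, 64

down_kv = nn.Linear(d_model, d_latent, bias=False)        # compress once per token
up_k = nn.Linear(d_latent, n_heads * d_head, bias=False)  # expand at attention time
up_v = nn.Linear(d_latent, n_heads * d_head, bias=False)

hidden = torch.randn(1, 16, d_model)      # (batch, seq, d_model)
kv_cache = down_kv(hidden)                # (1, 16, 128)  <- all that needs to be stored

k = up_k(kv_cache).view(1, 16, n_heads, d_head)
v = up_v(kv_cache).view(1, 16, n_heads, d_head)

standard = 2 * n_heads * d_head           # floats per token in a standard KV cache
compressed = d_latent                     # floats per token cached here
print(f"cache per token: {compressed} vs {standard} floats ({1 - compressed/standard:.0%} smaller)")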
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Cite as: arXiv:2405.04434 [cs.CL]
(or arXiv:2405.04434v3 [cs.CL] for this version)
[2405.04434] DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

Submission history

From: Wenfeng Liang [view email]
[v1] Tue, 7 May 2024 15:56:43 UTC (431 KB)
[v2] Wed, 8 May 2024 02:43:34 UTC (431 KB)
[v3] Thu, 16 May 2024 17:25:01 UTC (432 KB)

 

bnew


May 19, 2024

How the voices for ChatGPT were chosen​

We worked with industry-leading casting and directing professionals to narrow down over 400 submissions before selecting the 5 voices.


Voice Mode is one of the most beloved features in ChatGPT. Each of the five distinct voices you hear has been carefully selected through an extensive process spanning five months involving professional voice actors, talent agencies, casting directors, and industry advisors. We’re sharing more on how the voices were chosen.

In September of 2023, we introduced voice capabilities to give users another way to interact with ChatGPT. Since then, we have been encouraged by the way users have responded to the feature and the individual voices. Each of the voices—Breeze, Cove, Ember, Juniper and Sky—is sampled from voice actors we partnered with to create them.


We support the creative community and collaborated with the voice acting industry​

We support the creative community and worked closely with the voice acting industry to ensure we took the right steps to cast ChatGPT’s voices. Each actor receives compensation above top-of-market rates, and this will continue for as long as their voices are used in our products.

We believe that AI voices should not deliberately mimic a celebrity's distinctive voice—Sky’s voice is not an imitation of Scarlett Johansson but belongs to a different professional actress using her own natural speaking voice. To protect their privacy, we cannot share the names of our voice talents.


We partnered with award-winning casting directors and producers to create the criteria for voices​

In early 2023, to identify our voice actors, we had the privilege of partnering with independent, well-known, award-winning casting directors and producers. We worked with them to create a set of criteria for ChatGPT's voices, carefully considering the unique personality of each voice and their appeal to global audiences.

Some of these characteristics included:

  • Actors from diverse backgrounds or who could speak multiple languages
  • A voice that feels timeless
  • An approachable voice that inspires trust
  • A warm, engaging, confidence-inspiring, charismatic voice with rich tone
  • Natural and easy to listen to

We received over 400 submissions from voice and screen actors​

In May of 2023, the casting agency and our casting directors issued a call for talent. In under a week, they received over 400 submissions from voice and screen actors. To audition, actors were given a script of ChatGPT responses and were asked to record them. These samples ranged from answering questions about mindfulness to brainstorming travel plans, and even engaging in conversations about a user's day.


We selected five final voices and discussed our vision for human-AI interactions and the goals of Voice Mode with the actors​

Through May 2023, the casting team independently reviewed and hand-selected an initial list of 14 actors. They further refined their list before presenting their top voices for the project to OpenAI.

We spoke with each actor about the vision for human-AI voice interactions and OpenAI, and discussed the technology’s capabilities, limitations, and the risks involved, as well as the safeguards we have implemented. It was important to us that each actor understood the scope and intentions of Voice Mode before committing to the project.

An internal team at OpenAI reviewed the voices from a product and research perspective, and after careful consideration, the voices for Breeze, Cove, Ember, Juniper and Sky were finally selected.


Each actor flew to San Francisco for recording sessions and their voices were launched into ChatGPT in September 2023​

During June and July, we flew the actors to San Francisco for recording sessions and in-person meetings with the OpenAI product and research teams.

On September 25, 2023, we launched their voices into ChatGPT.

This entire process involved extensive coordination with the actors and the casting team, taking place over five months. We are continuing to collaborate with the actors, who have contributed additional work for audio research and new voice capabilities in GPT-4o.


New Voice Mode coming to GPT-4o for paid users, and adding new voices​

We plan to give access to a new Voice Mode for GPT-4o in alpha to ChatGPT Plus users in the coming weeks. With GPT-4o, using your voice to interact with ChatGPT is much more natural. GPT-4o handles interruptions smoothly, manages group conversations effectively, filters out background noise, and adapts to tone.

Looking ahead, you can expect even more options as we plan to introduce additional voices in ChatGPT to better match the diverse interests and preferences of users.
 