bnew

Veteran
Joined
Nov 1, 2015
Messages
51,805
Reputation
7,926
Daps
148,733


1/2
made a lil app to teach my kid his first 100 words

took ~1 min in @create_xyz

my favorite instruction?

"Slow the rate at which words can be scrolled so that someone doesn't have a heart attack if they scroll too fast"

that compiled to the right code for no heart attacks 🤯

2/2
Reading Tutorial for Babies


To post tweets in this format, more info here: https://www.thecoli.com/threads/tips-and-tricks-for-posting-the-coli-megathread.984734/post-52211196



1/1
the fastest way to build a site




 


UPDATED 12:00 EDT / AUGUST 27 2024



Cerebras Systems throws down gauntlet to Nvidia with launch of ‘world’s fastest’ AI inference service​




by Mike Wheatley

Ambitious artificial intelligence computing startup Cerebras Systems Inc. is raising the stakes in its battle against Nvidia Corp., launching what it says is the world’s fastest AI inference service, and it’s available now in the cloud.

AI inference refers to the process of running live data through a trained AI model to make a prediction or solve a task. Inference services are the workhorse of the AI industry, and according to Cerebras, it’s the fastest-growing segment too, accounting for about 40% of all AI workloads in the cloud today.

However, existing AI inference services don’t appear to satisfy the needs of every customer. “We’re seeing all sorts of interest in how to get inference done faster and for less money,” Chief Executive Andrew Feldman told a gathering of reporters in San Francisco Monday.

The company intends to deliver on this with its new “high-speed inference” services. It believes the launch is a watershed moment for the AI industry, saying that the 1,000-tokens-per-second speeds it can deliver are comparable to the introduction of broadband internet, enabling game-changing new opportunities for AI applications.


Raw power​


Cerebras is well-equipped to offer such a service. The company is a producer of specialized and powerful computer chips for AI and high-performance computing or HPC workloads. It has made a number of headlines over the past year, claiming that its chips are not only more powerful than Nvidia’s graphics processing units, but also more cost-effective. “This is GPU-impossible performance,” declared co-founder and Chief Technology Officer Sean Lie.

Its flagship product is the new WSE-3 processor, which was announced in March and builds upon its earlier WSE-2 chipset that debuted in 2021. It’s built on an advanced five-nanometer process and features 1.4 trillion more transistors than its predecessor chip, with more than 900,000 compute cores and 44 gigabytes of onboard static random-access memory. According to the startup, the WSE-3 has 52 times more cores than a single Nvidia H100 graphics processing unit.

The chip is available as part of a data center appliance called the CS-3, which is about the same size as a small refrigerator. The chip itself is about the same size as a pizza, and comes with integrated cooling and power delivery modules. In terms of performance, the Cerebras WSE-3 is said to be twice as powerful as the WSE-2, capable of hitting a peak speed of 125 petaflops, with 1 petaflop equal to 1,000 trillion computations per second.

The Cerebras CS-3 system is the engine that powers the new Cerebras Inference service, and it notably features 7,000 times greater memory than the Nvidia H100 GPU to solve one of generative AI’s fundamental technical challenges: the need for more memory bandwidth.


Impressive speeds at lower cost​


It solves that challenge in style. The Cerebras Inference service is said to be lightning quick, up to 20 times faster than comparable cloud-based inference services that use Nvidia’s most powerful GPUs. According to Cerebras, it delivers 1,800 tokens per second for the open-source Llama 3.1 8B model, and 450 tokens per second for Llama 3.1 70B.

It’s competitively priced too, with the startup saying that the service starts at just 10 cents per million tokens – equating to 100 times higher price-performance for AI inference workloads.

The company adds that the Cerebras Inference service is especially well-suited for “agentic AI” workloads, or AI agents that can perform tasks on behalf of users, as such applications need the ability to constantly prompt their underlying models.

Micah Hill-Smith, co-founder and chief executive of the independent AI model analysis company Artificial Analysis Inc., said his team has verified that Llama 3.1 8B and 70B running on Cerebras Inference achieve “quality evaluation results” in line with Meta’s official versions at native 16-bit precision.

“With speeds that push the performance frontier and competitive pricing, Cerebras Inference is particularly compelling for developers of AI applications with real-time or high-volume requirements,” he said.


Tiered access​


Customers can choose to access the Cerebras Inference service via three available tiers, including a free offering that provides application programming interface-based access and generous usage limits for anyone who wants to experiment with the platform.

The Developer Tier is for flexible, serverless deployments. It’s accessed via an API endpoint that the company says costs a fraction of the price of alternative services available today. For instance, Llama 3.1 8B is priced at just 10 cents per million tokens, while Llama 3.1 70B costs 60 cents. Support for additional models is on the way, the company said.
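The quoted speeds and prices lend themselves to quick back-of-the-envelope estimates. A minimal sketch using only the figures above (the helper function and model keys are illustrative, not part of any Cerebras SDK):

```python
# Illustrative estimator built from the figures quoted in the article:
# Llama 3.1 8B at 1,800 tokens/s and $0.10 per million tokens;
# Llama 3.1 70B at 450 tokens/s and $0.60 per million tokens.
PRICING = {
    "llama-3.1-8b":  {"tokens_per_sec": 1800, "usd_per_million_tokens": 0.10},
    "llama-3.1-70b": {"tokens_per_sec": 450,  "usd_per_million_tokens": 0.60},
}

def estimate(model: str, output_tokens: int) -> tuple[float, float]:
    """Return (seconds to generate, cost in USD) for an output of given length."""
    p = PRICING[model]
    seconds = output_tokens / p["tokens_per_sec"]
    cost = output_tokens / 1_000_000 * p["usd_per_million_tokens"]
    return seconds, cost

# A 900-token answer from the 70B model takes about 2 seconds:
secs, usd = estimate("llama-3.1-70b", 900)
print(f"{secs:.1f} s, ${usd:.6f}")
```

At these rates, even long multi-step agent loops stay well under a cent, which is the price-performance argument the company is making.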

There’s also an Enterprise Tier, which offers fine-tuned models and custom service level agreements with dedicated support. That’s for sustained workloads, and it can be accessed via a Cerebras-managed private cloud or else implemented on-premises. Cerebras isn’t revealing the cost of this particular tier but says pricing is available on request.

Cerebras claims an impressive list of early-access customers, including organizations such as GlaxoSmithKline Plc., the AI search engine startup Perplexity AI Inc. and the networking analytics software provider Meter Inc.

Dr. Andrew Ng, founder of DeepLearning AI Inc., another early adopter, explained that his company has developed multiple agentic AI workflows that require prompting an LLM repeatedly to obtain a result. “Cerebras has built an impressively fast inference capability that will be very helpful for such workloads,” he said.

Cerebras’ ambitions don’t end there. Feldman said the company is “engaged with multiple hyperscalers” about offering its capabilities on their cloud services. “We aspire to have them as customers,” he said, as well as AI specialty providers such as CoreWeave Inc. and Lambda Inc.

Besides the inference service, Cerebras also announced a number of strategic partnerships to provide its customers with access to all of the specialized tools required to accelerate AI development. Its partners include the likes of LangChain, LlamaIndex, Docker Inc., Weights & Biases Inc. and AgentOps Inc.

Cerebras said its Inference API is fully compatible with OpenAI’s Chat Completions API, which means existing applications can be migrated to its platform with just a few lines of code.
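Compatibility with the Chat Completions format means an existing application mostly just changes its base URL. A minimal sketch that only assembles the request, without sending it (the base URL below is a placeholder assumption, not an endpoint taken from the article):

```python
import json

# Sketch: an OpenAI Chat Completions-style request body needs no changes to
# target a compatible endpoint; only the base URL (and API key) differ.
# BASE_URL is a placeholder assumption for illustration.
BASE_URL = "https://api.example-inference-provider.com/v1"

def build_chat_request(model: str, user_message: str) -> tuple[str, bytes]:
    """Return (url, JSON body) for a Chat Completions-compatible call."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }
    return f"{BASE_URL}/chat/completions", json.dumps(payload).encode()

url, body = build_chat_request("llama3.1-8b", "Hello!")
print(url)
```

In practice this is why “a few lines of code” suffices: the request and response shapes stay identical, so only client configuration changes.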

With reporting by Robert Hof
 

The comic book industry wants a 100% ban on AI, but it's too late: folks are already using it to create short clips, and soon whole comic books and TV shows/movies will be possible. Credit: Eric Solorio

 








Diffusion Models Are Real-Time Game Engines​


Dani Valevski* Google Research | Yaniv Leviathan* Google Research | Moab Arar*† Tel Aviv University | Shlomi Fruchter* Google DeepMind | *Equal Contribution †Work done while at Google Research




Real-time recordings of people playing the game DOOM simulated entirely by the GameNGen neural model.

Abstract​



We present GameNGen, the first game engine powered entirely by a neural model that enables real-time interaction with a complex environment over long trajectories at high quality. GameNGen can interactively simulate the classic game DOOM at over 20 frames per second on a single TPU. Next frame prediction achieves a PSNR of 29.4, comparable to lossy JPEG compression. Human raters are only slightly better than random chance at distinguishing short clips of the game from clips of the simulation. GameNGen is trained in two phases: (1) an RL-agent learns to play the game and the training sessions are recorded, and (2) a diffusion model is trained to produce the next frame, conditioned on the sequence of past frames and actions. Conditioning augmentations enable stable auto-regressive generation over long trajectories.
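For context, the PSNR figure cited above follows the standard peak signal-to-noise ratio definition; a small illustrative sketch of how such a score is computed between a predicted and a reference frame (flattened 8-bit pixel values):

```python
import math

def psnr(reference, predicted, max_val=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences.
    PSNR = 10 * log10(MAX^2 / MSE); higher means the prediction is closer."""
    mse = sum((r - p) ** 2 for r, p in zip(reference, predicted)) / len(reference)
    if mse == 0:
        return float("inf")  # identical frames
    return 10 * math.log10(max_val ** 2 / mse)

# Two tiny 4-pixel "frames" differing by one gray level per pixel:
print(round(psnr([100, 120, 130, 140], [101, 119, 131, 139]), 1))  # → 48.1
```

A PSNR of 29.4, as reported, corresponds to the error range typical of lossy JPEG compression, which is the comparison the abstract draws.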

Full Gameplay Videos​



Architecture​


Architecture overview


Data Collection via Agent Play: Since we cannot collect human gameplay at scale, as a first stage we train an automatic RL-agent to play the game, persisting its training episodes of actions and observations, which become the training data for our generative model.

Training the Generative Diffusion Model: We re-purpose a small diffusion model, Stable Diffusion v1.4, and condition it on a sequence of previous actions and observations (frames). To mitigate auto-regressive drift during inference, we corrupt context frames by adding Gaussian noise to encoded frames during training. This allows the network to correct information sampled in previous frames, and we found it to be critical for preserving visual stability over long time periods.
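The context-corruption idea can be sketched as follows. This is an illustrative stand-in operating on flat latent vectors, not the paper's actual implementation:

```python
import random

def corrupt_context(frames, max_noise=0.7, rng=random):
    """Add Gaussian noise of a randomly sampled level to each context frame.

    Returns the noisy frames plus the sampled per-frame noise levels; the model
    would also be conditioned on those levels, so it can learn to compensate
    for the corruption it sees in its own auto-regressive history.
    """
    noisy, levels = [], []
    for frame in frames:
        level = rng.uniform(0.0, max_noise)  # per-frame noise level
        noisy.append([x + rng.gauss(0.0, level) for x in frame])
        levels.append(level)
    return noisy, levels

# During training, the diffusion model sees (noisy context, noise levels,
# action sequence) and is asked to predict the clean next frame.
context = [[0.1, -0.2, 0.3], [0.0, 0.5, -0.4]]
noisy_context, noise_levels = corrupt_context(context)
```

The effect described in the text follows: because the model is trained to tolerate corrupted context, small errors in previously generated frames no longer compound into drift.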

Latent Decoder Fine-Tuning: The pre-trained auto-encoder of Stable Diffusion v1.4, which compresses 8x8 pixel patches into 4 latent channels, results in meaningful artifacts when predicting game frames, which affect small details and particularly the bottom bar HUD. To leverage the pre-trained knowledge while improving image quality, we train just the decoder of the latent auto-encoder using an MSE loss computed against the target frame pixels.

BibTeX​

@misc{valevski2024diffusionmodelsrealtimegame,
  title={Diffusion Models Are Real-Time Game Engines},
  author={Dani Valevski and Yaniv Leviathan and Moab Arar and Shlomi Fruchter},
  year={2024},
  eprint={2408.14837},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2408.14837},
}

Acknowledgements​

We'd like to extend a huge thank you to Eyal Segalis, Eyal Molad, Matan Kalman, Nataniel Ruiz, Amir Hertz, Matan Cohen, Yossi Matias, Yael Pritch, Danny Lumen, Valerie Nygaard, the Theta Labs and Google Research teams, and our families for insightful feedback, ideas, suggestions, and support.



 


1/1
Today, we are rolling out three experimental models:

- A new smaller variant, Gemini 1.5 Flash-8B
- A stronger Gemini 1.5 Pro model (better on coding & complex prompts)
- A significantly improved Gemini 1.5 Flash model

Try them on Google AI Studio | Gemini API, details in








Google drops ‘stronger’ and ‘significantly improved’ experimental Gemini models​


Taryn Plumb@taryn_plumb

August 27, 2024 6:25 PM

VentureBeat/Ideogram


VentureBeat/Ideogram


Google is continuing its aggressive Gemini updates as it races towards its 2.0 model.

The company today announced a smaller variant of Gemini 1.5, Gemini 1.5 Flash-8B, alongside a “significantly improved” Gemini 1.5 Flash and a “stronger” Gemini 1.5 Pro. These show increased performance against many internal benchmarks, the company says, with “huge gains” with 1.5 Flash across the board and a 1.5 Pro that is much better at math, coding and complex prompts.

“Gemini 1.5 Flash is the best… in the world for developers right now,” Logan Kilpatrick, product lead for Google AI Studio, boasted in a post on X.


‘Newest experimental iteration’ of ‘unprecedented’ Gemini models​


Google introduced Gemini 1.5 Flash — the lightweight version of Gemini 1.5 — in May. The Gemini 1.5 family of models was built to handle long contexts and can reason over fine-grained information from 10 million or more tokens. This allows the models to process high-volume multimodal inputs including documents, video and audio.

Today, Google is making available an “improved version” of a smaller 8 billion parameter variant of Gemini 1.5 Flash. Meanwhile, the new Gemini 1.5 Pro shows performance gains on coding and complex prompts and serves as a “drop-in replacement” to its previous model released in early August.

Kilpatrick was light on additional details, saying that Google will make a future version available for production use in the coming weeks that “hopefully will come with evals!”

He explained in an X thread that the experimental models are a means to gather feedback and get the latest, ongoing updates into the hands of developers as quickly as possible. “What we learn from experimental launches informs how we release models more widely,” he posted.

The “newest experimental iteration” of both Gemini 1.5 Flash and Pro feature 1 million token limits and are available to test for free via Google AI Studio and Gemini API, and also soon through the Vertex AI experimental endpoint. There is a free tier for both and the company will make available a future version for production use in coming weeks, according to Kilpatrick.

Beginning Sept. 3, Google will automatically reroute requests to the new model and will remove the older model from Google AI Studio and the API to “avoid confusion with keeping too many versions live at the same time,” said Kilpatrick.

“We are excited to see what you think and to hear how this model might unlock even more new multimodal use cases,” he posted on X.

Google DeepMind researchers call Gemini 1.5’s scale “unprecedented” among contemporary LLMs.

“We have been blown away by the excitement for our initial experimental model we released earlier this month,” Kilpatrick posted on X. “There has been lots of hard work behind the scenes at Google to bring these models to the world, we can’t wait to see what you build!”


‘Solid improvements,’ still suffers from ‘lazy coding disease’​


Just a few hours after the release today, the Large Model Systems Organization (LMSO) posted a leaderboard update to its chatbot arena based on 20,000 community votes. Gemini 1.5-Flash made a “huge leap,” climbing from 23rd to sixth place, matching Llama levels and outperforming Google’s Gemma open models.

Gemini 1.5-Pro also showed “strong gains” in coding and math and “improve[d] significantly.”

The LMSO lauded the models, posting: “Big congrats to Google DeepMind Gemini team on the incredible launch!”

Chatbot Arena update⚡!

The latest Gemini (Pro/Flash/Flash-9b) results are now live, with over 20K community votes!

Highlights:
– New Gemini-1.5-Flash (0827) makes a huge leap, climbing from #23 to #6 overall!
– New Gemini-1.5-Pro (0827) shows strong gains in coding, math over…

— lmsys.org (@lmsysorg) August 27, 2024

As per usual with iterative model releases, early feedback has been all over the place — from sycophantic praise to mockery and confusion.

Some X users questioned why so many back-to-back updates versus a 2.0 version. One posted: “Dude this isn’t going to cut it anymore :| we need Gemini 2.0, a real upgrade.”



On the other hand, many self-described fanboys lauded the fast upgrades and quick shipping, reporting “solid improvements” in image analysis. “The speed is fire,” one posted, and another pointed out that Google continues to ship while OpenAI has effectively been quiet. One went so far as to say that “the Google team is silently, diligently and constantly delivering.”

Some critics, though, call it “terrible,” and “lazy” with tasks requiring longer outputs, saying Google is “far behind” Claude, OpenAI and Anthropic.

The update “sadly suffers from the lazy coding disease” similar to GPT-4 Turbo, one X user lamented.



Another called the updated version “definitely not that good” and said it “often goes crazy and starts repeating stuff non-stop like small models tend to do.” Another agreed that they were excited to try it but that Gemini has “been by far the worst at coding.”



Some also poked fun at Google’s uninspired naming capabilities and called back to its huge woke blunder earlier this year.

“You guys have completely lost the ability to name things,” one user joked, and another agreed, “You guys seriously need someone to help you with nomenclature.”

And, one dryly asked: “Does Gemini 1.5 still hate white people?”

 



Anthropic releases AI model system prompts, winning praise for transparency​


Emilia David@miyadavid

August 27, 2024 11:01 AM

A scientist stands on stage holding a red cloth above four robotic head busts


Credit: VentureBeat made with ChatGPT



The OpenAI rival startup Anthropic yesterday released system prompts for its Claude family of AI models and committed to doing so going forward, setting what appears to be a new standard of transparency for the fast-moving gen AI industry, according to observers.

System prompts act much like the operating instructions of large language models (LLMs), telling models the general rules they should follow when interacting with users and the behaviors or personalities they should exhibit. They also tend to show the cut-off date for the information learned by the LLM during training.
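In API terms, a system prompt is a standing instruction sent alongside the user's messages. A provider-neutral sketch of how such a request is typically assembled (the field names follow common chat-API conventions and are illustrative, not any vendor's exact schema):

```python
# Illustrative: how a system prompt is paired with user messages in a typical
# chat-style API request. Field names follow common conventions, not any one
# provider's exact schema.
def build_request(system_prompt: str, user_message: str, model: str) -> dict:
    return {
        "model": model,
        "system": system_prompt,  # the model's standing "operating instructions"
        "messages": [{"role": "user", "content": user_message}],
    }

req = build_request(
    system_prompt="You are concise. Never claim to recognize faces in images.",
    user_message="Summarize this article in two sentences.",
    model="claude-3-5-sonnet",
)
```

The system string applies to the whole conversation, which is why publishing it reveals so much about a model's default behavior.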

Most LLMs have system prompts, but not every AI company publicly releases them. Uncovering the system prompts for models has even become a hobby of sorts for AI jailbreakers.

But now, Anthropic has beat the jailbreakers at their own game, going ahead and revealing the operating instructions for its models Claude 3.5 Sonnet, Claude 3 Haiku and Claude 3 Opus on its website under the release notes section.

In addition, Anthropic’s Head of Developer Relations Alex Albert posted on X (formerly Twitter) a commitment to keeping the public updated on its system prompts, writing: “We’re going to log changes we make to the default system prompts on Claude dot ai and our mobile apps.”

We've added a new system prompts release notes section to our docs. We're going to log changes we make to the default system prompts on Claude dot ai and our mobile apps. (The system prompt does not affect the API.) pic.twitter.com/9mBwv2SgB1

— Alex Albert (@alexalbert__) August 26, 2024


What Anthropic’s system prompts reveal​


The system prompts for the three models — Claude 3.5 Sonnet, Claude 3 Haiku and Claude 3 Opus — reveal some interesting details about each of them, their capabilities and knowledge date cut-offs, and various personality quirks.

Claude 3.5 Sonnet is the most advanced version, with a knowledge base updated as of April 2024. It provides detailed responses to complex questions and concise answers to simpler tasks, emphasizing both accuracy and brevity. This model handles controversial topics with care, presenting information without explicitly labeling it as sensitive or claiming objectivity. Additionally, Claude 3.5 Sonnet avoids unnecessary filler phrases or apologies and is particularly mindful of how it handles image recognition, ensuring it never acknowledges recognizing any faces.

Claude 3 Opus operates with a knowledge base updated as of August 2023 and excels at handling complex tasks and writing. It is designed to give concise responses to simple queries and thorough answers to more complex questions. Claude 3 Opus addresses controversial topics by offering a broad range of perspectives, avoiding stereotyping, and providing balanced views. While it shares some similarities with the Sonnet model, it does not incorporate the same detailed behavioral guidelines, such as avoiding apologies or unnecessary affirmations.

Claude 3 Haiku is the fastest model in the Claude family, also updated as of August 2023. It is optimized for delivering quick, concise responses to simple questions while still providing thorough answers when needed for more complex issues. The prompt structure for Haiku is more straightforward compared to Sonnet, focusing primarily on speed and efficiency, without the more advanced behavioral nuances found in the Sonnet model.


Why Anthropic’s release of its system prompts is important​


A common complaint about generative AI systems revolves around the concept of a “black box,” where it’s difficult to find out why and how a model came to a decision. The black box problem has led to research around AI explainability, a way to shed some light on the predictive decision-making process of models. Public access to system prompts is a step towards opening up that black box a bit, but only to the extent that people understand the rules set by AI companies for models they’ve created.

AI developers celebrated Anthropic’s decision, noting that releasing documents on Claude’s system prompts and updates to it stands out among other AI companies.

Anthropic Claude now tracks system prompt changes in their docs!

This is SO nice and much more transparent than ChatGPT!!
https://t.co/m25OPZvNJF

— Nick Dobos (@NickADobos) August 26, 2024
We can now see the system prompts for all three versions of Claude – and when they were last updated – in their entirety. This is a great change, and I hope this is eventually adopted industry wide. Good stuff from Anthropic. Transparency!

— Andrew Curran (@AndrewCurran_) August 26, 2024
Great move by Anthropic to share their system prompt releases with users! System Prompts - Anthropic

— Victor M (@victormustar) August 26, 2024


Not fully open source, though​


Releasing system prompts for the Claude models does not mean Anthropic opened up the model family. The actual source code for running the models, as well as the training data set and underlying “weights” (or model settings), remain in Anthropic’s hands alone.

Still, Anthropic’s release of the Claude system prompts shows other AI companies a path to greater transparency in AI model development. And it benefits users by showing them just how their AI chatbot is designed to act.





 



‘This could change everything!’ Nous Research unveils new tool to train powerful AI models with 10,000x efficiency​


Carl Franzen@carlfranzen

August 27, 2024 10:22 AM

Frizzy dark haired three eyed glowing eyes cyborg with giant hands overlooks office workers at desks with PCs and globe


Credit: VentureBeat made with ChatGPT


Nous Research turned heads earlier this month with the release of its permissive, open-source Llama 3.1 variant Hermes 3.

Now, the small research team dedicated to making “personalized, unrestricted AI” models has announced another seemingly massive breakthrough: DisTrO (Distributed Training Over-the-Internet), a new optimizer that reduces the amount of information that must be sent between various GPUs (graphics processing units) during each step of training an AI model.

Nous’s DisTrO optimizer means powerful AI models can now be trained outside of big companies, across the open web on consumer-grade connections, potentially by individuals or institutions working together from around the world.

DisTrO has already been tested and shown in a Nous Research technical paper to yield an 857 times efficiency increase compared to one popular existing training algorithm, All-Reduce, as well as a massive reduction in the amount of information transmitted during each step of the training process (86.8 megabytes compared to 74.4 gigabytes) while only suffering a slight loss in overall performance. See the results in the table below from the Nous Research technical paper:

Screenshot-2024-08-27-at-12.41.29%E2%80%AFPM.png


Ultimately, the DisTrO method could open the door to many more people being able to train massively powerful AI models as they see fit.

As the firm wrote in a post on X yesterday: “Without relying on a single company to manage and control the training process, researchers and institutions can have more freedom to collaborate and experiment with new techniques, algorithms, and models. This increased competition fosters innovation, drives progress, and ultimately benefits society as a whole.”

What if you could use all the computing power in the world to train a shared, open source AI model?

Preliminary report: DisTrO/A_Preliminary_Report_on_DisTrO.pdf at main · NousResearch/DisTrO

Nous Research is proud to release a preliminary report on DisTrO (Distributed Training Over-the-Internet) a family of… pic.twitter.com/h2gQJ4m7lB

— Nous Research (@NousResearch) August 26, 2024


The problem with AI training: steep hardware requirements​


As covered on VentureBeat previously, Nvidia’s GPUs in particular are in high demand in the generative AI era, as the expensive graphics cards’ powerful parallel processing capabilities are needed to train AI models efficiently and (relatively) quickly. This blog post at APNIC describes the process well.

A big part of the AI training process relies on GPU clusters — multiple GPUs — exchanging information with one another about the model and the information “learned” within training data sets.

However, this “inter-GPU communication” requires that GPU clusters be architected, or set up, in a precise way in controlled conditions, minimizing latency and maximizing throughput. Hence why companies such as Elon Musk’s Tesla are investing heavily in setting up physical “superclusters” with many thousands (or hundreds of thousands) of GPUs sitting physically side-by-side in the same location — typically a massive airplane hangar-sized warehouse or facility.

Because of these requirements, training generative AI — especially the largest and most powerful models — is typically an extremely capital-heavy endeavor, one that only some of the most well-funded companies can engage in, such as Tesla, Meta, OpenAI, Microsoft, Google, and Anthropic.

The training process for each of these companies looks a little different, of course. But they all follow the same basic steps and use the same basic hardware components. Each of these companies tightly controls its own AI model training processes, and it can be difficult for outsiders, much less laypeople, to even think of competing by training models of a similar size (in terms of parameters, or the settings under the hood).

But Nous Research, whose whole approach is essentially the opposite — making the most powerful and capable AI it can on the cheap, openly, freely, for anyone to use and customize as they see fit without many guardrails — has found an alternative.


What DisTrO does differently​


While traditional methods of AI training require synchronizing full gradients across all GPUs and rely on extremely high bandwidth connections, DisTrO reduces this communication overhead by four to five orders of magnitude.

The paper authors haven’t fully revealed how their algorithms reduce the amount of information at each step of training while retaining overall model performance, but plan to release more on this soon.

The reduction was achieved without relying on amortized analysis or compromising the convergence rate of the training, allowing large-scale models to be trained over much slower internet connections — 100Mbps download and 10Mbps upload, speeds available to many consumers around the world.
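Those figures can be sanity-checked directly. A quick arithmetic sketch using the per-step payload sizes reported earlier (86.8 megabytes for DisTrO vs. 74.4 gigabytes for All-Reduce) and the quoted 10 Mbps consumer uplink:

```python
# Per-step payloads reported in the article, and the quoted consumer uplink.
conventional_gb = 74.4   # gigabytes per training step (All-Reduce)
distro_mb = 86.8         # megabytes per training step (DisTrO)
upload_mbps = 10         # consumer upload speed, megabits per second

reduction = conventional_gb * 1000 / distro_mb
print(f"reduction: ~{reduction:.0f}x")  # matches the ~857x figure

# Time to push one step's payload over a 10 Mbps uplink (8 bits per byte):
distro_seconds = distro_mb * 8 / upload_mbps
conventional_hours = conventional_gb * 1000 * 8 / upload_mbps / 3600
print(f"DisTrO step upload: ~{distro_seconds:.0f} s")        # ~69 s
print(f"All-Reduce step upload: ~{conventional_hours:.1f} h")  # ~16.5 h
```

The contrast makes the claim concrete: a conventional step's gradient exchange is simply infeasible over a consumer uplink, while DisTrO's payload fits in about a minute.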

The authors tested DisTrO using the Meta Llama 2 large language model (LLM) architecture at 1.2 billion parameters and achieved comparable training performance to conventional methods with significantly less communication overhead.

They note that this is the smallest-size model that worked well with the DisTrO method, and they “do not yet know whether the ratio of bandwidth reduction scales up, down, or stays constant as model size increases.”

Yet, the authors also say that “our preliminary tests indicate that it is possible to get a bandwidth requirements reduction of up to 1000x to 3000x during the pre-training,” phase of LLMs, and “for post-training and fine-tuning, we can achieve up to 10000x without any noticeable degradation in loss.”

They further hypothesize that the research, while initially conducted on LLMs, could be used to train large diffusion models (LDMs) as well: think the Stable Diffusion open source image generation model and popular image generation services derived from it such as Midjourney.


Still need good GPUs​


To be clear: DisTrO still relies on GPUs — only instead of clustering them all together in the same location, now they can be spread out across the world and communicate over the consumer internet.

Specifically, DisTrO was evaluated using 32x H100 GPUs, operating under the Distributed Data Parallelism (DDP) strategy, where each GPU had the entire model loaded in VRAM.

This setup allowed the team to rigorously test DisTrO’s capabilities and demonstrate that it can match the convergence rates of AdamW+All-Reduce despite drastically reduced communication requirements.

This result suggests that DisTrO can potentially replace existing training methods without sacrificing model quality, offering a scalable and efficient solution for large-scale distributed training.

By reducing the need for high-speed interconnects, DisTrO could enable collaborative model training across decentralized networks, even with participants using consumer-grade internet connections.

The report also explores the implications of DisTrO for various applications, including federated learning and decentralized training.

Additionally, DisTrO’s efficiency could help mitigate the environmental impact of AI training by optimizing the use of existing infrastructure and reducing the need for massive data centers.

Moreover, the breakthroughs could lead to a shift in how large-scale models are trained, moving away from centralized, resource-intensive data centers towards more distributed, collaborative approaches that leverage diverse and geographically dispersed computing resources.


What’s next for the Nous Research team and DisTrO?​


The research team invites others to join them in exploring the potential of DisTrO. The preliminary report and supporting materials are available on GitHub, and the team is actively seeking collaborators to help refine and expand this groundbreaking technology.

Already, some AI influencers such as @kimmonismus on X (aka chubby) have praised the research as a huge breakthrough in the field, writing, “This could change everything!”



With DisTrO, Nous Research is not only advancing the technical capabilities of AI training but also promoting a more inclusive and resilient research ecosystem that has the potential to unlock unprecedented advancements in AI.



AI in practice

Aug 27, 2024

Amazon to launch AI-enhanced Alexa subscription in October​




Kim M. Scheurenbrand

Kim is a regular contributor to THE DECODER. He focuses on the ethical, economic, and political implications of AI.


Amazon plans to release a paid version of its Alexa voice assistant with advanced AI features this October. The upgraded Alexa aims to compete with newer AI assistants from companies like OpenAI and Google.

Internal documents obtained by The Washington Post reveal that the new Alexa, known internally as "Remarkable Alexa" or "Project Banyan," will offer several AI-powered capabilities.

A key feature is "Smart Briefing," which will provide personalized daily news summaries generated by AI. This feature is being developed despite concerns about AI's accuracy in handling political news, especially with the upcoming U.S. presidential election.

The subscription could cost up to $10 per month, though the current "classic Alexa" will remain free. Amazon executives are expected to finalize pricing, subscription structure, and product name this month.

We know your family and they should eat more vegetables​


The improved Alexa is reportedly designed to be more conversational and engaging. It will learn to recognize individual voices and ask users about their preferences to provide more tailored assistance. Other new features include improved recipe recommendations and AI-powered shopping tools.

Amazon is also developing a web-based product called Project Metis, intended to compete directly with ChatGPT-style LLM tools. This move comes as Amazon faces pressure to keep pace with AI advancements from competitors.

The company has invested $4 billion in AI startup Anthropic but is also developing its own large language model, Olympus. Amazon aims for Olympus to surpass Anthropic's Claude model, with early reports suggesting it has "hundreds of billions of parameters." But we haven't heard from Olympus lately.

The launch of the new Alexa has been delayed, with internal documents initially targeting a September 2024 release. The current mid-October timeline indicates it has taken over a year to bring the project to market since its announcement in September 2023.

While Amazon hasn't publicly disclosed Alexa's financial performance, reports suggest the company's devices business, which includes Alexa, has been losing money. The subscription model and enhanced e-commerce features of the new Alexa could help Amazon recoup some of its investment.
 


AI in practice

Aug 5, 2024

Update

OpenAI has a "highly accurate" ChatGPT text detector, but won't release it for now​




Matthias Bastian

Online journalist Matthias is the co-founder and publisher of THE DECODER. He believes that artificial intelligence will fundamentally change the relationship between humans and computers.

Update
  • Added OpenAI's statement.

Update from August 5, 2024:

Following the Wall Street Journal's coverage, OpenAI revised an earlier blog post on AI content detection, confirming the existence of their watermarking detector.

The detector holds up against minor text changes such as paraphrasing, but struggles with major ones such as translation, rewriting with a different AI model, or the insertion and removal of special characters between words.

Ultimately, this makes bypassing the detector "trivial," according to OpenAI. The company also mentions concerns that it could unfairly target certain groups, such as non-native English speakers who use ChatGPT to improve their writing.

While the watermarking method has a low false positive rate for individual texts, applying it to large volumes of content would still lead to a significant number of misidentifications overall.
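The scale effect described here is simple arithmetic: even a tiny per-text false-positive rate adds up over large volumes. A quick illustration with hypothetical numbers (OpenAI has not published the detector's actual rate):

```python
# Hypothetical numbers: OpenAI has not disclosed its detector's real
# false-positive rate; this only illustrates how errors scale with volume.
false_positive_rate = 1 / 10_000   # assume 0.01% of human texts get flagged
texts_checked = 10_000_000         # e.g. essays screened in a school year

expected_false_flags = texts_checked * false_positive_rate
print(expected_false_flags)        # ≈ 1,000 human-written texts wrongly flagged
```

A rate that sounds negligible per text still produces on the order of a thousand wrong accusations at this volume, which is the concern OpenAI cites.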

OpenAI is researching metadata as an alternative method of verifying the provenance of text. This research is in the "early stages of exploration," and its effectiveness remains to be seen. Metadata is promising because, unlike watermarks, it can be cryptographically signed, eliminating false positives, according to OpenAI.
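The advantage OpenAI sees in metadata is that a cryptographic signature either verifies or it doesn't, so there is no statistical false-positive rate. A generic sketch of that idea using Python's stdlib `hmac` (this is not OpenAI's actual scheme, which remains unpublished; the key and metadata fields are made up):

```python
import hashlib
import hmac

# Hypothetical provenance scheme: the generator signs metadata with a secret
# key, and a verifier with the same key can check it. OpenAI's real design
# is not public; this only shows why signed metadata has no false positives.
SECRET_KEY = b"provider-secret-key"   # made-up key for illustration

def sign(metadata: bytes) -> str:
    """Produce an HMAC-SHA256 signature over the metadata."""
    return hmac.new(SECRET_KEY, metadata, hashlib.sha256).hexdigest()

def verify(metadata: bytes, signature: str) -> bool:
    """Constant-time check that the signature matches the metadata."""
    return hmac.compare_digest(sign(metadata), signature)

meta = b"model=example-llm;created=2024-08-05"
tag = sign(meta)

print(verify(meta, tag))         # True: untampered metadata verifies
print(verify(meta + b"x", tag))  # False: any edit breaks the signature
```

Unlike a statistical watermark detector, verification here is deterministic: unsigned or altered text simply fails the check instead of being probabilistically misclassified.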

OpenAI says it is focusing on audiovisual content, which it considers higher risk. Its updated C2PA-based provenance system for DALL-E 3 images now tracks whether and how AI-generated images are edited after generation.



Original article from August 4, 2024:

OpenAI has developed technology to reliably detect AI-generated text, according to inside sources and documents reported by the Wall Street Journal. However, the company is reluctant to release it, likely due to concerns about its own business model.
 


AI in practice

Aug 27, 2024

Former OpenAI researcher believes company is "fairly close" to AGI and not prepared for it​




Matthias Bastian

Online journalist Matthias is the co-founder and publisher of THE DECODER. He believes that artificial intelligence will fundamentally change the relationship between humans and computers.


About half of OpenAI's AGI/ASI safety researchers have left the company recently, according to a former employee. The departures likely stem from disagreements over managing the risks of potential superintelligent AI.

Daniel Kokotajlo, a former OpenAI safety researcher, told Fortune magazine that around half of the company's safety researchers have departed, including prominent leaders.

While Kokotajlo didn't comment on specific reasons for all the resignations, he believes they align with his own views: OpenAI is "fairly close" to developing artificial general intelligence (AGI) but isn't prepared to "handle all that entails."

This has led to a "chilling effect" on those trying to publish research on AGI risks within the company, Kokotajlo said. He also noted an "increasing amount of influence by the communications and lobbying wings of OpenAI" on what's deemed appropriate to publish.

The temporary firing of OpenAI CEO Sam Altman was also linked to safety concerns. A law firm cleared Altman after his reinstatement.

Of about 30 employees working on AGI safety issues, around 16 remain. Kokotajlo said these departures weren't a "coordinated thing" but rather people "individually giving up."

Notable departures include Jan Hendrik Kirchner, Collin Burns, Jeffrey Wu, Jonathan Uesato, Steven Bills, Yuri Burda, Todor Markov, and OpenAI co-founder John Schulman.

The resignations of chief scientist Ilya Sutskever and Jan Leike, who jointly led the company's "superalignment" team focused on future AI system safety, were particularly significant. OpenAI subsequently disbanded this team.

Experts leave OpenAI, but not AGI​


Kokotajlo expressed disappointment, but not surprise, that OpenAI opposed California's SB 1047 bill, which aims to regulate advanced AI system risks. He co-signed a letter to Governor Newsom criticizing OpenAI's stance, calling it a betrayal of the company's original plans to thoroughly assess AGI's long-term risks for developing regulations and laws.
 


AI in practice

Aug 27, 2024

OpenAI's Strawberry AI is reportedly the secret sauce behind next-gen Orion language model​




Matthias Bastian
Online journalist Matthias is the co-founder and publisher of THE DECODER. He believes that artificial intelligence will fundamentally change the relationship between humans and computers.

OpenAI is developing two new AI models that could significantly advance the field. "Strawberry" aims to solve complex math and programming problems better than current systems, while "Orion" aims to surpass GPT-4's capabilities with the help of Strawberry.

According to The Information, citing two people involved in the project, OpenAI might release a chatbot version of Strawberry as early as this fall, possibly as part of ChatGPT.

Strawberry is designed to tackle previously unseen math problems and optimize programming tasks. Its enhanced logic should allow it to solve language-related challenges more effectively when given sufficient time to "think."

Agent-based AI systems based on Strawberry​


In internal demonstrations, Strawberry reportedly solved the New York Times word puzzle "Connections." The model could also serve as a foundation for more advanced AI systems capable of not just generating content, but taking action.

Reuters reported that OpenAI has already tested an AI internally that scored over 90 percent on the MATH benchmark, a collection of math mastery tasks. This is likely Strawberry, which has also been presented to national security officials, according to The Information.

Internal OpenAI documents describe plans to use Strawberry models for autonomous internet searches, enabling the AI to plan ahead and conduct in-depth research.

The Information notes that it's uncertain whether Strawberry will launch this year. If released, it would be a distilled version of the original model, delivering similar performance with less computational power – a technique OpenAI has also used for GPT-4 variants since the original model was released in March 2023.

OpenAI's approach reportedly resembles the "Self-Taught Reasoner" (STaR) method introduced by Stanford researchers, which aims to improve AI systems' reasoning abilities.

Former OpenAI chief scientist Ilya Sutskever, who has since founded his own startup focused on safe superintelligence, is said to have provided the idea and basis for Strawberry.
 

AI in practice

Aug 26, 2024

German AI startup Aleph Alpha unveils new AI stack "Pharia AI" and new language models​




Maximilian Schreiner
Max is managing editor at THE DECODER. As a trained philosopher, he deals with consciousness, AI, and the question of whether machines can really think or just pretend to.

German AI specialist Aleph Alpha has introduced Pharia AI, a comprehensive software stack designed to help enterprises and government agencies develop and operate AI applications in a sovereign, future-proof way.

According to founder Jonas Andrulis, the goal is to provide customers with a complete solution for developing AI applications from the initial concept to production deployment.

Pharia AI is made up of several components:
- Pharia Catch assists subject matter experts in structuring and storing their knowledge for AI development.
- Pharia Studio guides developers through the process of creating application-specific AI systems from this knowledge and pre-trained models.
- Pharia OS handles the operation and scaling of these systems, including access control and monitoring.
- The Pharia Assistant provides a simple interface for employees to utilize the AI functions.

Aleph Alpha stresses that the stack should offer customers sovereignty and future-proofing. The systems can be operated flexibly in the cloud or on-premises and trained with customers' own data. Additionally, customers should always have access to the latest AI innovations, whether open source models or Aleph Alpha's own developments.

These innovations include a method that allows language models to be more efficiently adapted to new languages and specialist areas without compromising performance in the source languages. Such innovations stem from Aleph Alpha's collaboration with researchers, such as those at the Technical University of Darmstadt. The company is also working on a way for customers to refine the behavior of the language models themselves and offers Explainable AI functions. All Pharia AI features are set to roll out in the coming months, with some already being used by select customers.

Alongside the core stack, the company also provides industry-specific solutions for sectors like the public sector, banks, and insurance companies. One example is Creance AI, a joint venture with PwC that helps financial institutions automatically check contracts for regulatory requirements.

Aleph Alpha sees partners as a key success factor for the dissemination of its technology. Platinum partners like IT service provider Materna and PwC support customers in the implementation of AI projects based on Pharia AI.

New Pharia-1-LLM language models published​


Alongside the stack, Aleph Alpha has published the Pharia-1-LLM family of 7-billion-parameter base models, including training code and detailed information on data and capabilities. Pharia-1-LLM-7B-control can be flexibly adapted to user preferences, while the behavior of Pharia-1-LLM-7B-control-aligned has been optimized for dealing with sensitive topics.

Both models are trained in seven languages, with special optimization for English, German, French, and Spanish. They are tuned for short, concise answers and, according to Aleph Alpha, are on par with the latest open source language models in the 7-to-8-billion-parameter range. Aleph Alpha says they have been trained fully in accordance with applicable EU and national laws, making them suitable for corporate use.

Chief Research Officer Yasser Jadidi says model size is not the decisive factor; efficiency and domain-specific optimization matter more. However, the company does not rule out offering larger models in the future. The Pharia-1 models are available on Hugging Face.
 


AI in practice

Aug 25, 2024


Generative AI reportedly gives Apple's upcoming robotic arm a personality beyond Siri​




Matthias Bastian
Online journalist Matthias is the co-founder and publisher of THE DECODER. He believes that artificial intelligence will fundamentally change the relationship between humans and computers.

Apple is planning a tabletop device with a robotic arm that will have its own personality using generative AI. The project could be the company's entry into the robotics market.

According to a report by Mark Gurman for Bloomberg, the company is developing a human-like interface based on generative AI that will give the robotic arm a personality.

This AI personality is expected to be different from Apple's well-known digital assistant, Siri. It could run on the tabletop product and other future Apple robotics devices. A successful robotics device could help Apple break into the smart home market.


Tabletop device as a test case​


Apple is planning a tabletop device codenamed J595 that is part of a larger robotics project. The device will combine a large iPad-like display with cameras and a base with a robotic arm. The device is expected to be released around 2026 or 2027, according to Gurman.

The robotic arm is designed to solve everyday problems. For example, it could swivel the device's screen towards the user to facilitate video conferencing or browsing recipes. This would be especially useful when the user's hands are busy with other tasks.

Generative AI plays a central role in the development of the robotic arm. It will control the movements and enable interaction with the user. Apple plans to draw on its expertise in areas such as sensor technology, advanced silicon chips, and hardware engineering.

According to Gurman, Apple needs to develop hardware that can successfully navigate cluttered spaces. In addition, the technology will initially be costly, both in manufacturing and for the consumer.

The development of the robotic arm is being led by Kevin Lynch, Apple's vice president of technology. He is working with robotics teams in Hardware Engineering and has recently hired top experts from institutions such as the Technion in Israel.


Future prospects​


The robotic arm could be Apple's entry into a new market. If the project is successful, the company plans to develop mobile robots and possibly even humanoid models in the future.
 