bnew

1/6
Introducing Empathic Voice Interface 2 (EVI 2), our new voice-to-voice foundation model. EVI 2 merges language and voice into a single model trained specifically for emotional intelligence.

You can try it and start building today.



2/6
EVI 2 can converse rapidly with users with sub-second response times, understand a user’s tone of voice, generate any tone of voice, and even respond to some more niche requests like changing its speaking rate or rapping. Talk to it here: App · Hume AI



3/6
It can emulate a wide range of personalities, accents, and speaking styles and possesses emergent multilingual capabilities.



4/6
EVI 2 is adjustable. We’ve been experimenting with ways to create synthetic voices unique to any app or user, without voice cloning.

Find out more here: Introducing EVI 2, our new foundational voice-to-voice model • Hume AI



5/6
Today, EVI 2 is available in beta for anyone to use. It’s available to talk to via our app and to build into applications via our beta EVI 2 API (in keeping with our guidelines). Home · Hume AI
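
For builders, a voice-to-voice API like this is typically consumed as a bidirectional audio stream: microphone audio goes up, synthesized speech (plus transcripts and expression data) comes back. The sketch below only illustrates that general pattern; the endpoint URL, auth query parameter, and message fields are assumptions for illustration, not Hume's documented interface, so check the official EVI docs and SDKs before building on it.

```python
# Generic voice-to-voice streaming loop (illustrative sketch only).
# The URL, auth parameter, and message schema are ASSUMPTIONS,
# not Hume's documented EVI API.
import base64
import json
import os

import websockets  # pip install websockets

ASSUMED_URL = "wss://api.hume.ai/v0/evi/chat"  # placeholder endpoint


def handle_playback(pcm_bytes: bytes) -> None:
    """Stub: route decoded audio to your audio output of choice."""


async def converse(mic_chunks):
    url = f"{ASSUMED_URL}?api_key={os.environ['HUME_API_KEY']}"  # assumed auth
    async with websockets.connect(url) as ws:
        # Send microphone audio upstream as base64-encoded chunks.
        for chunk in mic_chunks:
            await ws.send(json.dumps({
                "type": "audio_input",  # assumed message type
                "data": base64.b64encode(chunk).decode(),
            }))
        # Receive synthesized audio (and transcript events) downstream.
        async for raw in ws:
            event = json.loads(raw)
            if event.get("type") == "audio_output":  # assumed event name
                handle_playback(base64.b64decode(event["data"]))

# Run inside an event loop, e.g. asyncio.run(converse(chunks_from_your_mic())).
```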



6/6
The model can already output several languages (e.g., Spanish, German, French) but currently only understands English inputs. We'll be accepting inputs in other languages soon, along with support for more languages!




To post tweets in this format, more info here: https://www.thecoli.com/threads/tips-and-tricks-for-posting-the-coli-megathread.984734/post-52211196
 

bnew

1/10
🚀 Still relying on human-crafted rules to improve pretraining data? Time to try Programming Every Example (ProX)! Our latest effort uses LMs to refine data with unprecedented accuracy, and brings up to 20x faster training in the general and math domains!
👇 Curious about the details?



2/10
1/n 🤔 Pre-training large language models (LLMs) typically relies on static, human-crafted rules for data refinement. While useful, these rules can’t adapt to the diverse examples in the data.



3/10
2/n 🕹️ Enter ProX, where we treat data refinement as a programming task! Instead of fixed rules, ProX empowers LMs to generate tailored refining programs for each sample. Even with models as small as 0.3B parameters🐣, ProX’s approach can surpass human-level quality greatly!
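
In concrete terms: instead of one fixed filter pipeline, a small LM looks at each document and emits a tiny program (e.g., drop the document, delete specific noisy lines) that is then executed on that document. The sketch below is a hypothetical illustration of that loop; the prompt, the operation names, and the call_refiner_lm helper are stand-ins, not ProX's actual interface (see the GitHub repo linked later in the thread for the real implementation).

```python
# Hypothetical sketch of "programming every example": a small refining LM
# emits a per-document program, and we execute it on that document.
# The prompt, operation names, and call_refiner_lm are illustrative stand-ins.
import json

REFINE_PROMPT = """Given the document below, output a JSON list of operations,
each {{"name": "drop_doc"}} or {{"name": "remove_lines", "args": [line_numbers]}}.
Document:
{doc}"""


def call_refiner_lm(prompt: str) -> str:
    """Stand-in for a small (~0.3B) refining model returning a JSON program."""
    raise NotImplementedError("plug in your own model call here")


def refine(doc: str) -> str | None:
    """Returns the refined document, or None if it should be dropped."""
    lines = doc.splitlines()
    program = json.loads(call_refiner_lm(REFINE_PROMPT.format(doc=doc)))
    for op in program:
        if op["name"] == "drop_doc":
            return None  # remove this example from the corpus entirely
        if op["name"] == "remove_lines":
            drop = set(op["args"])
            lines = [l for i, l in enumerate(lines) if i not in drop]
    return "\n".join(lines)
```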



4/10
3/n 📈 The results? Models trained on ProX-refined data show > 2% gains on 10 downstream benchmarks. Plus, models trained on 50B tokens achieve comparable performance to those trained on 3T tokens 🤯.



5/10
4/n 👀 ProX also SHINES in domain-specific continual pre-training! In the math domain, models trained on ProX-refined data surprisingly saw 7%~20% improvements with just 10B tokens! This means ProX achieves performance on par with existing models using 20x fewer tokens, underscoring its efficiency!



6/10
5/n 🔥 Analysis shows that investing FLOPs in data refinement actually helps SAVE compute FLOPs overall. As the trained model gets larger, reaching the same performance via ProX saves even more FLOPs. In preliminary experiments, we could save about 40% for 1.7B models! 🚀
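
The intuition behind that claim: one pass of a small refining model over the raw pool is cheap relative to training the target model, so if refined data reaches the same quality with fewer training tokens, total compute drops. The back-of-envelope below uses the standard ~6 * params * tokens FLOPs approximation for training; the token counts and the 0.3B refiner size are illustrative assumptions, not ProX's reported numbers.

```python
# Back-of-envelope FLOPs accounting for "refine first, then train on less".
# Uses the standard ~6 * params * tokens training-FLOPs approximation;
# all concrete numbers below are illustrative assumptions.
def total_flops(n_model, d_train_tokens, n_refiner=0.0, d_raw_tokens=0.0):
    train = 6 * n_model * d_train_tokens    # training the target model
    refine = 6 * n_refiner * d_raw_tokens   # one refining pass over raw data
    return train + refine

baseline = total_flops(1.7e9, 50e9)         # 1.7B model on 50B raw tokens
with_refining = total_flops(1.7e9, 25e9,    # same quality from fewer tokens
                            n_refiner=0.3e9, d_raw_tokens=50e9)

print(f"compute saved: {1 - with_refining / baseline:.0%}")  # ~32% here
```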



7/10
6/n 🙌 We believe ProX is a step toward making pre-training more efficient by investing more in data refinement. And we also believe that 💥inference-time scaling💥 for data refinement will become more and more important to future LLM development #🍓



8/10
7/n,n=7 😊 Learn more about ProX and stay tuned at:
💻 GitHub Repo: GitHub - GAIR-NLP/ProX: Offical Repo for "Programming Every Example: Lifting Pre-training Data Quality Like Experts at Scale"
🤗 Hugging Face: gair-prox (GAIR-ProX)
🔗 Paper: Paper page - Programming Every Example: Lifting Pre-training Data Quality like Experts at Scale



9/10
Finally: Thanks to all the amazing collaborators and supervisors
@SinclairWang1 @sivil_taram
@lockonlvange
@stefan_fee
!! It’s been such an enjoyable and rewarding experience working with all of you!🙌



10/10
Thank you, Loubna! I remember asking you for tons of advice on pre-training configs. 😊 Yes, I believe chunk-level refinement may still bring some boosts. It would also be a very interesting question, since both Edu & DCLM obtain super-high-quality data via rather aggressive filtering (and indeed, they deliver top-quality data 💎)




To post tweets in this format, more info here: https://www.thecoli.com/threads/tips-and-tricks-for-posting-the-coli-megathread.984734/post-52211196
 

bnew



1/2
Aside from TPUs running hot today, five things from Notebook HQ:

1) Thanks for all your feedback on AOs so far. I’m copy pasting everything into a Notebook so I can listen to a Deep Dive and search it later. We’re going to launch some immediate tweaks to make it less repetitive, improve the content, and so on - quality work in the background.

2) These jailbreaks are WILD – I saw on Reddit that someone’s gotten it to output French (neat), but please keep in mind that the quality for other languages is still going through evals and that’s why we haven’t released it yet. In progress though.

3) In these hacks it looks like you can get around our flags to see the prototype features like MagicDraft and custom chatbots. I’ll say two things about this: these are extremely promising and have tested well, but we need to rev on the next layer before these can be ready for launch. More below:

4) An agentic and personalized writing workflow, especially using YOUR style and formats, is another type of “transform” that I’m really excited about. Can’t tell you how often I take a pile of research and some haphazard notes to write my POV on it – this streamlines that flow massively. Gemini 1.5 is really good at this and a well-done UX is what’s needed to connect the user to this capability. Not sure if the space is too crowded or kind of tired at this point, so I just need to study a little bit more before we put this in Notebook.

5) Custom chatbots… I have a lot to say. This is pretty widely used internally at Google and literally every day someone pings me to say “This has 10x’d our team’s productivity.” Not joking. In the hacks you’re still looking at the old version so I’m excited for what you all think when the new version launches :smile:

[Quoted tweet]
BREAKING 🚨: Google’s NotebookLM could let users build custom chatbots from notebooks.

If you already had high expectations from NotebookLM, you must raise them even higher! Here is why 👇

Disclaimer: All mentioned features here are WIP 🚧

h/t @bedros_p

2/2
The bar for production is so high that I've made do with the 200k Googlers for now so I can focus on feature functionality - but let me see if I can launch this early access program in 2 weeks so people don't have to hack around it




To post tweets in this format, more info here: https://www.thecoli.com/threads/tips-and-tricks-for-posting-the-coli-megathread.984734/post-52211196











1/8
BREAKING 🚨: Google’s NotebookLM could let users build custom chatbots from notebooks.

If you already had high expectations from NotebookLM, you must raise them even higher! Here is why 👇

Disclaimer: All mentioned features here are WIP 🚧

h/t @bedros_p


2/8
Besides the previously discovered "Magic Draft" feature, NotebookLM may get a bot creation UI where you can set a custom prompt on top of uploaded Notebook sources.

These chatbots are meant to be sharable and potentially "embedded" 🤯

This will turn them into Gems 2.0!



3/8
Notebooks may also get a model selector to switch between different Google models like Gemini Pro or Med-PaLM. However, this is likely only used for internal purposes for now.

h/t @bedros_p


4/8
Read more on @testingcatalog 🗞

Google’s NotebookLM could let users build custom chatbots



5/8
More insights on this topic. Check points 4 and 5 🤯🤯🤯

[Quoted tweet]
Aside from TPUs running hot today, five things from Notebook HQ: [full post quoted earlier in this thread]


6/8
Yes, exactly 😅



7/8
“Soon” 😅



8/8
I have a flight tomorrow around this time 🙈




To post tweets in this format, more info here: https://www.thecoli.com/threads/tips-and-tricks-for-posting-the-coli-megathread.984734/post-52211196

 

bnew



Earlier today, OpenAI released a new whisper model (turbo), and now it can run locally in your browser w/ Transformers.js! I was able to achieve ~10x RTF (real-time factor), transcribing 120 seconds of audio in ~12 seconds, on an M3 Max. Important links:






1. What is the new Whisper Turbo model released by OpenAI, and how does it differ from the previous Whisper model?​

The new Whisper Turbo model is a distilled version of the original Whisper large-v3 model. The main difference is that it has reduced the number of decoding layers from 32 to 4, making it significantly faster while introducing a minor degradation in quality. This change allows for real-time transcription with reduced computational requirements.

2. How does the Whisper Turbo model achieve its significant speed improvement, and what are the trade-offs?​

The speed improvement is achieved by reducing the number of decoding layers from 32 to 4. This reduction in complexity allows for faster processing times, with reports indicating that it can transcribe 120 seconds of audio in approximately 12 seconds on an M3 Max device. However, this comes at a minor cost in terms of transcription accuracy compared to the original model.
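
For reference, the same turbo checkpoint can also be run locally outside the browser with the Hugging Face transformers pipeline in Python (a server-side analogue of the Transformers.js demo). A minimal sketch is below; the model id is the Hugging Face Hub name for the turbo release, while the device and dtype settings and the input file are illustrative and depend on your setup.

```python
# Minimal sketch: running the Whisper turbo checkpoint locally with the
# Hugging Face transformers ASR pipeline (a server-side analogue of the
# Transformers.js browser demo). Device and dtype choices are illustrative.
import torch
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-large-v3-turbo",
    torch_dtype=torch.float16,
    device="cuda:0",  # use "mps" on Apple Silicon, or "cpu"
)

# Long audio is chunked automatically; return_timestamps adds segment times.
result = asr("meeting_120s.wav",  # path to your own audio file
             chunk_length_s=30, return_timestamps=True)
print(result["text"])
```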

3. How is the Whisper Turbo model deployed locally in a browser, and what technologies are used to enable this?​

The Whisper Turbo model is deployed locally in a browser using Transformers.js and WebGPU technologies. The model files are downloaded and stored in the browser's cache storage during the initial run, allowing subsequent uses to load quickly without needing an internet connection. The WebGPU technology leverages GPU capabilities for faster processing.

4. What are the implications of running the Whisper Turbo model locally in a browser, especially in terms of offline capability and resource usage?​

Running the Whisper Turbo model locally in a browser means that it can function offline after the initial download of the model files. This is achieved through service workers that allow the website to load even without an internet connection. However, there are considerations regarding resource usage; the model requires substantial memory (around 800MB) but only loads into memory when actively used and offloads back to disk cache when not in use.

5. What are some of the user experiences and challenges reported by users when using the new Whisper Turbo model, particularly in terms of accuracy and multilingual support?​

Users have reported mixed experiences with the new model. Some have noted that while it is much faster, there is a minor degradation in transcription accuracy compared to the original model. There are also concerns about multilingual support, with some users experiencing difficulties with languages other than English. Additionally, features like speaker diarization are missing, which some users find significant. Despite these challenges, many users appreciate the speed and local deployment capabilities of the new model.
 

bnew



Pika 1.5 AI video generator









1/6
Sry, we forgot our password.
PIKA 1.5 IS HERE.

With more realistic movement, big screen shots, and mind-blowing Pikaffects that break the laws of physics, there’s more to love about Pika than ever before.

Try it.

2/6
🫡

3/6
@MatanCohenGrumi never not delivers 🔥

4/6
we'll always be here for you 🫂

5/6
Thank you, Alex! Means a lot coming from you.

6/6
Thank you, Tom!


To post tweets in this format, more info here: https://www.thecoli.com/threads/tips-and-tricks-for-posting-the-coli-megathread.984734/post-52211196



1/2


2/2
Try to generate your AI video on Pika 🙌


To post tweets in this format, more info here: https://www.thecoli.com/threads/tips-and-tricks-for-posting-the-coli-megathread.984734/post-52211196



1/2
Crush it. Melt it. Cake-ify it. Explode it. Squish it. Inflate it. Pikaffect it. PIKA 1.5 IS HERE. Try it on Pika

2/2
Sound Credit: @EclecticMethod Video Credit: @starryhe214


To post tweets in this format, more info here: https://www.thecoli.com/threads/tips-and-tricks-for-posting-the-coli-megathread.984734/post-52211196
 

bnew



1/2
Starting this week, Advanced Voice is rolling out to all ChatGPT Enterprise, Edu, and Team users globally. Free users will also get a sneak peek of Advanced Voice.

Plus and Free users in the EU…we’ll keep you updated, we promise.

2/2
To access Advanced Voice, remember to download the latest version of the ChatGPT app.


To post tweets in this format, more info here: https://www.thecoli.com/threads/tips-and-tricks-for-posting-the-coli-megathread.984734/post-52211196









1/11
@OpenAI
Advanced Voice is rolling out to all Plus and Team users in the ChatGPT app over the course of the week.

While you’ve been patiently waiting, we’ve added Custom Instructions, Memory, five new voices, and improved accents.

It can also say “Sorry I’m late” in over 50 languages.



2/11
@OpenAI
If you are a Plus or Team user, you will see a notification in the app when you have access to Advanced Voice.



3/11
@OpenAI
Meet the five new voices.



4/11
@OpenAI
Set Custom Instructions for Advanced Voice.



5/11
@OpenAI
We’ve also improved conversational speed, smoothness, and accents in select foreign languages.



6/11
@OpenAI
Advanced Voice is not yet available in the EU, the UK, Switzerland, Iceland, Norway, and Liechtenstein.



7/11
@GozuMighty
voice.gpt.eth



8/11
@spffspcmn
We need that Her voice back. Juniper just doesn't cut it for me.



9/11
@Maik_Busse
If you're from the EU and don't have access, please like for confirmation 💀



10/11
@reach_vb
Congrats on shipping Advanced Voice Mode! At the same time I’m quite happy to see Open Source catching up:

Moshi v0.1 Release - a kyutai Collection



11/11
@ai_for_success
My meme was correct 😀




To post tweets in this format, more info here: https://www.thecoli.com/threads/tips-and-tricks-for-posting-the-coli-megathread.984734/post-52211196
 

bnew



MIT spinoff Liquid debuts non-transformer AI models and they’re already state-of-the-art​


Carl Franzen @carlfranzen

September 30, 2024 2:16 PM

[Image: Liquid ripples over the surface of glowing blue and purple circuitry against a black backdrop. Credit: VentureBeat made with OpenAI ChatGPT]


Liquid AI, a startup co-founded by former researchers from the Massachusetts Institute of Technology (MIT)’s Computer Science and Artificial Intelligence Laboratory (CSAIL), has announced the debut of its first multimodal AI models: the “Liquid Foundation Models (LFMs).”

Unlike most others of the current generative AI wave, these models are not based around the transformer architecture outlined in the seminal 2017 paper “Attention Is All You Need.”

Instead, Liquid states that its goal “is to explore ways to build foundation models beyond Generative Pre-trained Transformers (GPTs)” and with the new LFMs, specifically building from “first principles…the same way engineers built engines, cars, and airplanes.”

It seems they’ve done just that — as the new LFM models already boast superior performance to other transformer-based ones of comparable size such as Meta’s Llama 3.1-8B and Microsoft’s Phi-3.5 3.8B.

Liquid’s LFMs currently come in three different sizes and variants:

  • LFM 1.3B (smallest)
  • LFM 3B
  • LFM 40B MoE (largest, a “Mixture-of-Experts” model similar to Mistral’s Mixtral)

The “B” in their name stands for billion and refers to the number of parameters — or settings — that govern the model’s information processing, analysis, and output generation. Generally, models with a higher number of parameters are more capable across a wider range of tasks.



Already, Liquid AI says the LFM 1.3B version outperforms Meta’s new Llama 3.2-1.2B and Microsoft’s Phi-1.5 on many leading third-party benchmarks including the popular Massive Multitask Language Understanding (MMLU) consisting of 57 problems across science, tech, engineering and math (STEM) fields, “the first time a non-GPT architecture significantly outperforms transformer-based models.”

All three are designed to offer state-of-the-art performance while optimizing for memory efficiency, with Liquid’s LFM-3B requiring only 16 GB of memory compared to the more than 48 GB required by Meta’s Llama-3.2-3B model (shown in the chart above).



Maxime Labonne, Head of Post-Training at Liquid AI, took to his account on X to say the LFMs were “the proudest release of my career :smile:” and to clarify the core advantage of LFMs: their ability to outperform transformer-based models while using significantly less memory.

This is the proudest release of my career :smile:

At @LiquidAI_, we're launching three LLMs (1B, 3B, 40B MoE) with SOTA performance, based on a custom architecture.

Minimal memory footprint & efficient inference bring long context tasks to edge devices for the first time! pic.twitter.com/v9DelExyTa

— Maxime Labonne (@maximelabonne) September 30, 2024

The models are engineered to be competitive not only on raw performance benchmarks but also in terms of operational efficiency, making them ideal for a variety of use cases, from enterprise-level applications specifically in the fields of financial services, biotechnology, and consumer electronics, to deployment on edge devices.

However, importantly for prospective users and customers, the models are not open source. Instead, users will need to access them through Liquid’s inference playground, Lambda Chat, or Perplexity AI.


How Liquid is going ‘beyond’ the generative pre-trained transformer (GPT)​


In this case, Liquid says it used a blend of “computational units deeply rooted in the theory of dynamical systems, signal processing, and numerical linear algebra,” and that the result is “general-purpose AI models that can be used to model any kind of sequential data, including video, audio, text, time series, and signals” to train its new LFMs.

Last year, VentureBeat covered more about Liquid’s approach to training post-transformer AI models, noting at the time that it was using Liquid Neural Networks (LNNs), an architecture developed at CSAIL that seeks to make the artificial “neurons,” or nodes for transforming data, more efficient and adaptable.

Unlike traditional deep learning models, which require thousands of neurons to perform complex tasks, LNNs demonstrated that fewer neurons—combined with innovative mathematical formulations—could achieve the same results.

Liquid AI’s new models retain the core benefits of this adaptability, allowing for real-time adjustments during inference without the computational overhead associated with traditional models, handling up to 1 million tokens efficiently, while keeping memory usage to a minimum.

A chart from the Liquid blog shows that the LFM-3B model, for instance, outperforms popular models like Google’s Gemma-2, Microsoft’s Phi-3, and Meta’s Llama-3.2 in terms of inference memory footprint, especially as token length scales.



While other models experience a sharp increase in memory usage for long-context processing, LFM-3B maintains a significantly smaller footprint, making it highly suitable for applications requiring large volumes of sequential data processing, such as document analysis or chatbots.
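
A large part of what drives that sharp increase for transformer models is the attention KV cache, which grows linearly with context length. The back-of-envelope below uses assumed, Llama-3.2-3B-like dimensions purely to illustrate the scaling; the figures are not taken from Liquid's or Meta's published memory numbers.

```python
# Rough illustration of why transformer inference memory grows with context:
# the attention KV cache scales linearly with sequence length.
# Dimensions below are assumed approximations for a ~3B-parameter model.
LAYERS = 28
KV_HEADS = 8
HEAD_DIM = 128
BYTES_PER_VALUE = 2  # fp16

def kv_cache_gib(context_tokens: int) -> float:
    per_token = 2 * LAYERS * KV_HEADS * HEAD_DIM * BYTES_PER_VALUE  # K and V
    return context_tokens * per_token / 2**30

for ctx in (4_096, 32_768, 131_072, 1_000_000):
    print(f"{ctx:>9} tokens -> {kv_cache_gib(ctx):6.1f} GiB of KV cache")
# An architecture with a roughly constant-size inference state avoids this
# linear growth, which is the gap the memory-footprint chart highlights.
```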

Liquid AI has built its foundation models to be versatile across multiple data modalities, including audio, video, and text.

With this multimodal capability, Liquid aims to address a wide range of industry-specific challenges, from financial services to biotechnology and consumer electronics.


Accepting invitations for launch event and eyeing future improvements​


Liquid AI says it is optimizing its models for deployment on hardware from NVIDIA, AMD, Apple, Qualcomm, and Cerebras.

While the models are still in the preview phase, Liquid AI invites early adopters and developers to test the models and provide feedback.

Labonne noted that while things are “not perfect,” the feedback received during this phase will help the team refine their offerings in preparation for a full launch event on October 23, 2024, at MIT’s Kresge Auditorium in Cambridge, MA. The company is accepting RSVPs for in-person attendees of that event here.

As part of its commitment to transparency and scientific progress, Liquid says it will release a series of technical blog posts leading up to the product launch event.

The company also plans to engage in red-teaming efforts, encouraging users to test the limits of their models to improve future iterations.

With the introduction of Liquid Foundation Models, Liquid AI is positioning itself as a key player in the foundation model space. By combining state-of-the-art performance with unprecedented memory efficiency, LFMs offer a compelling alternative to traditional transformer-based models.
 