The A.I Megathread (LLM , GPT , Development)

bnew · Aug 24, 2023

Meet SeamlessM4T, the Meta AI model that can translate 100 languages into speech or text

Meta releases SeamlessM4T, a multilingual foundational model that can translate and transcribe 100 languages across speech and text.

venturebeat.com

Woman using voice assistant on smartphone in the rain
Image Credit: Oscar Wong via Getty

As part of its broader effort to remove language barriers and keep people connected, Meta has developed a multilingual foundational model that can understand nearly 100 languages from speech or text and generate translations into either or both in real time.

Officially dubbed SeamlessM4T, the multimodal technology has been publicly released to help researchers build on the development and introduce universal applications capable of delivering speech-to-speech, speech-to-text, text-to-speech and text-to-text translations. It has been made available along with SeamlessAlign, a multimodal translation dataset totaling 265,000 hours of mined speech and text alignments.

The offering marks a significant development in AI’s application in linguistics given that it’s a single system performing multiple tasks across speech and text. Prior to this, the approach largely involved different systems for different tasks, such as a dedicated system for speech-to-speech translations.

What can SeamlessM4T do?

As Meta explains, SeamlessM4T implicitly recognizes the source language without the need for a separate language identification model. It can detect speech and text in nearly 100 languages and produce text in nearly as many and speech in 36 languages. More interestingly, it can also figure out when more than one language has been mixed in the same sentence and provide translations in a single targeted language (like a sentence spoken in Telugu and Hindi and translated into English speech).

When tested with BLASER 2.0, which allows for evaluation across speech and text units, the model proved more robust to background noise and speaker variation in speech-to-text tasks than the current state-of-the-art models (with average improvements of 37% and 48%, respectively).

“SeamlessM4T outperforms previous state-of-the-art competitors,” Meta said in a blog post. “We also significantly improve performance for low and mid-resource languages (with smaller digital footprint) supported, and maintain strong performance on high-resource languages (like English).”

Once further developed, this could lead to large-scale universal translation systems, allowing people who speak different languages to communicate more effectively.

Notably, Google is also working in this direction and has announced Universal Speech Model (USM), which can perform automatic speech recognition (ASR) for both widely-spoken and under-resourced languages.

How does it all work?

To bring the model to life, Meta mined web data (tens of billions of sentences) and speech (4 million hours) from public sources and aligned them to create the SeamlessAlign dataset. In total, the company said it was able to align more than 443,000 hours of speech with texts and create about 29,000 hours of speech-to-speech alignments. Using this data, the company trained the multitask UnitY model to produce the desired multimodal outcomes.

“The multitask UnitY model consists of three main sequential components,” Meta explains. “Text and speech encoders have the task of recognizing inputs in nearly 100 languages. The text decoder then transfers that meaning into nearly 100 languages for text, followed by a text-to-unit model to decode into discrete acoustic units for 36 speech languages…The decoded discrete units are then converted into speech using a multilingual HiFi-GAN unit vocoder.”

Not perfect yet

That said, it is important to note that SeamlessM4T is far from perfect right now. Evaluations found that the model has both added toxicity (although 63% less than state-of-the-art models) and gender bias issues.

According to a whitepaper detailing the technology, SeamlessM4T overgeneralizes to masculine forms when translating from neutral terms (with an average preference of approximately 10%) and shows a lack of robustness of about 3% when gender is varied.

“We detect toxicity in both the input and the output for the demo,” Meta said. “If toxicity is only detected in the output, it means that toxicity is added. In this case, we include a warning and do not show the output…Regarding bias, we have started our efforts on evaluating gender bias in languages at scale. We are now able to quantify gender bias in dozens of speech translation directions by extending to speech our previously designed Multilingual HolisticBias dataset.”

The company emphasized that this is an ongoing effort, and that it will continue to research and take action in these areas to further improve the robustness and safety of the SeamlessM4T model.

bnew · Aug 24, 2023

Introducing SeamlessM4T, a Multimodal AI Model for Speech and Text Translations

SeamlessM4T allows people to communicate effortlessly through speech and text across different languages.

about.fb.com

SeamlessM4T—Massively Multilingual & Multimodal Machine Translation | Research - AI at Meta

What does it take to create the Babel Fish, a tool that can help individuals translate speech between any two languages? While recent breakthroughs in...

ai.meta.com

https://archive.is/ljivU

https://archive.is/m5TsW

Meta Debuts SeamlessM4T, the Swiss Army Knife of Translation Models

Meta doubles down on multimodal translation as it launches SeamlessM4T capable of translating speech and text.

slator.com

It recognizes speech (that is, automatically — as in automatic speech recognition). It translates speech into speech (or text), and text into text (or speech) — in 100+ languages. Meta’s new Massively Multilingual & Multimodal Machine Translation (SeamlessM4T) is the Swiss army knife of language models. Proud parent Meta introduced the new model in a blog post published on August 22, 2023.

The SeamlessM4T launch follows a number of language technology announcements by Meta over the past 12 months. These include low resource massively multilingual MT in mid 2022, massively multilingual speech translation in May 2023, and multilingual speech model Voicebox in June 2023. The social media giant is spending considerable resources on tackling the language problem of its metaverse vision.

On X, one observer described SeamlessM4T as “revolutionary” and called it a “game-changer.” Another gushed, “It’s not just a tool; it’s a step towards a world where everyone can be understood, regardless of language.”

“The code switching support of SeamlessM4T is pretty cool!” shared a fan with a sense of humor. “It doesn’t do very well with my French or Japanese, but then again neither is very good.”

One Dr. Hubertus Becker questioned the model’s reliability for critical translations, noting, “It’s concerning that an experimental demo can alter the meaning of input words.”

Kalev Leetaru, reporting on SeamlessM4T’s performance in translating Weibo social media posts, cited inconsistent results.

“For some posts it yields translations that compare favorably to both NMT and LLM translations, but with the added cost of having to use language-specific punctuation rules to split into sentences to translate a sentence at a time,” Leetaru explained. “For other posts, it yields subpar translations that can remove or truncate key details, suggesting promise but that it is not quite ready for production use.”

Better than Whisper?

Of course, the more than 60 authors behind the August 22, 2023 paper introducing SeamlessM4T believe in what they dubbed “the first multilingual system” to translate from and into English for both speech and text.

If the stats behind SeamlessM4T’s training seem somewhat disparate, that might be because the model required training in so many (formerly) separate and siloed tasks. Similarly, the number of languages handled by the model varies by task.

SeamlessM4T can provide automatic speech recognition (ASR) for almost 100 languages; speech-to-text (STT) translation for nearly 100 input and output languages; speech-to-speech translation and text-to-speech translation for nearly 100 input languages and 36 output languages (including English); and traditional “text” translation for close to 100 languages.

According to the authors, Meta’s motivation for the new model was to work around the existing separate systems that can complete the above tasks — but generally perform well in only one modality per system.

SeamlessM4T, by contrast, reportedly achieves state-of-the-art results for all these languages while offering “multitask support” in a single model. The paper also asserts that SeamlessM4T outperforms its previous SOTA competitors, namely Whisper and AudioPaLM-2.

Meta has publicly released the contributions to its new model, and encourages researchers and developers to build on this first iteration.

bnew · Aug 24, 2023

https://archive.is/Pl5Ow

Unnatural Instructions: Tuning Language Models with (Almost) No Human Labor

Instruction tuning enables pretrained language models to perform new tasks from inference-time natural language descriptions. These approaches rely on vast amounts of human supervision in the form of crowdsourced datasets or user interactions. In this work, we introduce Unnatural Instructions: a...

arxiv.org

bnew · Aug 24, 2023

https://archive.is/yta5z

bnew · Aug 24, 2023

https://archive.is/U2Mez

bnew · Aug 25, 2023

[2305.01210] Is Your Code Generated by ChatGPT Really Correct? Rigorous Evaluation of Large Language Models for Code Generation

Is Your Code Generated by ChatGPT Really Correct? Rigorous Evaluation of Large Language Models for Code Generation

Jiawei Liu, Chunqiu Steven Xia, Yuyao Wang, Lingming Zhang

Program synthesis has been long studied with recent approaches focused on directly using the power of Large Language Models (LLMs) to generate code. Programming benchmarks, with curated synthesis problems and test-cases, are used to measure the performance of various LLMs on code synthesis. However, these test-cases can be limited in both quantity and quality for fully assessing the functional correctness of the generated code. Such limitation in the existing benchmarks begs the following question: In the era of LLMs, is the code generated really correct? To answer this, we propose EvalPlus -- a code synthesis benchmarking framework to rigorously evaluate the functional correctness of LLM-synthesized code. EvalPlus augments a given evaluation dataset with large amounts of test-cases newly produced by an automatic test input generator, powered by both LLM- and mutation-based strategies. While EvalPlus is general, we extend the test-cases of the popular HUMANEVAL benchmark by 81x to build HUMANEVAL+. Our extensive evaluation across 19 popular LLMs (e.g., GPT-4 and ChatGPT) demonstrates that HUMANEVAL+ is able to catch significant amounts of previously undetected wrong code synthesized by LLMs, reducing the pass@k by 13.6-15.3% on average. Our work not only indicates that prior popular code synthesis evaluation results do not accurately reflect the true performance of LLMs for code synthesis, but also opens up a new direction to improve such programming benchmarks through automated testing.

https://arxiv.org/pdf/2305.01210.pdf
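
For anyone unfamiliar with the metric: the abstract reports drops in pass@k but does not define it. Below is a minimal sketch, assuming the unbiased estimator popularized with the original HumanEval benchmark, pass@k = 1 - C(n-c, k) / C(n, k), where n samples are generated per problem and c of them pass every test. Whether EvalPlus computes it exactly this way is an assumption, and the function name is only illustrative.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one problem.

    n: number of generated samples
    c: number of samples that pass all tests
    k: number of attempts the metric allows
    """
    if not (0 <= c <= n) or not (1 <= k <= n):
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    # If fewer than k samples fail, every draw of k contains a passing one.
    if n - c < k:
        return 1.0
    # Probability that a random draw of k samples contains no passing sample,
    # subtracted from 1.
    return 1.0 - comb(n - c, k) / comb(n, k)
```

Stricter tests can only shrink c for a fixed set of samples, so this estimate can only stay the same or fall, which is the kind of reduction the abstract describes for HUMANEVAL+.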

bnew · Aug 25, 2023

Blog - Run Code Llama locally

Get up and running with large language models, locally.

ollama.ai

https://archive.is/YVvfZ

Run Code Llama locally

August 24, 2023

Today, Meta Platforms, Inc. released Code Llama to the public. Based on Llama 2, it provides state-of-the-art performance among open models, infilling capabilities, support for large input contexts, and zero-shot instruction-following ability for programming tasks.

Code Llama is now available on Ollama to try!

If you haven’t already installed Ollama, please download it here.

To try Code Llama:

Code Llama 7 billion parameter model

ollama run codellama:7b-instruct

Example prompt:

In Bash, how do I list all text files in the current directory (excluding subdirectories) that have been modified in the last month?
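
The same prompt can also be sent from a script instead of the interactive CLI. This is a rough sketch, assuming Ollama's local HTTP API is listening on its default port (11434) and that the `/api/generate` endpoint accepts `model`, `prompt` and `stream` fields. None of that is stated in the blog post, and the helper names `build_request` and `ask` are made up for illustration.

```python
import json
import urllib.request

# Assumption: default address of a locally running Ollama server.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt: str, model: str = "codellama:7b-instruct") -> urllib.request.Request:
    """Build a POST request asking the model for a single, non-streamed reply."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt: str, model: str = "codellama:7b-instruct", timeout: float = 120.0) -> str:
    """Send the prompt and return the generated text (needs a running server)."""
    with urllib.request.urlopen(build_request(prompt, model), timeout=timeout) as resp:
        return json.loads(resp.read())["response"]
```

Calling `ask(...)` with the example prompt above only works once `ollama run codellama:7b-instruct` (or an equivalent pull) has fetched the model and the server is running.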

More models are coming, and this blog will be updated soon.

Foundation models and Python specializations are available for code generation/completions tasks

Foundation models:

More models are coming, and this blog will be updated soon.

Python specializations:

More models are coming, and this blog will be updated soon.

Ollama - library / codellama

Get up and running with large language models, locally.

ollama.ai

bnew · Aug 25, 2023

bnew · Aug 26, 2023

https://www.phind.com/blog/code-llama-beats-gpt4

Beating GPT-4 on HumanEval with a Fine-Tuned CodeLlama-34B

We have fine-tuned CodeLlama-34B and CodeLlama-34B-Python on an internal Phind dataset that achieved 67.6% and 69.5% pass@1 on HumanEval, respectively. GPT-4 achieved 67% according to their official technical report in March. To ensure result validity, we applied OpenAI's decontamination methodology to our dataset.

The CodeLlama models released yesterday demonstrate impressive performance on HumanEval.

CodeLlama-34B achieved 48.8% pass@1 on HumanEval
CodeLlama-34B-Python achieved 53.7% pass@1 on HumanEval

We have fine-tuned both models on a proprietary dataset of ~80k high-quality programming problems and solutions. Instead of code completion examples, this dataset features instruction-answer pairs, setting it apart structurally from HumanEval. We trained the Phind models over two epochs, for a total of ~160k examples. LoRA was not used — both models underwent a native fine-tuning. We employed DeepSpeed ZeRO 3 and Flash Attention 2 to train these models in three hours using 32 A100-80GB GPUs, with a sequence length of 4096 tokens.

Furthermore, we applied OpenAI's decontamination methodology to our dataset to ensure valid results, and found no contaminated examples. The methodology is:

For each evaluation example, we randomly sampled three substrings of 50 characters or used the entire example if it was fewer than 50 characters.
A match was identified if any sampled substring was a substring of the processed training example.
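
The two steps above can be sketched roughly as follows. This is not Phind's or OpenAI's code: the function names are hypothetical, and because the post does not say how a training example is "processed", the `preprocess` hook defaults to leaving text unchanged and is applied to both sides as an assumption.

```python
import random
from typing import Callable, List, Optional


def sample_substrings(example: str, length: int = 50, count: int = 3,
                      rng: Optional[random.Random] = None) -> List[str]:
    """Return `count` random substrings of `length` characters,
    or the whole example if it is shorter than `length`."""
    if len(example) < length:
        return [example]
    rng = rng or random.Random()
    last_start = len(example) - length
    starts = [rng.randint(0, last_start) for _ in range(count)]
    return [example[s:s + length] for s in starts]


def is_contaminated(eval_example: str, training_example: str,
                    preprocess: Callable[[str], str] = lambda s: s,
                    rng: Optional[random.Random] = None) -> bool:
    """Flag a match if any sampled substring occurs in the training example."""
    processed_training = preprocess(training_example)
    samples = sample_substrings(preprocess(eval_example), rng=rng)
    return any(sample in processed_training for sample in samples)
```

Because only three substrings are sampled per evaluation example, a partial overlap can slip through on a given run. The check trades completeness for speed across a large dataset.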

For further insights on the decontamination methodology, please refer to Appendix C of OpenAI's technical report. Presented below are the pass@1 scores we achieved with our fine-tuned models:

Phind-CodeLlama-34B-v1 achieved 67.6% pass@1 on HumanEval
Phind-CodeLlama-34B-Python-v1 achieved 69.5% pass@1 on HumanEval

Download

We are releasing both models on Huggingface for verifiability and to bolster the open-source community. We welcome independent verification of results.

Phind/Phind-CodeLlama-34B-v1 · Hugging Face

We’re on a journey to advance and democratize artificial intelligence through open source and open science.

huggingface.co

Phind/Phind-CodeLlama-34B-Python-v1 · Hugging Face

We’re on a journey to advance and democratize artificial intelligence through open source and open science.

huggingface.co

bnew · Aug 26, 2023

Perplexity Labs

labs.perplexity.ai

DEMO CHAT:
CodeLLAMA 34B instruct

bnew · Aug 26, 2023

Microsoft Launcher for Android gets Bing Chat integration, powered by ChatGPT

Microsoft Launcher version 6 is out for beta testers, bringing GPT-4-powered Bing Chat AI to Android’s home screen. Microsoft is betting big on Bing Chat, which has already been integrated into the Edge browser. And now it’s rolling out to more Android audiences via Microsoft Launcher’s update...

www.windowslatest.com

By Mayank Parmar, August 25, 2023

Microsoft Launcher on Surface Duo | Image Courtesy: Microsoft

Microsoft Launcher version 6 is out for beta testers, bringing GPT-4-powered Bing Chat AI to Android’s home screen. Microsoft is betting big on Bing Chat, which has already been integrated into the Edge browser, and is now rolling it out to more Android users via the Microsoft Launcher update.

Microsoft Launcher is one of the most popular launchers for Android, with over 50 million downloads, and is known in its niche for a quality experience. It has always offered Bing integration, but today’s beta update adds “Bing Chat” to all search bars in Launcher.

For example, you can now pull the screen to open the search bar and start a Bing Chat conversation from anywhere in the Launcher. The search bar has been updated with Bing’s Chat and voice features, allowing you to chat with ChatGPT-powered Bing from anywhere, anytime.

Bing Chat opens via the search bar in Microsoft Launcher for Android | Image Courtesy: WindowsLatest.com

Microsoft officials describe the update as a new way to bridge the gap between Bing Chat and Android and allow more users to try “AI-powered copilot, which can provide helpful answers to your questions”.

Microsoft Launcher’s new search bar widget gives Bing Chat access on the home screen | Image Courtesy: WindowsLatest.com

There’s also a new Bing widget that lets you access the chat from the homepage.
Here’s the official changelog:

New – Start a Bing Chat conversation from anywhere in the Launcher you find search.
Bing is your AI-powered copilot and can provide helpful answers to your questions.
Bing Chat functionality is available in regions supported by Bing Chat.
Bing Chat functionality requires Android 8.0 and up.
Bug fixes.

Microsoft said Bing Chat functionality requires Android 8.0 or newer. In our tests, Bing Chat did not work on older Android phones due to permission issues.

bnew · Aug 26, 2023

bnew · Aug 26, 2023

SDXL Styles

GitHub - Douleb/SDXL-750-Styles-GPT4-: "Welcome to this repository hosting a `styles.csv` file with 750+ styles for Stable Diffusion XL, generated by OpenAI's GPT-4. These diverse styles can enhance your project's output. Feel free to explore, utiliz

"Welcome to this repository hosting a `styles.csv` file with 750+ styles for Stable Diffusion XL, generated by OpenAI's GPT-4. These diverse styles can enhance your project's output. F...

github.com

About

"Welcome to this repository hosting a `styles.csv` file with 750+ styles for Stable Diffusion XL, generated by OpenAI's GPT-4. These diverse styles can enhance your project's output. Feel free to explore, utilize, and provide feedback. Happy creating!"

GitHub - twri/sdxl_prompt_styler: Custom prompt styler node for SDXL in ComfyUI

Custom prompt styler node for SDXL in ComfyUI. Contribute to twri/sdxl_prompt_styler development by creating an account on GitHub.

github.com

About

Custom prompt styler node for SDXL in ComfyUI

Custom node for ComfyUI

SDXL Prompt Styler is a node that enables you to style prompts based on predefined templates stored in multiple JSON files. The node specifically replaces a {prompt} placeholder in the 'prompt' field of each template with provided positive text.

The node also effectively manages negative prompts. If negative text is provided, the node combines this with the 'negative_prompt' field from the template. If no negative text is supplied, the system defaults to using the 'negative_prompt' from the JSON template. This flexibility enables the creation of a diverse and specific range of negative prompts.
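
The behaviour described above can be sketched like this. It is a simplified illustration, not the node's actual code: the function name, the sample template and the comma used to join negative prompts are assumptions.

```python
import json
from typing import Tuple

# Hypothetical template in the shape the description implies.
TEMPLATE_JSON = """
{
  "name": "cinematic",
  "prompt": "cinematic still of {prompt}, shallow depth of field",
  "negative_prompt": "cartoon, drawing"
}
"""


def apply_style(template: dict, positive: str, negative: str = "") -> Tuple[str, str]:
    """Fill the {prompt} placeholder and merge negative prompts."""
    styled_positive = template["prompt"].replace("{prompt}", positive)
    template_negative = template.get("negative_prompt", "")
    if negative:
        # Assumption: template text first, user text second, comma-separated.
        parts = [p for p in (template_negative, negative) if p]
        styled_negative = ", ".join(parts)
    else:
        # No user-supplied negative text: fall back to the template's own.
        styled_negative = template_negative
    return styled_positive, styled_negative


template = json.loads(TEMPLATE_JSON)
styled = apply_style(template, "a red fox", "blurry")
```

Here `styled` holds the filled-in positive prompt and the merged negative prompt as a pair. Leaving out the third argument returns the template's negative prompt unchanged.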

bnew · Aug 26, 2023

SDXL 1.0 Artist Study ~4000 artists

https://weirdwonderfulai.art/resources/stable-diffusion-xl-sdxl-artist-study/

bnew · Aug 26, 2023

Leo, Brave's browser-native AI assistant, is now available in Nightly version for testing | Brave

Today we're excited to announce that Leo, the AI assistant built natively in the Brave browser, is now available for testing and feedback in the Nightly desktop channel (starting with version 1.59).

brave.com

Leo, Brave’s browser-native AI assistant, is now available in Nightly version for testing

Last updated Aug 21, 2023

Today we’re excited to announce that Leo, the AI assistant built natively in the Brave browser, is now available for testing and feedback in the Nightly desktop channel (starting with version 1.59).

Building on the success of the Brave Search AI Summarizer, we’ve made Leo available as a companion in the browser sidebar. Leo allows users to interact with the web pages they’re visiting—for example, by asking for video transcripts or interactive article summaries—without leaving the page itself. Leo can also suggest follow-up questions, augment original content, and even help with reading comprehension. Leo can answer questions just like other AI-powered chatbots, but directly within the experience of a web page.

What is Brave Leo?

Brave Leo is a chat assistant hosted by Brave without the use of third-party AI services, available to Brave users on the desktop Nightly channel. The model behind Leo is Llama 2, a source-available large language model released by Meta with a special focus on safety. We’ve made sure that user inputs are always submitted anonymously through a reverse-proxy to our inference infrastructure. In this way, Brave can offer an AI experience with unparalleled privacy.

We’ve specifically tuned the model prompt to adhere to Brave’s core values. However, as with any other LLM, the outputs of the model should be treated with care for potential inaccuracies or errors.

How to try Leo and share feedback

Leo is available today for all users of the Brave browser desktop Nightly channel. Nightly desktop users can access Leo via the button in Brave Sidebar.

Are you a Brave Nightly user? Please tell us what you think of Leo!

A note on anonymity

Leo is free to use for any desktop Nightly user, and no user login or account is required. Chats in Leo cannot be used for training purposes, and no one can review those conversations, as they’re not persisted on Brave’s servers—conversations are discarded immediately after the reply is generated. For this reason, there’s no way to review past conversations or delete that data—it isn’t stored in the first place.

What data does the Brave browser send?

If you use Leo, the browser shares with the server your latest query, your ongoing conversation history and, when the use case calls for it, only the necessary context from the page you’re actively viewing (e.g. the article’s text, or the YouTube video transcript).

How can I get better results out of Leo?

As with any AI, the more specific you are with your prompts and context, the better the results Leo can provide. Remember to give Leo clear, detailed instructions and, if you don’t get exactly the answer you’re looking for, to try wording your query/prompt a different way.

Does Leo have access to live information?

For now, Leo does not have access to live information. However, in future releases we do plan to offer a version of Leo with some level of access to current information. This will be powered by our own independent Brave Search.

What’s next for Brave Leo?

In addition to incorporating live information, we’ll be making improvements to Leo’s accuracy and user experience. We hope to release Leo to all Brave browser users in the coming months.
