bnew · Veteran · Joined Nov 1, 2015 · Messages: 55,228 · Reputation: 8,195 · Daps: 156,155

Millions of new materials discovered with deep learning​

Published: 29 November 2023
Authors: Amil Merchant and Ekin Dogus Cubuk


AI tool GNoME finds 2.2 million new crystals, including 380,000 stable materials that could power future technologies

Modern technologies from computer chips and batteries to solar panels rely on inorganic crystals. To enable new technologies, crystals must be stable; otherwise they can decompose. Behind each new, stable crystal can be months of painstaking experimentation.

Today, in a paper published in Nature, we share the discovery of 2.2 million new crystals – equivalent to nearly 800 years’ worth of knowledge. We introduce Graph Networks for Materials Exploration (GNoME), our new deep learning tool that dramatically increases the speed and efficiency of discovery by predicting the stability of new materials.

With GNoME, we’ve multiplied the number of technologically viable materials known to humanity. Of its 2.2 million predictions, 380,000 are the most stable, making them promising candidates for experimental synthesis. Among these candidates are materials with the potential to enable transformative future technologies, ranging from superconductors and materials for powering supercomputers to next-generation batteries that could boost the efficiency of electric vehicles.

GNoME shows the potential of using AI to discover and develop new materials at scale. External researchers in labs around the world have independently created 736 of these new structures experimentally in concurrent work. In partnership with Google DeepMind, a team of researchers at the Lawrence Berkeley National Laboratory has also published a second paper in Nature that shows how our AI predictions can be leveraged for autonomous material synthesis.

We’ve made GNoME’s predictions available to the research community. We will be contributing 380,000 materials that we predict to be stable to the Materials Project, which is now processing the compounds and adding them into its online database. We hope these resources will drive forward research into inorganic crystals, and unlock the promise of machine learning tools as guides for experimentation.


Accelerating materials discovery with AI​


About 20,000 of the crystals experimentally identified in the ICSD database are computationally stable. Computational approaches drawing from the Materials Project, Open Quantum Materials Database and WBM database boosted this number to 48,000 stable crystals. GNoME expands the number of stable materials known to humanity to 421,000.

In the past, scientists searched for novel crystal structures by tweaking known crystals or experimenting with new combinations of elements - an expensive, trial-and-error process that could take months to deliver even limited results. Over the last decade, computational approaches led by the Materials Project and other groups have helped discover 28,000 new materials. But up until now, new AI-guided approaches hit a fundamental limit in their ability to accurately predict materials that could be experimentally viable. GNoME’s discovery of 2.2 million materials would be equivalent to about 800 years’ worth of knowledge and demonstrates an unprecedented scale and level of accuracy in predictions.

For example, we found 52,000 new layered compounds similar to graphene that have the potential to revolutionize electronics with the development of superconductors. Previously, about 1,000 such materials had been identified. We also found 528 potential lithium-ion conductors, 25 times more than a previous study, which could be used to improve the performance of rechargeable batteries.

We are releasing the predicted structures for 380,000 materials that have the highest chance of successfully being made in the lab and being used in viable applications. For a material to be considered stable, it must not decompose into similar compositions with lower energy. For example, carbon in a graphene-like structure is stable compared to carbon in diamonds. Mathematically, these materials lie on the convex hull. This project discovered 2.2 million new crystals that are stable by current scientific standards and lie below the convex hull of previous discoveries. Of these, 380,000 are considered the most stable, and lie on the “final” convex hull – the new standard we have set for materials stability.
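
To make the convex-hull idea concrete, here is a minimal sketch using pymatgen, the open-source library behind the Materials Project. The compositions and energies are invented placeholder values for a toy Li-O system, not GNoME outputs; it is only meant to illustrate how "energy above the hull" is computed in practice.

Code:
# Toy stability check against the convex hull with pymatgen.
# The energies below are made-up placeholders, not GNoME data.
from pymatgen.core import Composition
from pymatgen.analysis.phase_diagram import PhaseDiagram, PDEntry

entries = [
    PDEntry(Composition("Li"), 0.0),      # elemental references
    PDEntry(Composition("O2"), 0.0),
    PDEntry(Composition("Li2O"), -6.2),   # known stable compound
    PDEntry(Composition("LiO2"), -2.9),   # candidate material to test
]

pd = PhaseDiagram(entries)
candidate = entries[3]
e_above_hull = pd.get_e_above_hull(candidate)
print(f"Energy above hull: {e_above_hull:.3f} eV/atom")
# 0 means the candidate lies on the convex hull (stable);
# a positive value means it is predicted to decompose into other phases.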


GNoME: Harnessing graph networks for materials exploration​


GNoME uses two pipelines to discover low-energy (stable) materials. The structural pipeline creates candidates with structures similar to known crystals, while the compositional pipeline follows a more randomized approach based on chemical formulas. The outputs of both pipelines are evaluated using established Density Functional Theory calculations and those results are added to the GNoME database, informing the next round of active learning.

GNoME is a state-of-the-art graph neural network (GNN) model. The input data for GNNs take the form of a graph that can be likened to connections between atoms, which makes GNNs particularly suited to discovering new crystalline materials.
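
As a rough illustration of what a "graph of connections between atoms" means in practice, the toy sketch below builds edges between atoms that sit within a distance cutoff. The coordinates, species, and cutoff are arbitrary illustration values, not how GNoME actually featurizes crystals.

Code:
# Build a simple atom graph: nodes are atoms, edges connect nearby atoms.
import numpy as np

positions = np.array([[0.0, 0.0, 0.0],      # atom 0
                      [1.5, 0.0, 0.0],      # atom 1
                      [0.0, 1.5, 0.0],      # atom 2
                      [3.5, 3.5, 3.5]])     # atom 3 (far away)
species = ["Na", "Cl", "Na", "Cl"]           # element labels -> node features
cutoff = 2.0                                 # angstroms

edges = [(i, j)
         for i in range(len(positions))
         for j in range(i + 1, len(positions))
         if np.linalg.norm(positions[i] - positions[j]) < cutoff]
print(edges)   # [(0, 1), (0, 2)] -- the graph a GNN would message-pass over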

GNoME was originally trained with data on crystal structures and their stability, openly available through the Materials Project. We used GNoME to generate novel candidate crystals, and also to predict their stability. To assess our model’s predictive power during progressive training cycles, we repeatedly checked its performance using Density Functional Theory (DFT), an established computational technique used in physics, chemistry, and materials science to understand the structures of atoms, which is important for assessing the stability of crystals.

We used a training process called ‘active learning’ that dramatically boosted GNoME’s performance. GNoME would generate predictions for the structures of novel, stable crystals, which were then tested using DFT. The resulting high-quality training data was then fed back into our model training.
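
The sketch below outlines this active-learning loop in schematic Python. Every function here is a hypothetical stand-in (random numbers instead of a real GNN or DFT); it is meant only to show how candidate generation, model filtering, DFT verification, and retraining feed into each other, not to reproduce GNoME's pipeline.

Code:
# Schematic active-learning loop with placeholder stand-ins for the GNN,
# candidate generators, and DFT step. None of these are GNoME's real code.
import random

def generate_candidates(known_crystals, n=1000):
    # Structural pipeline: perturb known crystals; compositional pipeline:
    # sample chemical formulas. Here we just return dummy IDs.
    return [f"candidate_{random.randrange(1_000_000)}" for _ in range(n)]

def gnn_predict_energy(model, candidates):
    # The GNN scores each candidate; lower predicted energy = more stable.
    return {c: random.gauss(0.0, 0.1) for c in candidates}

def dft_relax_and_evaluate(structures):
    # Stand-in for Density Functional Theory, which provides the
    # "ground truth" energies used to verify and retrain the model.
    return {s: random.gauss(0.0, 0.05) for s in structures}

def retrain(model, new_labels):
    return model  # placeholder: real training code would go here

model, training_db, known_crystals = object(), {}, ["NaCl", "Si", "LiCoO2"]
for round_ in range(3):                       # active-learning rounds
    candidates = generate_candidates(known_crystals)
    predictions = gnn_predict_energy(model, candidates)
    top = sorted(predictions, key=predictions.get)[:50]   # most promising only
    dft_results = dft_relax_and_evaluate(top)             # expensive verification
    training_db.update(dft_results)           # grow the database
    model = retrain(model, dft_results)       # feed results back into training
print(f"Verified structures after 3 rounds: {len(training_db)}")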

Our research boosted the discovery rate of materials stability prediction from around 50% to 80%, based on an external benchmark set by previous state-of-the-art models. We also managed to scale up the efficiency of our model by improving the discovery rate from under 10% to over 80% - such efficiency increases could have a significant impact on how much compute is required per discovery.


AI ‘recipes’ for new materials​

The GNoME project aims to drive down the cost of discovering new materials. External researchers have independently created 736 of GNoME’s new materials in the lab, demonstrating that our model’s predictions of stable crystals accurately reflect reality. We’ve released our database of newly discovered crystals to the research community. By giving scientists the full catalog of the promising ‘recipes’ for new candidate materials, we hope this helps them to test and potentially make the best ones.


Upon completion of our latest discovery efforts, we searched the scientific literature and found 736 of our computational discoveries were independently realized by external teams across the globe. Above are six examples ranging from a first-of-its-kind Alkaline-Earth Diamond-Like optical material (Li4MgGe2S7) to a potential superconductor (Mo5GeB2).

Rapidly developing new technologies based on these crystals will depend on the ability to manufacture them. In a paper led by our collaborators at Berkeley Lab, researchers showed a robotic lab could rapidly make new materials with automated synthesis techniques. Using materials from the Materials Project and insights on stability from GNoME, the autonomous lab created new recipes for crystal structures and successfully synthesized more than 41 new materials, opening up new possibilities for AI-driven materials synthesis.


A-Lab, a facility at Berkeley Lab where artificial intelligence guides robots in making new materials. Photo credit: Marilyn Sargent/Berkeley Lab

New materials for new technologies​

To build a more sustainable future, we need new materials. GNoME has discovered 380,000 stable crystals that hold the potential to develop greener technologies – from better batteries for electric cars, to superconductors for more efficient computing.

Our research – and that of collaborators at the Berkeley Lab, Google Research, and teams around the world — shows the potential to use AI to guide materials discovery, experimentation, and synthesis. We hope that GNoME together with other AI tools can help revolutionize materials discovery today and shape the future of the field.
 

bnew · Veteran



(Chinese) https://github.com/IEIT-Yuan/Yuan-2.0

Paper: https://arxiv.org/abs/2311.15786






 

bnew · Veteran

AirLLM optimizes inference memory usage, allowing 70B large language models to run inference on a single 4GB GPU card. No quantization, distillation, pruning or other model compression techniques that would result in degraded model performance are needed.
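
The snippet below is not AirLLM's actual API; it is a small PyTorch sketch of the underlying idea of layered inference, where only one transformer block's weights are resident in memory at a time and each block is loaded from disk, applied, and freed before the next.

Code:
# Illustrative sketch (not AirLLM's real API) of layer-by-layer inference.
import torch, torch.nn as nn, os, tempfile

hidden, n_layers = 64, 4                     # toy sizes for demonstration
workdir = tempfile.mkdtemp()

# Pretend these per-layer checkpoints were produced when the model was split.
for i in range(n_layers):
    layer = nn.TransformerEncoderLayer(d_model=hidden, nhead=4, batch_first=True)
    torch.save(layer.state_dict(), os.path.join(workdir, f"layer_{i}.pt"))

x = torch.randn(1, 8, hidden)                # a batch of token embeddings
with torch.no_grad():
    for i in range(n_layers):
        layer = nn.TransformerEncoderLayer(d_model=hidden, nhead=4, batch_first=True)
        layer.load_state_dict(torch.load(os.path.join(workdir, f"layer_{i}.pt")))
        x = layer(x)                         # run just this block
        del layer                            # free memory before the next block
print(x.shape)  # activations after the final block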
 

bnew · Veteran

Microsoft joins OpenAI’s board with Sam Altman officially back as CEO


After a failed attempt to oust Altman by the board, OpenAI’s largest investor is getting a non-voting observer seat.​



By Alex Heath, a deputy editor and author of the Command Line newsletter. He’s covered the tech industry for over a decade at The Information and other outlets.
Nov 29, 2023, 7:50 PM EST

Sam Altman is officially OpenAI’s CEO again.

Just before Thanksgiving, the company said it had reached a deal in principle for him to return, and now it’s done. Microsoft is getting a non-voting observer seat on the nonprofit board that controls OpenAI as well, the company announced on Wednesday.

“I have never been more excited about the future,” Altman said in a memo to employees shared with The Verge. “I am extremely grateful for everyone’s hard work in an unclear and unprecedented situation, and I believe our resilience and spirit set us apart in the industry. I feel so, so good about our probability of success for achieving our mission.”

With three of the four board members who decided to suddenly fire Altman now gone, OpenAI’s new board consists of chair Bret Taylor, Larry Summers, and Adam D’Angelo, the only remaining holdout from the previous board.

OpenAI adding Microsoft to the board as a “non-voting observer” means that the tech giant will have more visibility into the company’s inner workings but not have an official vote in big decisions. Microsoft is a major investor in OpenAI, with a 49 percent stake in the for-profit entity that the nonprofit board controls. Until now, it’s had no visibility into that board. That led to a big surprise when Altman was ousted, threatening what has quickly become one of the most important partnerships in tech.

A spokesperson for Microsoft declined to comment on who from the company would fill its observer seat.

In his memo to employees, Altman said that he harbors “zero ill will” toward Ilya Sutskever, OpenAI’s co-founder and chief scientist who initially participated in the board coup and changed his mind after nearly all of the company’s employees threatened to quit if Altman didn’t come back. “While Ilya will no longer serve on the board, we hope to continue our working relationship and are discussing how he can continue his work at OpenAI,” Altman said.

“The fact that we did not lose a single customer will drive us to work even harder for you,” he told employees.

Below is Sam Altman’s full memo shared with OpenAI employees on Wednesday:

I am returning to OpenAI as CEO. Mira will return to her role as CTO. The new initial board will consist of Bret Taylor (Chair), Larry Summers, and Adam D’Angelo.​

I have never been more excited about the future. I am extremely grateful for everyone’s hard work in an unclear and unprecedented situation, and I believe our resilience and spirit set us apart in the industry. I feel so, so good about our probability of success for achieving our mission.

Before getting to what comes next, I’d like to share some thanks.

I love and respect Ilya, I think he’s a guiding light of the field and a gem of a human being. I harbor zero ill will towards him. While Ilya will no longer serve on the board, we hope to continue our working relationship and are discussing how he can continue his work at OpenAI.

I am grateful to Adam, Tasha, and Helen for working with us to come to this solution that best serves the mission. I’m excited to continue to work with Adam and am sincerely thankful to Helen and Tasha for investing a huge amount of effort in this process.

Thank you also to Emmett who had a key and constructive role in helping us reach this outcome. Emmett’s dedication to AI safety and balancing stakeholders’ interests was clear.

Mira did an amazing job throughout all of this, serving the mission, the team, and the company selflessly throughout. She is an incredible leader and OpenAI would not be OpenAI without her. Thank you.

Greg and I are partners in running this company. We have never quite figured out how to communicate that on the org chart, but we will. In the meantime, I just wanted to make it clear. Thank you for everything you have done since the very beginning, and for how you handled things from the moment this started and over the past few days.

The leadership team–Mira, Brad, Jason, Che, Hannah, Diane, Anna, Bob, Srinivas, Matt, Lilian, Miles, Jan, Wojciech, John, Jonathan, Pat, and many more–is clearly ready to run the company without me. They say one way to evaluate a CEO is how you pick and train your potential successors; on that metric I am doing far better than I realized. It’s clear to me that the company is in great hands, and I hope this is abundantly clear to everyone. Thank you all.

Jakub, Szymon, and Aleksander are exceptional talents and I’m so happy they have rejoined to move us and our research forward. Thank you.

To all of you, our team: I am sure books are going to be written about this time period, and I hope the first thing they say is how amazing the entire team has been. Now that we’re through all of this, we didn’t lose a single employee. You stood firm for each other, this company, and our mission. One of the most important things for the team that builds AGI safely is the ability to handle stressful and uncertain situations, and maintain good judgment throughout. Top marks. Thank you all.

Satya, Kevin, Amy, and Brad have been incredible partners throughout this, with exactly the right priorities all the way through. They’ve had our backs and were ready to welcome all of us if we couldn’t achieve our primary goal. We clearly made the right choice to partner with Microsoft and I’m excited that our new board will include them as a non-voting observer. Thank you.

To our partners and users, thank you for sticking with us. We really felt the outpouring of support and love, and it helped all of us get through this. The fact that we did not lose a single customer will drive us to work even harder for you, and we are all excited to get back to work.

Will Hurd, Brian Chesky, Bret Taylor and Larry Summers put their lives on hold and did an incredible amount to support the mission. I don’t know how they did it so well, but they really did. Thank you.

Ollie also put his life on hold this entire time to just do everything he could to help out, in addition to providing his usual unconditional love and support. Thank you and I love you.

So what’s next?

We have three immediate priorities.

● Advancing our research plan and further investing in our full-stack safety efforts, which have always been critical to our work. Our research roadmap is clear; this was a wonderfully focusing time. I share the excitement you all feel; we will turn this crisis into an opportunity! I’ll work with Mira on this.

● Continuing to improve and deploy our products and serve our customers. It’s important that people get to experience the benefits and promise of AI, and have the opportunity to shape it. We continue to believe that great products are the best way to do this. I’ll work with Brad, Jason and Anna to ensure our unwavering commitment to users, customers, partners and governments around the world is clear.

● Bret, Larry, and Adam will be working very hard on the extremely important task of building out a board of diverse perspectives, improving our governance structure, and overseeing an independent review of recent events. I look forward to working closely with them on these crucial steps so everyone can be confident in the stability of OpenAI.

I am so looking forward to finishing the job of building beneficial AGI with you all—best team in the world, best mission in the world.

Love, Sam

And here’s the full memo OpenAI board chair Bret Taylor sent to employees:

On behalf of the OpenAI Board, I want to express our gratitude to the entire OpenAI community, especially all the OpenAI employees, who came together to help find a path forward for the company over the past week. Your efforts helped enable this incredible organization to continue to serve its mission to ensure that artificial general intelligence benefits all of humanity. We are thrilled that Sam, Mira and Greg are back together leading the company and driving it forward. We look forward to working with them and all of you.​

As a Board, we are focused on strengthening OpenAI’s corporate governance. Here’s how we plan to do it:

● We will build a qualified, diverse Board of exceptional individuals whose collective experience represents the breadth of OpenAI’s mission – from technology to safety to policy. We are pleased that this Board will include a non-voting observer for Microsoft.

● We will further stabilize the OpenAI organization so that we can continue to serve our mission. This will include convening an independent committee of the Board to oversee a review of the recent events.

● We will enhance the governance structure of OpenAI so that all stakeholders – users, customers, employees, partners, and community members – can trust that OpenAI will continue to thrive.

OpenAI is a more important institution than ever before. ChatGPT has made artificial intelligence a part of daily life for hundreds of millions of people. Its popularity has made AI – its benefits and its risks – central to virtually every conversation about the future of governments, business, and society.

We understand the gravity of these discussions and the central role of OpenAI in the development and safety of these awe-inspiring new technologies. Each of you plays a critical part in ensuring that we effectively meet these challenges. We are committed to listening and learning from you, and I hope to speak with you all very soon.

We are grateful to be a part of OpenAI, and excited to work with all of you.

Thank you,

Bret Taylor, Chair, OpenAI
 

bnew · Veteran


Computer Science > Computation and Language​

[Submitted on 8 Sep 2023]

From Sparse to Dense: GPT-4 Summarization with Chain of Density Prompting​

Griffin Adams, Alexander Fabbri, Faisal Ladhak, Eric Lehman, Noémie Elhadad
Selecting the "right" amount of information to include in a summary is a difficult task. A good summary should be detailed and entity-centric without being overly dense and hard to follow. To better understand this tradeoff, we solicit increasingly dense GPT-4 summaries with what we refer to as a "Chain of Density" (CoD) prompt. Specifically, GPT-4 generates an initial entity-sparse summary before iteratively incorporating missing salient entities without increasing the length. Summaries generated by CoD are more abstractive, exhibit more fusion, and have less of a lead bias than GPT-4 summaries generated by a vanilla prompt. We conduct a human preference study on 100 CNN DailyMail articles and find that humans prefer GPT-4 summaries that are more dense than those generated by a vanilla prompt and almost as dense as human-written summaries. Qualitative analysis supports the notion that there exists a tradeoff between informativeness and readability. 500 annotated CoD summaries, as well as an extra 5,000 unannotated summaries, are freely available on HuggingFace (this https URL).
Comments: preprint
Subjects: Computation and Language (cs.CL)
Cite as: arXiv:2309.04269 [cs.CL] (or arXiv:2309.04269v1 [cs.CL] for this version)

Submission history​

From: Griffin Adams [view email]
[v1] Fri, 8 Sep 2023 11:31:08 UTC (8,849 KB)






Article: {{ ARTICLE }}
You will generate increasingly concise, entity-dense summaries of the above article.

Repeat the following 2 steps 5 times.

Step 1. Identify 1-3 informative entities (";" delimited) from the article which are missing from the previously generated summary.
Step 2. Write a new, denser summary of identical length which covers every entity and detail from the previous summary plus the missing entities.

A missing entity is:
- relevant to the main story,
- specific yet concise (5 words or fewer),
- novel (not in the previous summary),
- faithful (present in the article),
- anywhere (can be located anywhere in the article).

Guidelines:

- The first summary should be long (4-5 sentences, ~80 words) yet highly non-specific, containing little information beyond the entities marked as missing. Use overly verbose language and fillers (e.g., "this article discusses") to reach ~80 words.
- Make every word count: rewrite the previous summary to improve flow and make space for additional entities.
- Make space with fusion, compression, and removal of uninformative phrases like "the article discusses".
- The summaries should become highly dense and concise yet self-contained, i.e., easily understood without the article.
- Missing entities can appear anywhere in the new summary.
- Never drop entities from the previous summary. If space cannot be made, add fewer new entities.

Remember, use the exact same number of words for each summary.
Answer in JSON. The JSON should be a list (length 5) of dictionaries whose keys are "Missing_Entities" and "Denser_Summary".
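
For readers who want to try the prompt above programmatically, here is a hedged sketch using the OpenAI Python client. The model name, the truncated prompt template, and the assumption that the reply comes back as clean JSON are illustrative simplifications rather than the paper's exact setup.

Code:
# Minimal driver for the Chain of Density prompt shown above.
import json
from openai import OpenAI

COD_PROMPT_TEMPLATE = """Article: {article}
You will generate increasingly concise, entity-dense summaries of the above article.
... (the full Chain of Density instructions shown above) ...
Answer in JSON. The JSON should be a list (length 5) of dictionaries whose keys
are "Missing_Entities" and "Denser_Summary"."""

def chain_of_density(article: str, model: str = "gpt-4") -> list:
    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user",
                   "content": COD_PROMPT_TEMPLATE.format(article=article)}],
    )
    # The final (fifth) entry holds the densest summary.
    return json.loads(response.choices[0].message.content)

# summaries = chain_of_density(open("article.txt").read())
# print(summaries[-1]["Denser_Summary"])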



 

bnew · Veteran


Introducing a suite of AI language translation models that preserve expression and improve streaming

November 30, 2023 · 7 minute read

In our increasingly interconnected world, where language differences may present a barrier to communication, translation systems can enable people from different linguistic backgrounds to share knowledge and experiences more seamlessly. However, many of these systems today do not preserve key elements of speech that make human communication human. More specifically, it’s not just the words we choose that convey what we want to say—it’s also how we speak them. Tone of voice, pauses, and emphasis carry important signals that help us communicate emotions and intent. Moreover, human speech and translation are sensitive to nuances such as turn-taking and timing controls. Picture, for example, how human interpreters work: they find just the right balance between low-latency and accurate translations. Waiting too long stifles the flow of communication, while going too fast compromises the overall quality of a translation. Translation systems that enable authentic conversations should deliver across all of these elements of communication.

Today, we are excited to share Seamless, the first publicly available system that unlocks expressive cross-lingual communication in real time. To build Seamless, we developed SeamlessExpressive, a model for preserving expression in speech-to-speech translation, and SeamlessStreaming, a streaming translation model that delivers state-of-the-art results with around two seconds of latency. All of the models are built on SeamlessM4T v2, the latest version of the foundational model we released in August. SeamlessM4T v2 demonstrates performance improvements for automatic speech recognition, speech-to-speech, speech-to-text, and text-to-speech capabilities. Compared to previous efforts in expressive speech research, SeamlessExpressive addresses certain underexplored aspects of prosody, such as speech rate and pauses for rhythm, while also preserving emotion and style. The model currently preserves these elements in speech-to-speech translation between English, Spanish, German, French, Italian, and Chinese.

SeamlessStreaming unlocks real-time conversations with someone who speaks a different language. In contrast to conventional systems, which translate only once the speaker has finished their sentence, SeamlessStreaming generates the translation while the speaker is still talking. This means the listener hears the translation in closer to real time - with a delay of a few seconds - rather than waiting until the speaker has finished. SeamlessStreaming supports automatic speech recognition and speech-to-text translation for nearly 100 input and output languages, and speech-to-speech translation for nearly 100 input languages and 36 output languages. In keeping with our approach to open science, we’re publicly releasing all four models to allow researchers to build on this work.

Introducing metadata, data and data alignment tools


Today, alongside our models, we are releasing metadata, data and data alignment tools to assist the research community, including:


  • Metadata of an extension of SeamlessAlign corresponding to an additional 115,000 hours of speech and text alignments on top of the existing 470k hours. In addition to more hours, the latest version of SeamlessAlign covers a broader range of languages (from 37 previously to 76 with the extension). This corpus is the largest public speech/speech and speech/text parallel corpus in terms of total volume and language coverage to date.
  • Metadata of SeamlessAlignExpressive, an expressivity-focused version of the dataset above. In this dataset, the pairs are parallel from both a semantic and prosodic perspective. SeamlessAlignExpressive is released as a benchmark to validate our expressive alignment approach. In order to train our expressive models, we applied our alignment method to a proprietary dataset.
  • Translated text data for mExpresso, a multilingual, parallel extension of read speech in Expresso, a high-quality expressive speech dataset that includes both read speech and improvised dialogues rendered in different styles. This text benchmark enables evaluation of expressive translation systems from English into other languages.
  • Tools to assist the research community in collecting more datasets for translation.

In particular, we are updating our stopes library and SONAR encoders. With these tools, anyone can automatically create multimodal translation pairs from their own speech and/or text monolingual data through parallel data alignment methods.
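
As a rough picture of what "parallel data alignment" means, the toy sketch below pairs sentences across languages by nearest-neighbor search in a shared embedding space. The encoder here is a made-up placeholder that returns random vectors; the real pipeline relies on the SONAR encoders and stopes mining tools, whose APIs and margin-based scoring differ from this simplification.

Code:
# Toy parallel-data mining via embedding similarity (placeholder encoder).
import numpy as np

def fake_encode(sentences):                 # stand-in multilingual encoder
    rng = np.random.default_rng(abs(hash(tuple(sentences))) % 2**32)
    vecs = rng.normal(size=(len(sentences), 16))
    return vecs / np.linalg.norm(vecs, axis=1, keepdims=True)

def mine_pairs(src_sents, tgt_sents, threshold=0.5):
    """Pair each source sentence with its nearest target sentence when the
    cosine similarity clears a threshold (a crude stand-in for margin scoring)."""
    src_emb, tgt_emb = fake_encode(src_sents), fake_encode(tgt_sents)
    sims = src_emb @ tgt_emb.T              # cosine similarities (unit vectors)
    pairs = []
    for i, row in enumerate(sims):
        j = int(row.argmax())
        if row[j] >= threshold:
            pairs.append((src_sents[i], tgt_sents[j], float(row[j])))
    return pairs

print(mine_pairs(["Hello world.", "How are you?"],
                 ["Bonjour le monde.", "Comment ça va ?"]))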

Our approach


All our models run on fairseq2, the latest update of our sequence modeling toolkit. Similar to our previous work on SeamlessM4T, fairseq2 offers an ideal framework for building our streaming and expressivity updates because it is lightweight, easily composable with other PyTorch ecosystem libraries, and has more efficient modeling and data loader APIs.


UnitY2, a new architecture that has a non-autoregressive text-to-unit decoder, is also instrumental to our work. In SeamlessM4T v2, we used multitask-UnitY2 to enable text input (updated from v1's multitask-UnitY). We also used the architecture for SeamlessStreaming and SeamlessExpressive. As our next generation multitask model, UnitY2 has superior speech generation capabilities through its improved text-to-unit model. This implementation leads to improved consistency between text output and speech output, compared to the SeamlessM4T v1 model.

Instead of using an autoregressive text-to-unit model as in UnitY, we used a non-autoregressive model. Autoregressive models predict the next token based on the previously generated tokens. While autoregressive models model speech naturally, they scale poorly as sequence length increases. They are also more likely to exhibit repetitive degeneration. Non-autoregressive models predict the duration of each segment, which enables each segment to be decoded in parallel. This makes them robust to long sequences, and we see improvements over the initial iteration of UnitY. Since the model inherently predicts duration, it is much more easily adaptable to the streaming use case, because we know exactly how much speech is needed to be generated for each piece of text, which is not the case for autoregressive models.
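
A toy example of the duration-based idea: once per-token durations are predicted, the token representations can be expanded to the full output length in one shot and decoded in parallel. The numbers below are invented for illustration; this is not the UnitY2 code.

Code:
# "Length regulation": expand each token by its predicted duration so a
# parallel (non-autoregressive) decoder can process the whole sequence at once.
import torch

text_units = torch.tensor([[7, 3, 9, 2]])          # token IDs for one utterance
predicted_durations = torch.tensor([[3, 1, 4, 2]]) # frames per token (model output)

embeddings = torch.nn.Embedding(16, 8)(text_units)                 # (1, 4, 8)
expanded = torch.repeat_interleave(embeddings[0],
                                   predicted_durations[0], dim=0)  # (10, 8)
print(expanded.shape)
# Because total length is known up front (durations sum to 10 frames here),
# speech can also be emitted incrementally, which suits the streaming use case.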

Streaming

EMMA is our core streaming algorithm, which allows us to intelligently decide when we have enough information to generate the next speech segment or target text. It improves upon previous state-of-the-art algorithms, especially for long input sequences, which is the case for speech-to-text or speech-to-speech translation. Further, this algorithm allows us to fine-tune from offline models, which lets us reap the benefits of the SeamlessM4T v2 foundation model. Finally, we show empirically that this algorithm generalizes well across many different language pairs, which is particularly challenging for streaming models because the language pairs may be structured differently.
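
To give a feel for the read/write decision a streaming translator has to make, here is a deliberately simple wait-k style policy in Python. It is explicitly not EMMA, which learns this decision rather than hard-coding it; the toy "translator" just upper-cases the prefix it has seen.

Code:
# Simplified wait-k streaming policy (illustration only, not EMMA).
def wait_k_stream(source_tokens, translate_prefix, k=3):
    """Read k source tokens before writing, then alternate read and write."""
    output, read = [], 0
    while read < len(source_tokens):
        read += 1                                         # READ one more source token
        if read >= k:                                     # enough context to WRITE?
            partial = translate_prefix(source_tokens[:read], len(output) + 1)
            if len(partial) > len(output):
                output = partial
    # Source finished: flush whatever target words remain.
    while True:
        partial = translate_prefix(source_tokens, len(output) + 1)
        if len(partial) <= len(output):
            break
        output = partial
    return output

# Toy "translator" that just upper-cases the prefix it has seen so far.
demo = wait_k_stream("wir sehen uns morgen wieder".split(),
                     lambda prefix, n: [w.upper() for w in prefix][:n])
print(demo)  # ['WIR', 'SEHEN', 'UNS', 'MORGEN', 'WIEDER']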

Expressivity

Preserving expression also requires a new approach. We replaced the unit HiFi-GAN vocoder in SeamlessM4T v2 with PRETSSEL, an expressive unit-to-speech generator. PRETSSEL is conditioned on the source speech for waveform generation to transfer tones, emotional expression, and vocal style qualities. We initialize our model from SeamlessM4T v2 in order to achieve high translation quality, which is the most fundamental need for a speech-to-speech translation system. We also developed Prosody UnitY2, integrating an expressivity encoder in SeamlessM4T v2 to guide unit generation with proper rhythm, speaking rate, and pauses. In addition, we release a suite of evaluation tools to capture the preservation of these aspects of expressivity.

Results


The updates to UnitY2 have resulted in improved translation quality across a variety of tasks. SeamlessM4T v2 achieves state-of-the-art translation for speech-to-speech and speech-to-text results in 100 languages. In the same model, it also beats Whisper v3 for automatic speech recognition on average, and in particular for lower-resource languages.


For speech-to-text translation, SeamlessM4T v2 improves by 10% compared to the model we released in August and by more than 17% over the strongest cascaded models when translating into English. For speech-to-speech translation, SeamlessM4T v2 improves over SeamlessM4T (v1) by more than 15% when translating into English, and by 25% when translating from English.

In other tasks, SeamlessM4T v2 is on par with No Language Left Behind (NLLB) in text-to-text translation. It is also on par on average with MMS in automatic speech recognition (ASR), with better performance on mid- and high-resource languages while MMS has better performance on low-resource languages, and improves over the recently released Whisper-Large-v3 by more than 25%. In the zero-shot task of text-to-speech translation, SeamlessM4T v2 is on par with strong cascaded models into English, and improves over these baselines by 16 percent in English.

We compared SeamlessExpressive against a cascaded speech-to-text and text-to-speech pipeline, where speech-to-text is from SeamlessM4T v2, and text-to-speech is from a strong open-sourced cross-lingual text-to-speech system that supports vocal style and emotion transfer. Results show that SeamlessExpressive is more stable with respect to noise in the source speech, such that the output speech maintains high content translation quality, and better preserves styles and speech rate. SeamlessStreaming achieves state-of-the-art low-latency quality for speech-to-speech translation.

How we built AI translation systems responsibly: Toxicity mitigation

{removed to fit max characters}

Audio watermarking

{removed to fit max characters}

Providing access to our technology

The breakthroughs we’ve achieved with Seamless show that the dream of a universal, real-time translator isn’t science fiction—it’s becoming a technical reality. We invite everyone to try our expressive translation demo. We’re also making our code, model and data available to the research community.

Try the expressive translation demo

Try the Hugging Face demo

Download the code, model, and data

Read the paper

Visit the Seamless website
 

bnew · Veteran

Nvidia CEO Jensen Huang says artificial general intelligence will be achieved in five years​

Aaron Mok

Nov 29, 2023, 4:56 PM EST


Jensen Huang, CEO of NVIDIA, holding a chip.

Nvidia CEO Jensen Huang said AGI will be achieved in five years during the 2023 NYT DealBook Summit.

Sam Yeh / Contributor


  • Nvidia CEO Jensen Huang said AGI will be reached in five years during the 2023 NYT DealBook Summit.
  • Huang defined AGI as tech that exhibits basic intelligence "fairly competitive" to a normal human.
  • Still, he admitted that AI technology is not quite there yet despite its rapid progress.


Jensen Huang, the CEO of Nvidia — one of the companies that is fueling the AI revolution — predicts that we may be able to see artificial general intelligence, or AGI, within the next five years.

During the 2023 New York Times DealBook Summit, the outlet's Andrew Ross Sorkin asked Huang if he expected to see AGI in the next 10 years.

"By depending on how you define it, I think the answer is yes," Huang replied.

At the summit, Huang defined AGI as a piece of software or a computer that can complete tests which reflect basic intelligence that's "fairly competitive" to that of a normal human.

"I would say that within the next five years, you're gonna see, obviously, AIs that can achieve those tests," Huang said.

While the CEO didn't specify what exactly he thinks AGI would look like, Ross Sorkin asked if AGI would refer to AI that can design the chips Nvidia is currently making, to which Huang agreed.

"Will you need to have the same staff that designs them?" Sorkin asked as a follow-up, referring to the development of Nvidia's chips.

"In fact, none of our chips are possible today without AI," Huang said.

He specified that the H100 chips he said Nvidia is shipping today were designed with help from a number of AIs.

"Software can't be written without AI, chips can't be designed without AI, nothing's possible," he concluded on the point of AI's potential.

Even though Huang said that AI is developing faster than he expected, he said the technology hasn't shown signs it can exhibit or surpass complex human intelligence just yet.

"There's no question that the rate of progress is high," he said. "But there's a whole bunch of things that we can't do yet."

"This multi-step reasoning that humans are very good at, AI can't do," he said.

The CEO's thoughts on AGI come as some business leaders sound the alarm about what they personally consider to be AGI.

Ilya Sutskever, cofounder of OpenAI, the company behind ChatGPT, said that AI in its most advanced form will create new problems such as a surge in fake news and cyberattacks, automated AI weapons, and even "infinitely stable dictatorships."

Ian Hogarth, who has invested in more than 50 AI companies, said that a future "God-like AI" would lead to the "obsolescence or destruction of the human race" if the rapid development of the technology isn't regulated.

Huang isn't the only tech leader who believes that AGI will be achieved in the near future.

In February, ex-Meta executive John Carmack said that AGI will be achieved by the 2030s and be worth trillions of dollars.

A few months later, Demis Hassabis, CEO and cofounder of DeepMind, Google's AI division, predicted that AI that is as powerful as the human brain would arrive within the next few years.

Nvidia didn't immediately respond to Business Insider's request for comment.
 

bnew · Veteran

Deepmind's new prompting method takes a step back for more accuracy​

Nov 26, 2023 Jonathan Kemper



Midjourney prompted by THE DECODER

A recent paper from Alphabet's AI company Google Deepmind shows that a simple tweak to prompts can significantly improve the accuracy of large language models. The technique taps into the human ability to abstract.

Step-back prompting asks the LLM a general question before the actual task. This allows the system to retrieve relevant background information and better categorize the actual question. The method is easy to implement with just one additional introductory question.

Question:

Which school did Estella Leopold attend between August 1954 and November 1954?

Step-back question:

What was Estella Leopold's educational history?

Step-Back Answer:

B.S. in Botany, University of Wisconsin, Madison, 1948

M.S. in Botany, University of California, Berkeley, 1950

Ph.D. in Botany, Yale University, 1955


Final answer:

She was enrolled in the Ph.D. program in Botany at Yale from 1951 to 1955, so Estella Leopold was most likely at Yale University between August 1954 and November 1954.
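
A minimal sketch of how step-back prompting can be wired up as two chat-completion calls, assuming the OpenAI Python client and a GPT-4-class model; the paper itself evaluated PaLM-2L, so the model choice and the exact wording of the step-back instruction here are assumptions for illustration.

Code:
# Step-back prompting as two calls: abstract question first, then the original.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set

def ask(prompt: str, model: str = "gpt-4") -> str:
    resp = client.chat.completions.create(
        model=model, messages=[{"role": "user", "content": prompt}])
    return resp.choices[0].message.content

def step_back_answer(question: str) -> str:
    # Step 1: derive a more general question and answer it (the "step back").
    step_back_q = ask(f"Rewrite this as a more general background question: {question}")
    background = ask(step_back_q)
    # Step 2: answer the original question, grounded in the retrieved background.
    return ask(f"Background:\n{background}\n\n"
               f"Using the background above, answer: {question}")

# print(step_back_answer("Which school did Estella Leopold attend "
#                        "between August 1954 and November 1954?"))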

The Deepmind study tested step-back prompting on the PaLM-2L language model and compared it to the base model and GPT-4. The researchers were able to increase the accuracy of the language models by up to 36 percent compared to chain-of-thought (CoT) prompting.

Improvements across all tested domains​

Step-back prompting was tested in the areas of science, general knowledge, and reasoning. The researchers observed the greatest improvements in more complex tasks requiring multiple steps of reasoning.

In physics and chemistry tasks, accuracy increased by 7 to 11 percent compared to the unmodified model. The adapted PaLM-2L even outperformed GPT-4 by a few percentage points. The abstract question of the experiment was: "What physical or chemical principles and concepts are needed to solve this problem?"



Image: Zheng et al.

Most importantly, DeepMind's prompting method also performed significantly better than existing methods such as chain-of-thought and "take a deep breath" (TDB), which only marginally improved or even worsened accuracy.

PaLM-2L can achieve better performance with step-back prompting than GPT-4​

The improvement was even more pronounced for knowledge questions with a temporal component from the TimeQA dataset. Here, the gain from a combination of step-back prompting and retrieval augmented generation (RAG) was a whopping 27 percentage points over the base model, making it about 23 percent more accurate than GPT-4. Of course, step-back prompting can be used with GPT-4 as well; the comparison is just to show the performance gain.



Image: Zheng et al.

Even on particularly difficult knowledge questions, which were less likely to be answered correctly with RAG, the researchers found a significant gain in accuracy with step-back prompting. "This is where STEP-BACK PROMPTING really shines by retrieving facts regarding high-level concepts to ground the final reasoning," the paper states.

Despite the promising results, the error analysis showed that multilevel reasoning is still one of the most difficult skills for an LLM. The technique is also not always effective or helpful, for example, when the answer is common knowledge ("Who was president of the USA in 2000?") or when the question is already at a high level of abstraction ("What is the speed of light?").

Sources:

Arxiv




Computer Science > Machine Learning​

[Submitted on 9 Oct 2023]

Take a Step Back: Evoking Reasoning via Abstraction in Large Language Models​

Huaixiu Steven Zheng, Swaroop Mishra, Xinyun Chen, Heng-Tze Cheng, Ed H. Chi, Quoc V Le, Denny Zhou

We present Step-Back Prompting, a simple prompting technique that enables LLMs to do abstractions to derive high-level concepts and first principles from instances containing specific details. Using the concepts and principles to guide the reasoning steps, LLMs significantly improve their abilities in following a correct reasoning path towards the solution. We conduct experiments of Step-Back Prompting with PaLM-2L models and observe substantial performance gains on a wide range of challenging reasoning-intensive tasks including STEM, Knowledge QA, and Multi-Hop Reasoning. For instance, Step-Back Prompting improves PaLM-2L performance on MMLU Physics and Chemistry by 7% and 11%, TimeQA by 27%, and MuSiQue by 7%.

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Cite as: arXiv:2310.06117 [cs.LG] (or arXiv:2310.06117v1 [cs.LG] for this version)
https://doi.org/10.48550/arXiv.2310.06117


Submission history​

From: Swaroop Mishra [view email]

[v1] Mon, 9 Oct 2023 19:48:55 UTC (675 KB)

 

bnew · Veteran

llamafile​





llamafile lets you distribute and run LLMs with a single file (blog post)

Our goal is to make the "build once anywhere, run anywhere" dream come true for AI developers. We're doing that by combining llama.cpp with Cosmopolitan Libc into one framework that lets you build apps for LLMs as a single-file artifact that runs locally on most PCs and servers.

First, your llamafiles can run on multiple CPU microarchitectures. We added runtime dispatching to llama.cpp that lets new Intel systems use modern CPU features without trading away support for older computers.

Secondly, your llamafiles can run on multiple CPU architectures. We do that by concatenating AMD64 and ARM64 builds with a shell script that launches the appropriate one. Our file format is compatible with WIN32 and most UNIX shells. It's also able to be easily converted (by either you or your users) to the platform-native format, whenever required.

Thirdly, your llamafiles can run on six OSes (macOS, Windows, Linux, FreeBSD, OpenBSD, and NetBSD). You'll only need to build your code once, using a Linux-style toolchain. The GCC-based compiler we provide is itself an Actually Portable Executable, so you can build your software for all six OSes from the comfort of whichever one you prefer most for development.

Lastly, the weights for your LLM can be embedded within your llamafile. We added support for PKZIP to the GGML library. This lets uncompressed weights be mapped directly into memory, similar to a self-extracting archive. It enables quantized weights distributed online to be prefixed with a compatible version of the llama.cpp software, thereby ensuring its originally observed behaviors can be reproduced indefinitely.

Binary Instructions​

We provide example binaries that embed several different models. You can download these from Hugging Face via the links below. "Command-line binaries" run from the command line, just as if you were invoking llama.cpp's "main" function manually. "Server binaries" launch a local web server (at 127.0.0.1:8080) that provides a web-based chatbot.




You can also download just the llamafile software (without any weights included) from our releases page, or directly in your terminal or command prompt. This is currently mandatory on Windows.

Code:
curl -L https://github.com/Mozilla-Ocho/llamafile/releases/download/0.1/llamafile-server-0.1 >llamafile

chmod +x llamafile

./llamafile --help

./llamafile -m ~/weights/foo.gguf


Gotchas​

On macOS with Apple Silicon you need to have Xcode installed for llamafile to be able to bootstrap itself.

If you use zsh and have trouble running llamafile, try saying sh -c ./llamafile. This is due to a bug that was fixed in zsh 5.9+. The same is the case for Python subprocess, old versions of Fish, etc.

On Linux binfmt_misc has been known to cause problems. You can fix that by installing the actually portable executable interpreter.

Code:
sudo wget -O /usr/bin/ape https://cosmo.zip/pub/cosmos/bin/ape-$(uname -m).elf

sudo chmod +x /usr/bin/ape

sudo sh -c "echo ':APE:M::MZqFpD::/usr/bin/ape:' >/proc/sys/fs/binfmt_misc/register"

sudo sh -c "echo ':APE-jart:M::jartsr::/usr/bin/ape:' >/proc/sys/fs/binfmt_misc/register"

On Windows, you may need to rename llamafile to llamafile.exe in order for it to run. Windows also has a maximum file size limit of 4GB for executables. The LLaVA server executable above is just 30MB shy of that limit, so it'll work on Windows, but with larger models like WizardCoder 13B, you need to store the weights in a separate file. Here's an example of how to do that. Let's say you want to try Mistral. In that case you can open PowerShell and run these commands:

Code:
curl -o llamafile.exe https://github.com/Mozilla-Ocho/llamafile/releases/download/0.1/llamafile-server-0.1

curl -o mistral.gguf https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.1-GGUF/resolve/main/mistral-7b-instruct-v0.1.Q4_K_M.gguf

.\llamafile.exe -m mistral.gguf

On any platform, if your llamafile process is immediately killed, check if you have CrowdStrike and then ask to be whitelisted.


GPU Support​

On Apple Silicon, everything should just work if Xcode is installed.

On Linux, Nvidia cuBLAS GPU support will be compiled on the fly if (1) you have the cc compiler installed, (2) you pass the --n-gpu-layers 35 flag (or whatever value is appropriate) to enable GPU, and (3) the CUDA developer toolkit is installed on your machine and the nvcc compiler is on your path.

On Windows, that usually means you need to open up the MSVC x64 native command prompt and run llamafile there, for the first invocation, so it can build a DLL with native GPU support. After that, $CUDA_PATH/bin still usually needs to be on the $PATH so the GGML DLL can find its other CUDA dependencies.

In the event that GPU support couldn't be compiled and dynamically linked on the fly for any reason, llamafile will fall back to CPU inference.



 