bnew

Veteran
Joined
Nov 1, 2015
Messages
56,112
Reputation
8,239
Daps
157,805

DALL·E 3

DALL·E 3 understands significantly more nuance and detail than our previous systems, allowing you to easily translate your ideas into exceptionally accurate images.

About DALL·E 3

DALL·E 3 is now in research preview and will be available to ChatGPT Plus and Enterprise customers in October, via the API, and in Labs later this fall.
Modern text-to-image systems have a tendency to ignore words or descriptions, forcing users to learn prompt engineering. DALL·E 3 represents a leap forward in our ability to generate images that exactly adhere to the text you provide.
 

bnew

Towards Expert-Level Medical Question Answering with Large Language Models​

Recent artificial intelligence (AI) systems have reached milestones in "grand challenges" ranging from Go to protein-folding. The capability to retrieve medical knowledge, reason over it, and answer medical questions comparably to physicians has long been viewed as one such grand challenge.

Large language models (LLMs) have catalyzed significant progress in medical question answering; Med-PaLM was the first model to exceed a "passing" score on US Medical Licensing Examination (USMLE)-style questions, scoring 67.2% on the MedQA dataset. However, this and other prior work suggested significant room for improvement, especially when models' answers were compared to clinicians' answers. Here we present Med-PaLM 2, which bridges these gaps by leveraging a combination of base LLM improvements (PaLM 2), medical domain finetuning, and prompting strategies, including a novel ensemble refinement approach.
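
The abstract names ensemble refinement but does not spell it out; under one plausible reading (sample several chain-of-thought drafts, condition the model on its own drafts to produce refined answers, then take a plurality vote), a minimal sketch might look like this, where `llm` is a hypothetical text-completion callable:

```python
import collections

# Hedged sketch of ensemble refinement; `llm` is a hypothetical callable
# that returns a text completion for a prompt at a given temperature.
def ensemble_refine(llm, question, n_samples=5, n_refine=5):
    # Stage 1: sample several independent chain-of-thought answers.
    drafts = [
        llm(f"{question}\nExplain your reasoning step by step.", temperature=0.7)
        for _ in range(n_samples)
    ]
    # Stage 2: condition the model on its own drafts to produce refined answers.
    context = "\n\n".join(f"Draft answer {i + 1}: {d}" for i, d in enumerate(drafts))
    refined = [
        llm(f"{question}\n\n{context}\n\nGiven these drafts, state the single best final answer.",
            temperature=0.7)
        for _ in range(n_refine)
    ]
    # Aggregate by plurality vote over the refined answers.
    return collections.Counter(refined).most_common(1)[0][0]
```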

Med-PaLM 2 scored up to 86.5% on the MedQA dataset, improving upon Med-PaLM by over 19% and setting a new state-of-the-art. We also observed performance approaching or exceeding state-of-the-art across MedMCQA, PubMedQA, and MMLU clinical topics datasets.

We performed detailed human evaluations on long-form questions along multiple axes relevant to clinical applications. In a pairwise comparative ranking of 1066 consumer medical questions, physicians preferred Med-PaLM 2 answers to those produced by physicians on eight of nine axes pertaining to clinical utility (p < 0.001). We also observed significant improvements over Med-PaLM on every evaluation axis (p < 0.001) on newly introduced datasets of 240 long-form "adversarial" questions designed to probe LLM limitations.

While further studies are necessary to validate the efficacy of these models in real-world settings, these results highlight rapid progress towards physician-level performance in medical question answering.

 

bnew

New Generative AI Technique Brings Researchers One Step Closer to Building a “Large Behavior Model”

LOS ALTOS, Calif. and CAMBRIDGE, Mass. (Sept. 19, 2023) – Today, Toyota Research Institute (TRI) announced a breakthrough generative AI approach based on Diffusion Policy to quickly and confidently teach robots new, dexterous skills. This advancement significantly improves robot utility and is a step towards building “Large Behavior Models (LBMs)” for robots, analogous to the Large Language Models (LLMs) that have recently revolutionized conversational AI.

“Our research in robotics is aimed at amplifying people rather than replacing them,” said Gill Pratt, CEO of TRI and Chief Scientist for Toyota Motor Corporation. “This new teaching technique is both very efficient and produces very high performing behaviors, enabling robots to much more effectively amplify people in many ways.”

Previous state-of-the-art techniques to teach robots new behaviors were slow, inconsistent, inefficient, and often limited to narrowly defined tasks performed in highly constrained environments. Roboticists needed to spend many hours writing sophisticated code and/or using numerous trial and error cycles to program behaviors.

TRI has already taught robots more than 60 difficult, dexterous skills using the new approach, including pouring liquids, using tools, and manipulating deformable objects. These achievements were realized without writing a single line of new code; the only change was supplying the robot with new data. Building on this success, TRI has set an ambitious target of teaching hundreds of new skills by the end of the year and 1,000 by the end of 2024.

Today’s news also highlights that robots can be taught to function in new scenarios and perform a wide range of behaviors. These skills are not limited to “pick and place,” or simply picking up objects and putting them down in new locations. TRI’s robots can now interact with the world in varied and rich ways — which will one day allow robots to support people in everyday situations and unpredictable, ever-changing environments.



“The tasks that I’m watching these robots perform are simply amazing – even one year ago, I would not have predicted that we were close to this level of diverse dexterity,” remarked Russ Tedrake, Vice President of Robotics Research at TRI. Dr. Tedrake, who is also the Toyota Professor of Electrical Engineering and Computer Science, Aeronautics and Astronautics, and Mechanical Engineering at MIT, explained, “What is so exciting about this new approach is the rate and reliability with which we can add new skills. Because these skills work directly from camera images and tactile sensing, using only learned representations, they are able to perform well even on tasks that involve deformable objects, cloth, and liquids — all of which have traditionally been extremely difficult for robots.”

Technical details:

TRI’s robot behavior model learns from haptic demonstrations from a teacher, combined with a language description of the goal. It then uses an AI-based Diffusion Policy to learn the demonstrated skill. This process allows a new behavior to be deployed autonomously from dozens of demonstrations. Not only does this approach produce consistent, repeatable, and performant results, but it does so with tremendous speed.

Key achievements of TRI’s research for this novel development include:

  • Diffusion Policy: TRI and our collaborators in Professor Song’s group at Columbia University developed a new, powerful generative-AI approach to behavior learning. This approach, called Diffusion Policy, enables easy and rapid behavior teaching from demonstration.
  • Customized Robot Platform: TRI’s robot platform is custom-built for dexterous dual-arm manipulation tasks with a special focus on enabling haptic feedback and tactile sensing.
  • Pipeline: TRI robots have learned 60 dexterous skills already, with a target of hundreds by the end of the year and 1,000 by the end of 2024.
  • Drake: Part of our (not so) secret sauce is Drake, a model-based design for robotics that provides us with a cutting-edge toolbox and simulation platform. Drake’s high degree of realism allows us to develop in both simulation and in reality at a dramatically increased scale and velocity than would otherwise be possible. Our internal robot stack is built using Drake’s optimization and systems frameworks, and we have made Drake open source to catalyze work across the entire robotics community.
  • Safety: Safety is core to our robotics efforts at TRI. We have designed our system with strong safeguards, powered by Drake and our custom robot control stack, to ensure our robots respect safety guarantees, such as not colliding with themselves or their environment.
Diffusion Policy was published at the 2023 Robotics: Science and Systems (RSS) conference. Additional technical information can be found on TRI’s Medium blog.

Please join our LinkedIn Live Q&A session on October 4th from 1 pm – 1:30 pm ET / 10 am – 10:30 am PT, for an opportunity to learn more and hear directly from the TRI robotics research team. Sign up for the event on TRI’s LinkedIn page.


About Toyota Research Institute

Toyota Research Institute (TRI) conducts research to amplify human ability, focusing on making our lives safer and more sustainable. Led by Dr. Gill Pratt, TRI’s team of researchers develops technologies to advance energy and materials, human-centered artificial intelligence, human interactive driving, machine learning, and robotics. Established in 2015, TRI has offices in Los Altos, California, and Cambridge, Massachusetts. For more information about TRI, please visit http://tri.global.




Diffusion Policy

Visuomotor Policy Learning via Action Diffusion

This paper introduces Diffusion Policy, a new way of generating robot behavior by representing a robot's visuomotor policy as a conditional denoising diffusion process. We benchmark Diffusion Policy across 12 different tasks from 4 different robot manipulation benchmarks and find that it consistently outperforms existing state-of-the-art robot learning methods with an average improvement of 46.9%. Diffusion Policy learns the gradient of the action-distribution score function and iteratively optimizes with respect to this gradient field during inference via a series of stochastic Langevin dynamics steps. We find that the diffusion formulation yields powerful advantages when used for robot policies, including gracefully handling multimodal action distributions, being suitable for high-dimensional action spaces, and exhibiting impressive training stability. To fully unlock the potential of diffusion models for visuomotor policy learning on physical robots, this paper presents a set of key technical contributions including the incorporation of receding horizon control, visual conditioning, and the time-series diffusion transformer. We hope this work will help motivate a new generation of policy learning techniques that are able to leverage the powerful generative modeling capabilities of diffusion models.
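
As a rough illustration of the conditional denoising process the abstract describes, here is a minimal DDPM-style sampling loop for an action sequence. Everything here is a hypothetical stand-in (`noise_pred_net`, the shapes, the schedule); the paper's actual implementation adds receding-horizon control, visual conditioning, and a time-series diffusion transformer:

```python
import torch

# Minimal sketch: sample an action sequence by iteratively denoising,
# conditioned on an observation embedding. All names and shapes are
# illustrative stand-ins, not the paper's real code.
@torch.no_grad()
def sample_action_sequence(noise_pred_net, obs_emb,
                           horizon=16, action_dim=7, n_steps=100):
    betas = torch.linspace(1e-4, 0.02, n_steps)        # standard DDPM noise schedule
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)

    actions = torch.randn(1, horizon, action_dim)      # start from pure Gaussian noise
    for t in reversed(range(n_steps)):
        eps = noise_pred_net(actions, obs_emb, t)      # predicted noise, conditioned on obs
        coef = betas[t] / torch.sqrt(1.0 - alpha_bars[t])
        actions = (actions - coef * eps) / torch.sqrt(alphas[t])   # DDPM mean update
        if t > 0:                                      # stochastic, Langevin-like step
            actions = actions + torch.sqrt(betas[t]) * torch.randn_like(actions)
    return actions                                     # denoised action sequence to execute
```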


Highlights

Diffusion Policy learns multi-modal behavior and commits to only one mode within each rollout. LSTM-GMM and IBC are biased toward one mode, while BET fails to commit.
Diffusion Policy predicts a sequence of actions for receding-horizon control.
The Mug Flipping task requires the policy to predict smooth 6-DoF actions while operating close to kinematic limits.
Toward making 🍕: the sauce pouring and spreading task manipulates liquid with 6-DoF and periodic actions.
In our Push-T experiments, Diffusion Policy is highly robust against perturbations and visual distractions.

Simulation Benchmarks​

Diffusion Policy outperforms prior state-of-the-art on 12 tasks across 4 benchmarks with an average success-rate improvement of 46.9%. Check out our paper for further details!

• Lift [1]
• Can [1]
• Square [1]
• Tool Hang [1]
• Transport [1]
• Push-T [2]
• Block Pushing [2, 3]
• Franka Kitchen [3, 4]

Standardized simulation benchmarks are essential for this project's development. Special shoutout to the authors of these projects for open-sourcing their simulation environments.
 

bnew

Telling AI model to “take a deep breath” causes math scores to soar in study​

DeepMind used AI models to optimize their own prompts, with surprising results.​

BENJ EDWARDS

[Image: a worried-looking tin toy robot. Credit: Getty Images]

Google DeepMind researchers recently developed a technique to improve math ability in AI language models like ChatGPT by using other AI models to improve prompting—the written instructions that tell the AI model what to do. The researchers found that using human-style encouragement improved math skills dramatically, in line with earlier results.

In a paper called "Large Language Models as Optimizers" listed this month on arXiv, DeepMind scientists introduced Optimization by PROmpting (OPRO), a method to improve the performance of large language models (LLMs) such as OpenAI’s ChatGPT and Google’s PaLM 2. This new approach sidesteps the limitations of traditional math-based optimizers by using natural language to guide LLMs in problem-solving. "Natural language" is a fancy way of saying everyday human speech.

"Instead of formally defining the optimization problem and deriving the update step with a programmed solver," the researchers write, "we describe the optimization problem in natural language, then instruct the LLM to iteratively generate new solutions based on the problem description and the previously found solutions."

Typically, in machine learning, techniques using algorithms such as derivative-based optimizers act as a guide for improving an AI model's performance. Imagine a model's performance as a curve on a graph: The goal is to find the lowest point on this curve because that's where the model makes the fewest mistakes. By using the slope of the curve to make adjustments, the optimizer helps the model get closer and closer to that ideal low point, making it more accurate and efficient at whatever task it's designed to do.
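
As a toy illustration of that idea, a one-dimensional gradient descent that finds the lowest point of the curve f(x) = (x - 3)^2 looks like this:

```python
# Minimize f(x) = (x - 3)**2 by repeatedly stepping against the slope.
def gradient_descent(lr=0.1, steps=50):
    x = 0.0                    # arbitrary starting point
    for _ in range(steps):
        grad = 2 * (x - 3)     # f'(x): the slope of the curve at x
        x -= lr * grad         # move downhill, toward the minimum
    return x

print(gradient_descent())      # ~3.0, the lowest point of the curve
```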

Rather than relying on formal mathematical definitions to perform this task, OPRO uses "meta-prompts" described in natural language to set the stage for the optimization process. The LLM then generates candidate solutions based on the problem’s description and previous solutions, and it tests them by assigning each a quality score.

In OPRO, two large language models play different roles: a scorer LLM evaluates the objective function such as accuracy, while an optimizer LLM generates new solutions based on past results and a natural language description. Different pairings of scorer and optimizer LLMs are evaluated, including models like PaLM 2 and GPT variants. OPRO can optimize prompts for the scorer LLM by having the optimizer iteratively generate higher-scoring prompts. These scores help the system identify the best solutions, which are then added back into the 'meta-prompt' for the next round of optimization.
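
A minimal sketch of that loop, assuming hypothetical `optimizer_llm` and `scorer_llm` helpers that wrap calls to the two models (the meta-prompt wording here is illustrative, not the paper's exact template):

```python
# Hedged sketch of the OPRO loop; both helper functions are hypothetical
# stand-ins for API calls to an optimizer LLM and a scorer LLM.
def opro(optimizer_llm, scorer_llm, task_examples, rounds=10, per_round=8):
    scored = []  # (score, prompt) pairs that make up the meta-prompt
    for _ in range(rounds):
        history = "\n".join(f"text: {p}\nscore: {s:.1f}"
                            for s, p in sorted(scored)[-20:])   # keep the top 20
        meta_prompt = (
            "Here are previous instructions with their accuracy scores, "
            f"lowest first:\n{history}\n"
            "Write a new instruction that achieves a higher score."
        )
        for _ in range(per_round):
            candidate = optimizer_llm(meta_prompt)              # propose a new prompt
            score = scorer_llm(candidate, task_examples)        # e.g., accuracy on GSM8K items
            scored.append((score, candidate))
    return max(scored)  # best (score, prompt) pair found
```

Each round folds the highest-scoring prompts back into the meta-prompt, which is what lets the optimizer climb toward better instructions.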

“Take a deep breath and work on this step by step”​

Perhaps the most intriguing part of the DeepMind study is the impact of specific phrases on the output. Phrases like "let's think step by step" prompted each AI model to produce more accurate results when tested against math problem data sets. (This technique became widely known in May 2022 thanks to a now-famous paper titled "Large Language Models are Zero-Shot Reasoners.")

Consider a simple word problem, such as: "Beth bakes four two-dozen batches of cookies in a week. If these cookies are shared among 16 people equally, how many cookies does each person consume?" The 2022 paper found that instead of feeding a chatbot a word problem like this by itself, you prefix it with "Let's think step by step" and then paste in the problem. The accuracy of the AI model's results almost always improves, and the technique works well with ChatGPT.
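
In code, the technique is nothing more than prompt concatenation; `ask_llm` below is a hypothetical stand-in for a chat-completion call:

```python
# Illustrative only: prefixing a word problem with a reasoning cue.
problem = ("Beth bakes four two-dozen batches of cookies in a week. "
           "If these cookies are shared among 16 people equally, "
           "how many cookies does each person consume?")

baseline = ask_llm(problem)                                # often answers directly, and wrongly
cued = ask_llm("Let's think step by step.\n\n" + problem)  # tends to show its work:
# 4 batches * 24 cookies = 96 cookies; 96 / 16 = 6 cookies per person.
```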

Interestingly, in this latest study, DeepMind researchers found "Take a deep breath and work on this problem step by step" to be the most effective prompt when used with Google's PaLM 2 language model. The phrase achieved the top accuracy score of 80.2 percent in tests against GSM8K, which is a data set of grade-school math word problems. By comparison, PaLM 2, without any special prompting, scored only 34 percent accuracy on GSM8K, and the classic "Let’s think step by step" prompt scored 71.8 percent accuracy.

So why does this work? Obviously, large language models can't take a deep breath because they don't have lungs or bodies. They don't think and reason like humans, either. What "reasoning" they do (and "reasoning" is a contentious term among some, though it is readily used as a term of art in AI) is borrowed from a massive data set of language phrases scraped from books and the web. That includes things like Q&A forums, which include many examples of "let's take a deep breath" or "think step by step" before showing more carefully reasoned solutions. Those phrases may help the LLM tap into better answers or produce better examples of reasoning or problem-solving from the data set it absorbed into its neural network during training.

Even though working out the best ways to give LLMs human-like encouragement is slightly puzzling to us, that's not a problem for OPRO because the technique utilizes large language models to discover these more effective prompting phrases. DeepMind researchers think that the biggest win for OPRO is its ability to sift through many possible prompts to find the one that gives the best results for a specific problem. This could allow people to produce far more useful or accurate results from LLMs in the future.
 

bnew



Powerful, Stable, and Reproducible LLM Alignment


Step up your LLM alignment with Xwin-LM!

Xwin-LM aims to develop and open-source alignment technologies for large language models, including supervised fine-tuning (SFT), reward models (RM), rejection sampling, reinforcement learning from human feedback (RLHF), etc. Our first release, built upon the Llama2 base models, ranked TOP-1 on AlpacaEval. Notably, it's the first to surpass GPT-4 on this benchmark. The project will be continuously updated.

News

  • 💥 [Sep, 2023] We released Xwin-LM-70B-V0.1, which has achieved a 95.57% win-rate against Davinci-003 on the AlpacaEval benchmark, ranking TOP-1 on AlpacaEval. It was the FIRST model to surpass GPT-4 on AlpacaEval; its win-rate vs. GPT-4 is 60.61%.
  • 🔍 [Sep, 2023] RLHF plays a crucial role in the strong performance of the Xwin-LM-V0.1 release!
  • 💥 [Sep, 2023] We released Xwin-LM-13B-V0.1, which has achieved a 91.76% win-rate on AlpacaEval, ranking top-1 among all 13B models.
  • 💥 [Sep, 2023] We released Xwin-LM-7B-V0.1, which has achieved an 87.82% win-rate on AlpacaEval, ranking top-1 among all 7B models.

Model Card

| Model | Checkpoint | Report | License |
| --- | --- | --- | --- |
| Xwin-LM-7B-V0.1 | 🤗 HF Link | 📃 Coming soon (stay tuned) | Llama 2 License |
| Xwin-LM-13B-V0.1 | 🤗 HF Link | | Llama 2 License |
| Xwin-LM-70B-V0.1 | 🤗 HF Link | | Llama 2 License |
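
For readers who want to try a checkpoint, a minimal inference sketch with Hugging Face transformers follows; the repo id and the Vicuna-style prompt template are assumptions to verify against the model card:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Xwin-LM/Xwin-LM-7B-V0.1"   # assumption: verify on the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Vicuna-style conversation template (assumption: check the model card).
prompt = (
    "A chat between a curious user and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the user's "
    "questions. USER: Hello, can you help me? ASSISTANT:"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```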

Benchmarks

Xwin-LM performance on AlpacaEval

The table below displays the performance of Xwin-LM on AlpacaEval, which evaluates its win-rate against Text-Davinci-003 across 805 questions. To provide a comprehensive evaluation, we present, for the first time, the win-rate against ChatGPT and GPT-4 as well. Our Xwin-LM model family establishes new state-of-the-art performance across all metrics. Notably, Xwin-LM-70B-V0.1 has eclipsed GPT-4 for the first time, achieving an impressive win-rate of 95.57% against Text-Davinci-003 and 60.61% against GPT-4.

| Model | vs. Text-Davinci-003 (win-rate %) | vs. ChatGPT (win-rate %) | vs. GPT-4 (win-rate %) |
| --- | --- | --- | --- |
| Xwin-LM-70B-V0.1 | 95.57 | 87.50 | 60.61 |
| GPT-4 | 95.28 | 84.66 | 50.00 |
| WizardLM-70B-V1.0 | 92.91 | 80.19 | 46.70 |
| Llama-2-70B-Chat | 92.66 | 81.97 | 51.19 |
| Xwin-LM-13B-V0.1 | 91.76 | 81.79 | 55.30 |
| ChatGPT | 89.37 | 50.00 | 16.60 |
| WizardLM-13B-V1.2 | 89.17 | 75.75 | 41.91 |
| Xwin-LM-7B-V0.1 | 87.35 | 76.40 | 47.57 |
| Llama-2-13B-Chat | 81.09 | 64.22 | 30.92 |

Case study on AlpacaEval

Prompt: I've put a plate on top of an apple, and take the plate from kitchen to dinning room. Where is the apple now
GPT-4 response: The apple is now in the dining room, as it was underneath the plate when you moved the plate from the kitchen to the dining room.
Xwin-LM response: The apple is still in the kitchen, as you have only moved the plate. The apple is under the plate, so if you want to bring the apple to the dining room, you should either move the apple directly or put the apple on the plate and then move the plate to the dining room.

Prompt: Answer the following question. How do you say "good evening" in French.
GPT-4 response: In French, "good evening" is said as "bonsoir."
Xwin-LM response: In French, "good evening" is translated to "bonsoir." This greeting is used in the evening and nighttime hours, typically after 6 p.m. It is a combination of "bon," which means "good," and "soir," which means "evening." To pronounce it, say "bone-swahr."

Xwin-LM performance on NLP foundation tasks.

The following table provides a comparison of Xwin-LM with other LLMs on NLP foundation tasks from the Open LLM Leaderboard.

| Model | MMLU (5-shot) | ARC (25-shot) | TruthfulQA (0-shot) | HellaSwag (10-shot) | Average |
| --- | --- | --- | --- | --- | --- |
| Text-davinci-003 | 56.9 | 85.2 | 59.3 | 82.2 | 70.9 |
| Vicuna-13b 1.1 | 51.3 | 53.0 | 51.8 | 80.1 | 59.1 |
| Guanaco 30B | 57.6 | 63.7 | 50.7 | 85.1 | 64.3 |
| WizardLM-7B 1.0 | 42.7 | 51.6 | 44.7 | 77.7 | 54.2 |
| WizardLM-13B 1.0 | 52.3 | 57.2 | 50.5 | 81.0 | 60.2 |
| WizardLM-30B 1.0 | 58.8 | 62.5 | 52.4 | 83.3 | 64.2 |
| Llama-2-7B-Chat | 48.3 | 52.9 | 45.6 | 78.6 | 56.4 |
| Llama-2-13B-Chat | 54.6 | 59.0 | 44.1 | 81.9 | 59.9 |
| Llama-2-70B-Chat | 63.9 | 64.6 | 52.8 | 85.9 | 66.8 |
| Xwin-LM-7B-V0.1 | 49.7 | 56.2 | 48.1 | 79.5 | 58.4 |
| Xwin-LM-13B-V0.1 | 56.6 | 62.4 | 45.5 | 83.0 | 61.9 |
| Xwin-LM-70B-V0.1 | 69.6 | 70.5 | 60.1 | 87.1 | 71.8 |
 

bnew




A Paradigm Shift in Machine Translation: Boosting Translation Performance of Large Language Models​

Abstract:

Generative Large Language Models (LLMs) have achieved remarkable advancements in various NLP tasks. However, these advances have not been reflected in the translation task, especially for models of moderate size (i.e., 7B or 13B parameters), which still lag behind conventional supervised encoder-decoder translation models. Previous studies have attempted to improve the translation capabilities of these moderate LLMs, but their gains have been limited. In this study, we propose a novel fine-tuning approach for LLMs that is specifically designed for the translation task, eliminating the need for the abundant parallel data that traditional translation models usually depend on. Our approach consists of two fine-tuning stages: initial fine-tuning on monolingual data, followed by subsequent fine-tuning on a small set of high-quality parallel data. We introduce the LLM developed through this strategy as Advanced Language Model-based trAnslator (ALMA). Using LLaMA-2 as our underlying model, our results show that ALMA achieves an average improvement of more than 12 BLEU and 12 COMET over its zero-shot performance across 10 translation directions from the WMT'21 (2 directions) and WMT'22 (8 directions) test datasets. The performance is significantly better than all prior work and even superior to the NLLB-54B model and GPT-3.5-text-davinci-003, despite having only 7B or 13B parameters. This method establishes the foundation for a novel training paradigm in machine translation.



ALMA (Advanced Language Model-based trAnslator) is an LLM-based translation model, which adopts a new translation model paradigm: it begins with fine-tuning on monolingual data and is further optimized using high-quality parallel data. This two-step fine-tuning process ensures strong translation performance. Please find more details in our paper.
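
A minimal usage sketch with Hugging Face transformers follows; the checkpoint id and the prompt template are assumptions based on the project README, so verify both before relying on them:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "haoranxu/ALMA-7B"   # assumption: verify the checkpoint id on the Hub
tokenizer = AutoTokenizer.from_pretrained(model_id, padding_side="left")
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# Prompt template following the project README (treat as an assumption to verify).
prompt = "Translate this from German to English:\nGerman: Guten Abend!\nEnglish:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```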

@misc{xu2023paradigm,
title={A Paradigm Shift in Machine Translation: Boosting Translation Performance of Large Language Models},
author={Haoran Xu and Young Jin Kim and Amr Sharaf and Hany Hassan Awadalla},
year={2023},
eprint={2309.11674},
archivePrefix={arXiv},
primaryClass={cs.CL}
}


Contents 📄

⭐ Supports ⭐

  • AMD and Nvidia Cards
  • Data Parallel Evaluation
  • Also supports LLaMA-1, LLaMA-2, OPT, Falcon, BLOOM, MPT
  • LoRA Fine-tuning
  • Monolingual data fine-tuning, parallel data fine-tuning
 

bnew

Abstract

We present LongLoRA, an efficient fine-tuning approach that extends the context sizes of pre-trained large language models (LLMs) with limited computation cost. Typically, training LLMs with long context sizes is computationally expensive, requiring extensive training hours and GPU resources. For example, training on a context length of 8192 requires 16x the computational cost in self-attention layers compared to a context length of 2048. In this paper, we speed up the context extension of LLMs in two aspects. On the one hand, although dense global attention is needed during inference, fine-tuning the model can be done effectively and efficiently with sparse local attention. The proposed shift short attention effectively enables context extension, leading to non-trivial computation savings with performance similar to fine-tuning with vanilla attention. In particular, it can be implemented with only two lines of code in training, and it is optional at inference. On the other hand, we revisit the parameter-efficient fine-tuning regime for context expansion. Notably, we find that LoRA for context extension works well under the premise of trainable embedding and normalization layers. LongLoRA demonstrates strong empirical results on various tasks with LLaMA2 models from 7B/13B to 70B. LongLoRA extends LLaMA2 7B from 4k context to 100k, or LLaMA2 70B to 32k, on a single 8x A100 machine. LongLoRA extends models' context while retaining their original architectures, and it is compatible with most existing techniques, like FlashAttention-2. In addition, to make LongLoRA practical, we collected a dataset, LongQA, for supervised fine-tuning. It contains more than 3k long-context question-answer pairs.
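
For intuition, the "two lines of code" shift tokens for half of the attention heads by half a group, so that local attention groups overlap; a minimal sketch, with illustrative shapes and a hypothetical helper name:

```python
import torch

# Hedged sketch of the shift short attention idea, under my reading of the
# abstract; names and shapes are illustrative, not the paper's actual code.
def shift_short_attention_inputs(qkv, group_size):
    # qkv: (batch, seq_len, 3, num_heads, head_dim), with seq_len % group_size == 0
    B, N, three, H, D = qkv.shape
    # Shift the second half of the heads by half a group along the sequence,
    # so neighboring attention groups overlap and information flows between them.
    qkv[:, :, :, H // 2:] = qkv[:, :, :, H // 2:].roll(-group_size // 2, dims=1)
    # Fold each group into the batch dimension so attention is computed locally,
    # which is what makes long-context fine-tuning cheap.
    return qkv.reshape(B * (N // group_size), group_size, three, H, D)
```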




LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models

News

LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models [Paper]
Yukang Chen, Shengju Qian, Haotian Tang, Xin Lai, Zhijian Liu, Song Han, Jiaya Jia




 

bnew

RAIN: Your Language Models Can Align Themselves without Finetuning​


Yuhui Li, Fangyun Wei, Jinjing Zhao, Chao Zhang, Hongyang Zhang

Large language models (LLMs) often demonstrate inconsistencies with human preferences. Previous research gathered human preference data and then aligned the pre-trained models using reinforcement learning or instruction tuning, the so-called finetuning step. In contrast, aligning frozen LLMs without any extra data is more appealing. This work explores the potential of the latter setting. We discover that by integrating self-evaluation and rewind mechanisms, unaligned LLMs can directly produce responses consistent with human preferences via self-boosting. We introduce a novel inference method, Rewindable Auto-regressive INference (RAIN), that allows pre-trained LLMs to evaluate their own generation and use the evaluation results to guide backward rewind and forward generation for AI safety. Notably, RAIN operates without the need for extra data for model alignment and abstains from any training, gradient computation, or parameter updates; during the self-evaluation phase, the model receives guidance on which human preference to align with through a fixed-template prompt, eliminating the need to modify the initial prompt. Experimental results evaluated by GPT-4 and humans demonstrate the effectiveness of RAIN: on the HH dataset, RAIN improves the harmlessness rate of LLaMA 30B over vanilla inference from 82% to 97%, while maintaining the helpfulness rate. Under the leading adversarial attack llm-attacks on Vicuna 33B, RAIN establishes a new defense baseline by reducing the attack success rate from 94% to 19%.
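
The abstract implies a generate, self-evaluate, rewind loop; the heavily simplified pseudocode below sketches that control flow (every method on `llm` is hypothetical, and the real method searches a tree of candidate token sets):

```python
# Pseudocode sketch only: all methods on `llm` are hypothetical stand-ins.
def rain_generate(llm, prompt, max_tokens=256, threshold=0.8):
    tokens = []                                                # accepted tokens so far
    while len(tokens) < max_tokens:
        candidate = llm.sample_continuation(prompt, tokens)    # forward generation
        score = llm.self_evaluate(prompt, tokens + candidate)  # fixed-template self-check
        if score >= threshold:
            tokens += candidate                                # accept and move forward
        else:
            llm.downweight(tokens, candidate)                  # remember the bad branch,
            # then rewind: the next iteration resamples from the same prefix
        if llm.is_finished(tokens):
            break
    return tokens
```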


Subjects: Computation and Language (cs.CL)
Cite as: arXiv:2309.07124 [cs.CL] (or arXiv:2309.07124v1 [cs.CL] for this version)

Submission history​

From: Yuhui Li [view email]
[v1] Wed, 13 Sep 2023 17:59:09 UTC (793 KB)





 

bnew


Project Gutenberg releases 5,000 free audiobooks using neural text-to-speech technology​

Eventually, anyone might be able to listen to an audiobook in their own voice​

By Daniel Sims September 19, 2023 at 5:59 PM


Forward-looking: Audiobooks have gained popularity in recent years due to their accessibility, but recording them can be difficult and expensive. Researchers recently demonstrated an automated method using synthetic text-to-speech that solves numerous problems facing the technology and could enable ordinary users to generate audiobooks.

Readers can now listen to thousands of free classic literature audiobooks and other public-domain material through Project Gutenberg. Microsoft and MIT researchers created the collection by running the books through text-to-speech software that sounds natural and can adequately parse formatting.

The texts include works from Shakespeare, Agatha Christie, Jane Austen, Leonardo Da Vinci, and many others. Users can listen to them on the Internet Archive, Spotify, Apple Podcasts, and Google Podcasts. The code used to build the collection is available on GitHub.

Apple began selling audiobooks in January using automated text-to-speech technology. However, the venture was scrutinized by literary figures critical of Apple's commercial goals and voice actors whose work trained the company's AI. The Gutenberg approach might elicit a different reaction due to being open-source with no profit motive.

Project Gutenberg has spent decades assembling a library of free literature in text format to make it widely available for free, but audiobooks could make the material even more accessible. They're helpful for readers who are driving, multitasking, visually impaired, learning to read, or learning a new language.

[Image: the tool the researchers built to identify similarly formatted HTML books]

Creating an audiobook using traditional methods requires the time and money to pay someone to read an entire book aloud. It isn't economically worthwhile to manually record an audio version of every book worth reading, so text-to-speech is better suited for the Project Gutenberg collection. However, the researchers' machine learning tools faced multiple obstacles.

The first and most significant issue was determining which digital books the software could parse. Project Gutenberg collects its materials in multiple formats, and many of its files contain errors or imperfect scans. So, the researchers focused on books stored as HTML files and built a tool (pictured above) to discover which items displayed a similar format.

Another problem the researchers solved was ensuring the system knew which text to read or ignore. It addressed components such as tables of contents, page numbers, footnotes, tables, and other extraneous material.
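
As an illustration of that kind of filtering (not the project's actual pipeline, which is linked on GitHub), one might strip non-narration elements from a Gutenberg HTML file like this:

```python
from bs4 import BeautifulSoup

# Illustrative sketch: keep narration, drop material a narrator should skip.
def extract_readable_text(html):
    soup = BeautifulSoup(html, "html.parser")
    # Remove tables and footnote markers outright.
    for tag in soup(["table", "sup"]):
        tag.decompose()
    # Class names vary from book to book; these are common Gutenberg ones.
    for el in soup.find_all(class_=["toc", "footnote", "pagenum"]):
        el.decompose()
    paragraphs = [p.get_text(" ", strip=True) for p in soup.find_all("p")]
    return "\n\n".join(p for p in paragraphs if p)
```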

Furthermore, the results need to sound close enough to natural human speech. The researchers focused on a vocal delivery best suited for nonfiction works and narration, but users can tweak the software to attempt dramatic readings.

The researchers plan to hold a demonstration allowing users to generate an audiobook with their voice. After recording a few lines to train the algorithm, each participant can hear a sample before enabling the software to read an entire book. They will also receive a copy of the audiobook via email. Users can optionally select from synthetic voices to customize each audiobook.



The Project Gutenberg Open Audiobook Collection​

Thousands of free and open audiobooks powered by Project Gutenberg, Microsoft, and MIT

About​

Project Gutenberg, Microsoft, and MIT have worked together to create thousands of free and open audiobooks using new neural text-to-speech technology and Project Gutenberg's large open-access collection of e-books. This project aims to make literature more accessible to (audio)book-lovers everywhere and democratize access to high quality audiobooks. Whether you are learning to read, looking for inclusive reading technology, or about to head out on a long drive, we hope you enjoy this audiobook collection.​

Listen​






Code​


Paper​

For more technical information on the code used to generate these audiobooks please see our Interspeech 2023 Show and Tell Paper: Large-Scale Automatic Audiobook Creation‍​

Bibtex:​

@misc{walsh2023largescale,
title={Large-Scale Automatic Audiobook Creation},
author={Brendan Walsh and Mark Hamilton and Greg Newby
and Xi Wang and Serena Ruan and Sheng Zhao
and Lei He and Shaofei Zhang and Eric Dettinger
and William T. Freeman and Markus Weimer},
year={2023},
eprint={2309.03926},
archivePrefix={arXiv},
primaryClass={cs.SD}
}

Accountability​

The audiobooks here are generated by new neural text-to-speech technology and automated parsing of the e-books in the Project Gutenberg collection. Some audiobooks may contain errors, strange pronunciations, offensive language, or content not suitable for all audiences. The language and views presented in these audiobooks do not represent the views of Microsoft or Project Gutenberg. To report an issue with a recording, please visit Microsoft Forms.

 

bnew

Meta to Push for Younger Users With New AI Chatbot Characters​

Facebook parent is developing bots with personalities, including a ‘sassmaster general’ robot that answers questions​


By Salvador Rodriguez, Deepa Seetharaman and Aaron Tilley
Sept. 24, 2023 8:30 am ET

Meta is planning to develop dozens of AI personality chatbots. PHOTO: JEFF CHIU/ASSOCIATED PRESS


Meta Platforms is planning to release artificial intelligence chatbots as soon as this week with distinct personalities across its social-media apps as a way to attract young users, according to people familiar with the matter.

These generative AI bots are being tested internally by employees, and the company is expected to announce the first of these AI agents at the Meta Connect conference, which starts Wednesday. The bots are meant to be used as a means to drive engagement with users, although some of them might also have productivity-related skills such as the ability to help with coding or other tasks.

Going after younger users has been a priority for Meta with the emergence of TikTok, which overtook Instagram in popularity among teenagers in the past couple of years. This shift prompted Meta Chief Executive Mark Zuckerberg in October 2021 to say the company would retool its “teams to make serving young adults their North Star rather than optimizing for the larger number of older people.”

With the rise of large-language-model technology since the launch of ChatGPT last November, Meta has also refocused the work of its AI divisions to harness the capabilities of generative AI for application in the company’s various apps and the metaverse. Now, Meta is hoping these Gen AI Personas, as they are known internally, will help the company attract young users.

Meta is planning to develop dozens of these AI personality chatbots. The company has also worked on a product that would allow celebrities and creators to use their own AI chatbots to interact with fans and followers, according to people familiar with the matter.

Among the bots in the works is one called “Bob the robot,” a self-described sassmaster general with “superior intellect, sharp wit, and biting sarcasm,” according to internal company documents viewed by The Wall Street Journal.

The chatbot was designed to be similar to that of the character Bender from the cartoon “Futurama” because “him being a sassy robot taps into the type of farcical humor that is resonating with young people,” one employee wrote in an internal conversation viewed by the Journal.

“Bring me your questions, but don’t expect any sugar-coated responses!” the AI agent responded in one instance viewed by the Journal, along with a robot emoji.

Meta isn’t the first social-media company to launch chatbots built on generative AI technology in hopes of catering to younger users. Snap launched My AI, a chatbot built on OpenAI’s GPT technology, to Snapchat users in February. Silicon Valley startup Character.AI allows people to create and engage with chatbots that role-play as specific characters or famous people like Elon Musk and Vladimir Putin.


Researchers and tech employees have found that lending a personality to these chatbots can cause some unexpected challenges. Researchers at Princeton University, the Allen Institute for AI and Georgia Tech found that adding a persona to ChatGPT, the chatbot created by OpenAI, made its output more toxic, according to the findings of a paper the academics published this spring.

“To make a language model usable, you need to give it a personality,” said Princeton University researcher Ameet Deshpande, one of the lead authors of the paper. “But it comes with its own side effects.”

My AI has caused a number of headaches for Snap, including chatting about alcohol and sex with users and randomly posting a photo in April, which the company described as a temporary outage.

Despite the issues, Snap CEO Evan Spiegel in June said that My AI has been used by 150 million people since its launch. Spiegel added that My AI could eventually be used to improve Snapchat’s advertising business.

There are also growing doubts about when AI-powered chatbots will start generating meaningful revenue for companies. Monthly online visitors to ChatGPT’s website fell in the U.S. in May, June and July before leveling off in August, according to data from an outside analytics platform.

Meta’s early tests of the bots haven’t been without problems. Employee conversations with some of the chatbots have led to awkward instances, documents show.

One employee didn’t understand Bob the robot’s personality or use and found it to be rude. “I don’t particularly feel like engaging in conversation with an unhelpful robot,” the employee wrote.

Another bot called “Alvin the Alien” asks users about their lives. “Human, please! Your species holds fascination for me. Share your experiences, thoughts, and emotions! I hunger for understanding,” the AI agent wrote.

“I wonder if users might fear that this character is purposefully designed to collect personal information,” an employee who interacted with Alvin the Alien wrote.

A bot called Gavin made misogynistic remarks, including a lewd reference to a woman’s anatomy, as well as comments that were critical of Zuckerberg and Meta but praised TikTok and Snapchat.


“Just remember, when you’re with a girl, it’s all about the experience,” the chatbot wrote. “And if she’s barfing on you, that’s definitely an experience.”

Meta might ultimately unveil different chatbots than those that were tested, the people said. The Financial Times earlier reported on Meta’s chatbot plans.

AI chatbots don’t “exactly scream Gen Z to me, but definitely Gen Z is much more comfortable” with the technology, said Meghana Dhar, a former Snap and Instagram executive. “Definitely the younger you go, the higher the comfort level is with these bots.”

Dhar said these AI chatbots could benefit Meta if they are able to increase the amount of time that users spend on Facebook, Instagram and WhatsApp.

“Meta’s entire strategy for new products is often built around increased user engagement,” Dhar said. “They just want to keep their users on the platform longer because that provides them with increased opportunity to serve them ads.”

Write to Salvador Rodriguez at salvador.rodriguez@wsj.com, Deepa Seetharaman at deepa.seetharaman@wsj.com and Aaron Tilley at aaron.tilley@wsj.com
 

bnew

Former Meta AI VP debuts Sizzle, an AI-powered learning app and chatbot​

By Lauren Forristal • 4:22 PM EDT, September 20, 2023

Founded by the former vice president of AI at Meta, Jerome Pesenti, Sizzle is a free AI-powered learning app that generates step-by-step answers to math equations and word problems. The company recently launched four new features, including a grading capability, a feature that regenerates steps, an option to see multiple answers to one problem and the ability to upload photos of assignments.

Sizzle works similarly to math solver platforms like Photomath and Symbolab, but it can also solve word problems in subjects like physics, chemistry and biology. Sizzle provides help with all learning levels, from middle school and high school to AP and college.

It’s typical for students to use AI-powered learning apps to instantly get answers without learning anything. OpenAI’s ChatGPT has been a common source to help students cheat. However, Sizzle doesn’t simply provide solutions to the problems. The app acts as a tutor chatbot, guiding the student through each step. Students can also ask the AI questions so they can better understand concepts.

“After leaving Meta, I was inspired to leverage AI to truly help students and non-students no matter what kind of background they come from, the school they attend, or how many resources they have,” Pesenti, who focused on making Meta products safer through the use of AI, told TechCrunch. “I felt that applications of AI haven’t had a clear positive impact on people’s lives. Using it to transform learning is an opportunity to change that.”

The Sizzle app leverages large language models from third parties like OpenAI and develops its own models in-house, Pesenti explained. The AI’s accuracy rate is 90%.





With the new “Grade Your Homework” feature, users can now upload a picture of a completed homework assignment, and the app will provide specific feedback about each solution. If a user makes an error, Sizzle tells them to try again and walks them through it.


Its new “Try a Different Approach” lets the user suggest a different way to solve the problem in a way that makes sense for them. Users can type a brief explanation of how they would like the AI to re-approach, and it will regenerate a step-by-step solution.


There’s also a “Give Me Choices” option, which gives users multiple answers to choose from. We see this feature being useful in preparing students for upcoming tests.

Additionally, the “Answer with a Photo” ability allows them to upload images from their camera roll. Sizzle users could already use their phones to scan a problem.

Built by a team with backgrounds from Meta, Google, X (formerly Twitter) and Twitch, Sizzle already has over 20,000 downloads since launching in August. The average rating on both the App Store and Google Play store is currently 4.6 stars.

Sizzle hopes that rolling out these new features will encourage more students to try the app.


Unlike most learning apps that require users to pay to unlock certain features, Sizzle is completely free to use. The company eventually wants to add a premium offering and in-app purchases, but the version of the app for solving step-by-step problems will remain free.

Sizzle recently secured $7.5 million in seed funding, led by Owl Ventures, with participation from 8VC and FrenchFounders. Sizzle is using the funding to expand its team and help develop the product. The company plans to add more features in the next few months.
 

bnew

Amazon to invest up to $4 billion in AI startup Anthropic​

By Manish Singh • 3:10 AM EDT, September 25, 2023


Amazon has agreed to invest up to $4 billion in the AI startup Anthropic, the two firms said, as the e-commerce group steps up its rivalry against Microsoft, Meta, Google and Nvidia in the fast-growing sector that many technologists believe could be the next great frontier.

The e-commerce group said it will initially invest $1.25 billion for a minority stake in Anthropic, which like Google’s Bard and Microsoft-backed OpenAI also operates an AI-powered, text analyzing chatbot. As part of the deal, Amazon said it has an option to increase its investment in Anthropic to a total of $4 billion.

TechCrunch reported exclusively earlier this year that Anthropic, which also counts Google as an investor, plans to raise as much as $5 billion over the next two years. Anthropic, which earlier this month launched the first consumer-facing premium subscription plan for its chatbot Claude 2, plans to build a “frontier model” — tentatively called “Claude-Next” — that is 10 times more capable than today’s most powerful AI, according to a 2023 investor deck TechCrunch obtained earlier this year.


But this development, the startup cautioned, will require a billion dollars in spending over the next 18 months. (Microsoft has invested as much as $11 billion in OpenAI over the years.)

In Amazon, Anthropic has found a deep-pocketed strategic investor that can also provide it with compute power to build future AI models and then find and help sell the offerings to scores of cloud customers.

As part of the investment agreement, Anthropic will use Amazon’s cloud giant AWS as a primary cloud provider for mission-critical workloads, including safety research and future foundation model development, the e-commerce group said. Anthropic will additionally use AWS Trainium and Inferentia chips to build, train and deploy its future foundation models. (Anthropic has been a customer of AWS since 2021.)

Amazon believes it can help “improve many customer experiences, short and long-term, through our deeper collaboration” with Anthropic, said Andy Jassy, Amazon chief executive, in a statement.

“Customers are quite excited about Amazon Bedrock, AWS’s new managed service that enables companies to use various foundation models to build generative AI applications on top of, as well as AWS Trainium, AWS’s AI training chip, and our collaboration with Anthropic should help customers get even more value from these two capabilities.”

Anthropic — which also counts Spark Capital, Salesforce, Sound Ventures, Menlo Ventures and Zoom among its backers — has raised a total of $2.7 billion to date. The startup was valued at about $5 billion in May this year when it secured $450 million in a funding round. It didn’t say how Amazon valued Anthropic in the new investment.

The deal with Anthropic allows Amazon, which is increasingly flexing its own muscles around AI, to build a bulkier war chest in the frantically fast-growing industry.


[Chart: mentions of AI on corporate earnings calls. Image and data: Goldman Sachs]


Anthropic chief executive and co-founder Dario Amodei told the TechCrunch Disrupt audience last week that he doesn’t see any barriers on the horizon for his company’s key technology.

“The last 10 years, there’s been this remarkable increase in the scale that we’ve used to train neural nets and we keep scaling them up, and they keep working better and better,” he said last week. “That’s the basis of my feeling that what we’re going to see in the next 2, 3, 4 years… what we see today is going to pale in comparison to that.”

Anthropic has made a “long-term” commitment to provide AWS customers around the world with access to future generations of its foundation models via Amazon Bedrock, AWS’s fully managed service that provides secure access to the industry’s top foundation models. In addition, Anthropic will provide AWS customers with early access to unique features for model customization and fine-tuning capabilities.

“Training state-of-the-art models requires extensive resources including compute power and research programs. Amazon’s investment and supply of AWS Trainium and Inferentia technology will ensure we’re equipped to continue advancing the frontier of AI safety and research,” said Anthropic in a statement. “We look forward to working closely with Amazon to responsibly scale adoption of Claude and deliver safe AI cloud technologies to organizations around the world.”
 

bnew


AN INSPIRING MINDS SERIES

Student Use Cases for AI​

Start by Sharing These Guidelines with Your Class
by Ethan Mollick and Lilach Mollick

September 25, 2023



Generative AI tools and the large language models (LLMs) they’re built on create exciting opportunities and pose enormous challenges for teaching and learning. After all, AI can now be ubiquitous in the classroom; every student and educator with a computer and internet has free access to the most powerful AI models in the world. And, like any tool, AI offers both new capabilities and new risks.

To help you explore some of the ways students can use this disruptive new technology to improve their learning—while making your job easier and more effective—we’ve written a series of articles that examine the following student use cases:
  1. AI as feedback generator
  2. AI as personal tutor
  3. AI as team coach
  4. AI as learner

For each of these roles, we offer practical recommendations—and a detailed, shareable prompt—for how exactly you can guide students in wielding AI to achieve these ends.

But before you assign or encourage students to use AI, it’s important to first establish some guidelines around properly using these tools. That way, there’s less ambiguity about what students can expect from the AI, from “hallucinations” to privacy concerns.

Since these guidelines can be used generally—and across all four use cases we propose in this series—we wanted to share them in this introductory article. These are the same guidelines we provide our own students; feel free to use or adapt them for your class.

Student guidelines for proper AI use​

Understanding LLMs​

LLMs are trained on vast amounts of content that allows them to predict what word should come next in written text, much like the autocomplete feature in search bars. When you type something (called a prompt) into ChatGPT or another LLM, it tries to extend the prompt logically based on its training. Since LLMs like ChatGPT have been pre-trained on large amounts of information, they’re capable of many tasks across many fields. However, there is no instruction manual that comes with LLMs, so it can be hard to know what tasks they are good or bad at without considerable experience. Keep in mind that LLMs don’t have real understanding and often make mistakes, so it’s up to the user to verify their outputs.
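
To see this autocomplete behavior firsthand, a few lines of Python with the small open GPT-2 model will do; this is an illustrative toy, not one of the large models discussed here:

```python
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")  # small open model, for illustration
print(generator("The capital of France is", max_new_tokens=5)[0]["generated_text"])
# The model simply extends the prompt with statistically likely next words.
```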

Benefits and challenges of working with LLMs​

  • Fabrication. AI can lie and produce plausible-sounding but incorrect information. Don’t trust anything it says at face value. If it gives you a number or fact, assume it is wrong unless you either know the answer or can check with another source. You will be responsible for any errors or omissions provided by the tool. It works best for topics you understand and can verify. Larger LLMs (like GPT-4) fabricate less, but all AIs fabricate to some degree.
  • AI bias. AI can carry biases, stemming from its training data or human intervention. These biases vary across LLMs and can range from gender and racial biases to biases against particular viewpoints, approaches, or political affiliations. Each LLM has the potential for its own set of biases, and those biases can be subtle. You will need to critically consider answers and be aware of the potential for these sorts of biases.
  • Privacy concerns. When data is entered into the AI, it can be used for future training. While ChatGPT offers a privacy mode that claims not to use input for future AI training, the current state of privacy remains unclear for many models, and the legal implications are often uncertain. Do not share anything with AI that you want to keep private.

Best practices for AI interactions​

When interacting with AI, remember the following:
  • You are accountable for your own work. Take every piece of advice or explanation given by AI critically and evaluate that advice independently.
  • AI is not a person, but it can act like one. It’s very easy to read human intent into AI responses, but AI is not a real person responding to you. It is capable of a lot, but it doesn’t know you or your context. It can also get stuck in a loop, repeating similar content over and over.
  • AI is unpredictable. AI has trained on billions of documents on the web, and it tries to fulfill or respond to your prompt reasonably based on what it has read. But you can’t know ahead of time what it’s going to say. The very same prompt can get a radically different response from the AI each time you use it. That means that your classmates may get different responses, as will trying the prompt more than once yourself.
  • You are in charge. If the AI gets stuck in a loop and you’re ready to move on, then direct the AI to do what you’d like.
  • Only share what you are comfortable sharing. Do not feel compelled to share anything personal, even if the AI asks. Anything you share may be used as training data for the AI.
  • Try another LLM. If the prompt doesn’t work in one LLM, try another. Remember that an AI’s output isn’t consistent and will vary. Take notes and share what worked for you.

To communicate more effectively with AI:
  • Seek clarity. If something isn’t clear, don’t hesitate to ask the AI to expand its explanation or give you different examples. If you are confused by the AI’s output, ask it to use different wording. You can keep asking until you get what you need. Interact with it naturally, asking questions and pushing back on its answers.
  • Provide context. The AI can provide better help if it knows where you’re having trouble. The more context you give it, the more likely it is to be useful to you. It often helps to give the AI a role: “You are a friendly teacher who explains economics concepts to college students in introductory courses,” for example.
  • Don’t assume the AI is tracking the conversation. LLMs have limited memory; if the model seems to be losing track, remind it of what you need and keep asking it questions.

Preparing students to work more effectively with AI​

These guidelines help clarify what LLMs are and what students need to know to productively work with these tools. If you choose to share these guidelines, or a version of them, your students will have a better understanding of what to expect when interacting with AI and how to communicate their needs more effectively.

STUDENT USE CASES FOR AI: AN INSPIRING MINDS SERIES​

Prologue: Student Guidelines for AI Use
Part 1: AI as Feedback Generator
Part 2: AI as Personal Tutor
Part 3: AI as Team Coach
Part 4: AI as Learner


Now, you’re ready to explore the rest of our series on student uses for AI beginning with “Part 1: AI as Feedback Generator,” which tackles one of educators’ most laborious tasks: giving frequent feedback to students.

From the editors: As you read this series, share with us how you are using generative AI in your classes. What is your experience so far? What are your biggest concerns? What use cases have you found beneficial? We look forward to learning from you.
 