
Why people are falling in love with AI chatbots​

From Tinder to Replika, generative AI is transforming how humans use dating apps and even spurring real people to romance AI chatbots.​


By Nilay Patel, editor-in-chief of The Verge, host of the Decoder podcast, and co-host of The Vergecast.

Mar 7, 2024, 10:00 AM EST



[Illustration: The Verge. A stylized image of avatars from the AI chatbot company Replika.]

Our Thursday episodes of Decoder are all about big topics in the news, and this week, we’re wrapping up our short series on one of the biggest topics of all: generative AI.

In our last couple of episodes, we’ve talked a lot about some of the biggest, most complicated legal and policy questions surrounding the modern AI industry, including copyright lawsuits and deepfake legislation. But we wanted to end on a smaller, more emotional note: How is AI making people feel? And in particular, how is it affecting how people communicate and connect?

Verge reporter Emilia David has covered AI chatbots, specifically AI romance bots, quite a bit. So we invited her on the show to talk about how generative AI is finding its way into dating. The boom in AI chatbot sophistication is also laying the groundwork for a generation of people who might form meaningful relationships with so-called AI companions.

You’ll hear Emilia describe this as two big trends happening in parallel. Dating apps like Tinder are using generative AI to help their users tailor their profiles and craft the perfect messages to send to potential matches. And Emilia has covered a new startup, Volar, which offers an AI chatbot version of you that can message another person’s AI chatbot on your behalf to see if you’re a match before you connect in real life. This is a wild idea, and it’s actually happening.

But the real sci-fi vision is an AI companion, like those from the company Replika, the origins of which we covered in an in-depth feature way back in 2016. Replika encourages people to treat chatbots as friends, therapists, and even romantic partners. You’ll hear Emilia explain that people are using these tools right now to form meaningful connections with AI, even if the actual tech under the hood is a ways away from what you might see in movies like Her from director Spike Jonze.

And finally, we touched on the very real, tangible mental health benefits of this approach, as well as some serious pitfalls that people need to watch out for.
 

Posted Mar 7, 2024, at 1:04 PM EST, by Emilia David

Google experiments with a tool to enable on-device AI.

Google’s new MediaPipe LLM Inference API, released as an experimental demo, lets developers run AI models on devices like laptops and phones that don’t have the same computing power as servers.

This new release enables Large Language Models (LLMs) to run fully on-device across platforms. This new capability is particularly transformative considering the memory and compute demands of LLMs, which are over a hundred times larger than traditional on-device models. Optimizations across the on-device stack make this possible, including new ops, quantization, caching, and weight sharing.

Google says MediaPipe supports four models: Gemma, Phi 2, Falcon, and Stable LM. It runs on the web, Android, and iOS, and Google plans to expand to more models and platforms this year.

Large Language Models On-Device with MediaPipe and TensorFlow Lite - Google for Developers

[DEVELOPERS.GOOGLEBLOG.COM]
 


Hugging Face is launching an open source robotics project led by former Tesla scientist​

Carl Franzen @carlfranzen

March 7, 2024 6:29 AM


[Image: workers in yellow jumpsuits assemble large spherical yellow smiley-face humanoid robots in a factory. Credit: VentureBeat, made with Midjourney Niji V6]

Hugging Face, the New York City-based startup that maintains the popular open source repository of machine learning and AI code of the same name, as well as the open source ChatGPT rival HuggingChat, is launching a new robotics project under former Tesla staff scientist Remi Cadene, according to a post from Cadene on X this morning.



Fittingly, Cadene said the Hugging Face robot project would be “open-source, not as in Open AI,” in keeping with Hugging Face’s stated ethos and also a playful jab at OpenAI’s recent response to a lawsuit from co-founder turned rival, Tesla CEO Elon Musk (Cadene’s boss until recently).



Now hiring robotics engineers​

He also said he was “looking for engineers” in Paris, France, and posted a link to a job listing for an “Embodied Robotics Engineer,” which gives more clues, reading in part:

At Hugging Face, we believe ML doesn’t have to be constrained to computers and servers, and that’s why we’re expanding our team with a new opportunity for a Robotics Engineer focusing on Machine Learning/AI.

In this role, you will be responsible for designing, building, and maintaining open-source and low cost robotic systems that integrate AI technologies, specifically in deep learning and embodied AI. You will collaborate closely with ML engineers, researchers, and product teams to develop innovative solutions that push the boundaries of what’s possible in robotics and AI.


The listing also calls upon hires to “Design, build, and maintain open-source and low cost robotic systems integrating deep learning and embodied AI technologies” and “Build low cost robots with off the shelf electronic components and controllers and 3D printed parts.”



Ambitious expansion​

The move signals a major departure and an ambitious expansion for Hugging Face, which, until now, has primarily focused on software, not hardware.

It comes as investment and interest in the humanoid robotics and general robotics space are heating up, with Tesla pursuing its own humanoid robot Optimus (which Cadene says he worked on as part of its Autopilot group, repurposing some of the work done to make its cars move autonomously), and a rival called Figure recently raising an eye-watering $675 million from OpenAI and others to pursue its own humanoid robots.

Research on robots has also accelerated markedly in recent months as engineers look to the generative AI boom for new techniques from large language models (LLMs) and machine learning (ML) programs for training robots more quickly, cheaply, and accurately. There is growing interest across the tech industry in “embodied” AI that moves models off screens and devices and into machines capable of autonomously navigating the world and assisting humans with non-software, physically demanding, or tedious tasks, including household chores, hard labor, manufacturing, and more.

Cadene worked at Tesla for nearly three years, from April 2021 through March 2024, according to his LinkedIn profile.

We’ve reached out to confirm the news with Hugging Face and ask for further information on the project. We will update when we hear back.
 



Computer Science > Computation and Language​

[Submitted on 7 Mar 2024]

LLMs in the Imaginarium: Tool Learning through Simulated Trial and Error​

Boshi Wang, Hao Fang, Jason Eisner, Benjamin Van Durme, Yu Su
Tools are essential for large language models (LLMs) to acquire up-to-date information and take consequential actions in external environments. Existing work on tool-augmented LLMs primarily focuses on the broad coverage of tools and the flexibility of adding new tools. However, a critical aspect that has surprisingly been understudied is simply how accurately an LLM uses tools for which it has been trained. We find that existing LLMs, including GPT-4 and open-source LLMs specifically fine-tuned for tool use, only reach a correctness rate in the range of 30% to 60%, far from reliable use in practice. We propose a biologically inspired method for tool-augmented LLMs, simulated trial and error (STE), that orchestrates three key mechanisms for successful tool use behaviors in biological systems: trial and error, imagination, and memory. Specifically, STE leverages an LLM's 'imagination' to simulate plausible scenarios for using a tool, after which the LLM interacts with the tool to learn from its execution feedback. Both short-term and long-term memory are employed to improve the depth and breadth of the exploration, respectively. Comprehensive experiments on ToolBench show that STE substantially improves tool learning for LLMs under both in-context learning and fine-tuning settings, bringing a boost of 46.7% to Mistral-Instruct-7B and enabling it to outperform GPT-4. We also show effective continual learning of tools via a simple experience replay strategy.
Comments: Code and data available at this https URL
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as: arXiv:2403.04746 [cs.CL] (or arXiv:2403.04746v1 [cs.CL] for this version)

Submission history

From: Boshi Wang [view email]
[v1] Thu, 7 Mar 2024 18:50:51 UTC (7,453 KB)
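
The abstract’s three mechanisms map naturally onto a simple loop. Below is a minimal Python sketch of the STE exploration phase under stated assumptions: chat and run_tool are hypothetical placeholders for an LLM API call and a tool executor, and the prompts are paraphrases, not the authors’ actual prompts or code.

# Hypothetical sketch of Simulated Trial and Error (STE).
# `chat` and `run_tool` are placeholders, not the authors' code.

def chat(prompt: str) -> str:
    """Placeholder for a call to any chat LLM."""
    raise NotImplementedError

def run_tool(api_call: str) -> str:
    """Placeholder: execute a tool call and return its raw output."""
    raise NotImplementedError

def explore_tool(tool_spec: str, episodes: int = 15, trials: int = 4):
    long_term_memory = []  # successful (query, api_call, result) triples

    for _ in range(episodes):
        # Imagination: have the LLM invent a plausible user query for this
        # tool, steering away from scenarios it has already explored.
        seen = "\n".join(q for q, _, _ in long_term_memory)
        query = chat(
            f"Tool spec:\n{tool_spec}\n\nAlready explored:\n{seen}\n"
            "Imagine one new, plausible user query this tool could answer."
        )

        # Trial and error with short-term memory: within an episode, the
        # LLM sees its own failed calls and the tool's feedback.
        short_term_memory = []
        for _ in range(trials):
            attempts = "\n".join(f"{c} -> {r}" for c, r in short_term_memory)
            api_call = chat(
                f"Tool spec:\n{tool_spec}\nQuery: {query}\n"
                f"Previous attempts and feedback:\n{attempts}\n"
                "Produce a corrected API call."
            )
            result = run_tool(api_call)
            short_term_memory.append((api_call, result))
            verdict = chat(
                f"Query: {query}\nResult: {result}\n"
                "Did the result answer the query? Reply yes or no."
            )
            if verdict.strip().lower().startswith("yes"):
                # Long-term memory: keep the success for later use.
                long_term_memory.append((query, api_call, result))
                break

    return long_term_memory

Per the abstract, the examples gathered this way are then used for in-context learning or fine-tuning, which is where the reported ToolBench gains come from.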





[Chart: benchmark results]

Anthropic just released their new Claude 3 models with evals showing better performance on coding tasks. With that in mind, I’ve been benchmarking the new models using Aider’s code editing benchmark suite.

Claude 3 Opus outperforms all of OpenAI’s models, making it the best available model for pair programming with AI.

Aider currently supports Claude 3 Opus via OpenRouter:

# Install aider
pip install aider-chat

# Setup OpenRouter access
export OPENAI_API_KEY=<your-openrouter-key>
export OPENAI_API_BASE=https://openrouter.ai/api/v1

# Run aider with Claude 3 Opus using the diff editing format
aider --model anthropic/claude-3-opus --edit-format diff

Aider’s code editing benchmark​

Aider is an open source command line chat tool that lets you pair program with AI on code in your local git repo.

Aider relies on a code editing benchmark to quantitatively evaluate how well an LLM can make changes to existing code. The benchmark uses aider to try to complete 133 Exercism Python coding exercises. For each exercise, Exercism provides a starting Python file with stubs for the needed functions, a natural language description of the problem to solve, and a test suite to evaluate whether the coder has correctly solved the problem.

The LLM gets two tries to solve each problem:

  1. On the first try, it gets the initial stub code and the English description of the coding task. If the tests all pass, we are done.
  2. If any tests fail, aider sends the LLM the failing test output and gives it a second try to complete the task.
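
For concreteness, here is a minimal Python sketch of that two-try loop. The helper names and file layout are hypothetical stand-ins; the real harness lives in aider’s benchmark suite.

import subprocess
from pathlib import Path

def ask_llm(prompt: str) -> str:
    """Placeholder for a chat-completion call that returns code edits."""
    raise NotImplementedError

def apply_edits(edits: str, exercise_dir: Path) -> None:
    """Placeholder: parse the model's edits and write them to the files."""
    raise NotImplementedError

def run_tests(exercise_dir: Path) -> subprocess.CompletedProcess:
    # Each Exercism exercise ships with a pytest test suite.
    return subprocess.run(
        ["pytest", str(exercise_dir)], capture_output=True, text=True
    )

def attempt_exercise(exercise_dir: Path) -> bool:
    stub = (exercise_dir / "stub.py").read_text()        # hypothetical layout
    description = (exercise_dir / "README.md").read_text()

    # First try: the stub code plus the English description of the task.
    apply_edits(ask_llm(f"{description}\n\n{stub}"), exercise_dir)
    first = run_tests(exercise_dir)
    if first.returncode == 0:
        return True  # solved on the first try

    # Second try: show the failing test output and ask for a fix.
    prompt = f"The tests failed:\n{first.stdout}\n\nFix the code."
    apply_edits(ask_llm(prompt), exercise_dir)
    return run_tests(exercise_dir).returncode == 0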

Benchmark results​

Claude 3 Opus​

  • The new claude-3-opus-20240229 model got the highest score ever on this benchmark, completing 68.4% of the tasks with two tries.
  • Its single-try performance was comparable to the latest GPT-4 Turbo model gpt-4-0125-preview, at 54.1%.
  • While Opus got the highest score, it was only a few points higher than the GPT-4 Turbo results. Given the extra costs of Opus and the slower response times, it remains to be seen which is the most practical model for daily coding use.

Claude 3 Sonnet​

  • The new claude-3-sonnet-20240229 model performed similarly to OpenAI’s GPT-3.5 Turbo models with an overall score of 54.9% and a first-try score of 43.6%.

Code editing​

It’s highly desirable to have the LLM send back code edits as some form of diffs, rather than having it send back an updated copy of the entire source code.

Weaker models like GPT-3.5 are unable to use diffs, and are stuck sending back updated copies of entire source files. Aider uses more efficient search/replace blocks with the original GPT-4 and unified diffs with the newer GPT-4 Turbo models.

Claude 3 Opus works best with the search/replace blocks, allowing it to send back code changes efficiently. Unfortunately, the Sonnet model was only able to work reliably with whole files, which limits it to editing smaller source files and uses more tokens, money and time.
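
To make the difference concrete, here is a simplified Python sketch of applying a single search/replace edit. Aider’s actual edit formats and parsing are more involved; the function below just captures why this beats resending whole files: the model only has to emit the changed region.

from pathlib import Path

def apply_search_replace(path: Path, search: str, replace: str) -> None:
    """Apply one search/replace edit to a file.

    The search text must match exactly once, so the edit is unambiguous.
    The model only transmits the changed region, not the whole file.
    """
    text = path.read_text()
    if text.count(search) != 1:
        raise ValueError("search block must match exactly once")
    path.write_text(text.replace(search, replace, 1))

# Example: add a type hint without resending the rest of the file.
# apply_search_replace(
#     Path("greeting.py"),
#     search="def greet(name):\n",
#     replace="def greet(name: str) -> None:\n",
# )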

Other observations​

There are a few other things worth noting:

  • Claude 3 Opus and Sonnet are both slower and more expensive than OpenAI’s models. You can get almost the same coding skill faster and cheaper with OpenAI’s models.
  • Claude 3 has a 2X larger context window than the latest GPT-4 Turbo, which may be an advantage when working with larger code bases.
  • The Claude models refused to perform a number of coding tasks and returned the error “Output blocked by content filtering policy”. They refused to code up the beer song program, which makes some sort of superficial sense. But they also refused to work in some larger open source code bases, for unclear reasons.
  • The Claude APIs seem somewhat unstable, returning HTTP 5xx errors of various sorts. Aider automatically recovers from these errors with exponential backoff retries (sketched below), but it’s a sign that Anthropic may be struggling under surging demand.
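
The retry behavior mentioned in the last bullet is a standard pattern. A generic sketch of exponential backoff with jitter in Python (not aider’s actual implementation) looks like this:

import random
import time

class TransientAPIError(Exception):
    """Stand-in for an HTTP 5xx error raised by an API client."""

def with_backoff(fn, max_retries: int = 5, base_delay: float = 1.0):
    """Retry fn on transient errors, doubling the wait each time.

    Random jitter is added so many clients retrying at once don't
    hammer the API in lockstep.
    """
    for attempt in range(max_retries):
        try:
            return fn()
        except TransientAPIError:
            if attempt == max_retries - 1:
                raise  # give up after the last retry
            time.sleep(base_delay * 2 ** attempt + random.uniform(0, 1))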
 



  1. Anthropic introduced the next generation of Claude: the Claude 3 model family, comprising the Opus, Sonnet, and Haiku models. Opus is the most intelligent model and outperforms GPT-4 and Gemini 1.0 Ultra on most common evaluation benchmarks. Haiku is the fastest, most compact model, built for near-instant responsiveness. The Claude 3 models have vision capabilities, offer a 200K context window (with support for inputs exceeding 1 million tokens), and show improved accuracy and fewer refusals [Details | Model Card].
  2. Stability AI partnered with Tripo AI and released TripoSR, a fast 3D object reconstruction model that can generate high-quality 3D models from a single image in under a second. The model weights and source code are available under the MIT license, allowing commercialized use. [Details | GitHub | Hugging Face].
  3. Answer.AI released a fully open source system that, for the first time, can efficiently train a 70b large language model on a regular desktop computer with two or more standard gaming GPUs. It combines QLoRA with Meta’s FSDP, which shards large models across multiple GPUs (a rough sketch follows this list) [Details].
  4. Inflection launched Inflection-2.5, an upgrade to the model powering Pi, Inflection’s empathetic and supportive companion chatbot. Inflection-2.5 approaches GPT-4’s performance while using only 40% of the compute for training. Pi is also now available on Apple Messages [Details].
  5. Twelve Labs introduced Marengo-2.6, a new state-of-the-art (SOTA) multimodal foundation model capable of performing any-to-any search tasks, including Text-To-Video, Text-To-Image, Text-To-Audio, Audio-To-Video, Image-To-Video, and more [Details].
  6. Cloudflare announced the development of Firewall for AI, a protection layer that can be deployed in front of large language models (LLMs), whether hosted on the Cloudflare Workers AI platform or on any other third-party infrastructure, to identify abuses before they reach the models [Details].
  7. Scale AI, in partnership with the Center for AI Safety, released WMDP (Weapons of Mass Destruction Proxy): an open-source evaluation benchmark of 4,157 multiple-choice questions that serve as a proxy measurement of LLM’s risky knowledge in biosecurity, cybersecurity, and chemical security [Details].
  8. Midjourney launched v6 turbo mode to generate images at 3.5x the speed (for 2x the cost). Just type /turbo [Link].
  9. Moondream.ai released moondream 2 - a small 1.8B parameters, open-source, vision language model designed to run efficiently on edge devices. It was initialized using Phi-1.5 and SigLIP, and trained primarily on synthetic data generated by Mixtral. Code and weights are released under the Apache 2.0 license, which permits commercial use [Details].
  10. Vercel released Vercel AI SDK 3.0. Developers can now associate LLM responses to streaming React Server Components [Details].
  11. Nous Research released a new model designed exclusively to create instructions from raw-text corpuses, Genstruct 7B. This enables the creation of new, partially synthetic instruction finetuning datasets from any raw-text corpus [Details].
  12. 01.AI open-sourced Yi-9B, one of the top performers among similar-sized open-source models, excelling in code, math, common-sense reasoning, and reading comprehension [Details].
  13. Accenture to acquire Udacity to build a learning platform focused on AI [Details].
  14. China is offering “computing vouchers” worth up to $280,000 to small AI startups to help them train and run large language models [Details].
  15. Snowflake and Mistral have partnered to make Mistral AI’s newest and most powerful model, Mistral Large, available in the Snowflake Data Cloud [Details].
  16. OpenAI rolled out ‘Read Aloud’ feature for ChatGPT, enabling ChatGPT to read its answers out loud. Read Aloud can speak 37 languages but will auto-detect the language of the text it’s reading [Details].
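
Picking up the pointer in item 3: the QLoRA half of Answer.AI’s recipe can be sketched with standard libraries, while the FSDP integration with quantized weights is the bespoke part they built. A rough, hypothetical configuration (the checkpoint name and hyperparameters are placeholders):

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# QLoRA: freeze a 4-bit-quantized base model and train small LoRA adapters.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",            # NormalFloat4 quantization
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-70b-hf",          # placeholder 70B checkpoint
    quantization_config=bnb_config,
)
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],  # adapters on attention projections
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()        # only the adapters train

FSDP then shards the frozen, quantized base weights across the gaming GPUs; making FSDP cooperate with 4-bit parameters is exactly the piece Answer.AI’s release contributes.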
 


This AI Can Design the Machinery of Life With Atomic Precision​

By Shelly Fan

March 8, 2024


Baker Lab AI generates complex biomolecules

Proteins are social creatures. They’re also chameleons. Depending on a cell’s needs, they rapidly transform in structure and grab onto other biomolecules in an intricate dance.

It’s not molecular dinner theater. Rather, these partnerships are the heart of biological processes. Some turn genes on or off. Others nudge aging “zombie” cells to self-destruct or keep our cognition and memory in tip-top shape by reshaping brain networks.

These connections have already inspired a wide range of therapies—and new therapies could be accelerated by AI that can model and design biomolecules. But previous AI tools solely focused on proteins and their interactions, casting their non-protein partners aside.

This week, a study in Science expanded AI’s ability to model a wide variety of other biomolecules that physically grab onto proteins, including the iron-containing small molecules that form the center of oxygen carriers.

Led by Dr. David Baker at the University of Washington, the new AI broadens the scope of biomolecular design. Dubbed RoseTTAFold All-Atom, it builds upon a previous protein-only system to incorporate a myriad of other biomolecules, such as DNA and RNA. It also adds small molecules—for example, iron—that are integral to certain protein functions.

The AI learns only from the sequence and chemical makeup of the components, without being given their 3D structures, yet it can map out complex molecular machines at the atomic level.

In the study, when paired with generative AI, RoseTTAFold All-Atom created proteins that easily grabbed onto a heart disease medication. The algorithm also generated proteins that regulate heme, an iron-rich molecule that helps blood carry oxygen, and bilin, a chemical in plants and bacteria that absorbs light for their metabolism.

These examples are just proofs of concept. The team is releasing RoseTTAFold All-Atom to the public so scientists can create multi-component biomolecular assemblies with far more complexity than protein complexes alone. In turn, those creations could lead to new therapies.

“Our goal here was to build an AI tool that could generate more sophisticated therapies and other useful molecules,” said study author Woody Ahern in a press release.


Dream On​

In 2020, Google DeepMind’s AlphaFold, followed in 2021 by Baker Lab’s RoseTTAFold, solved the protein structure prediction problem that had baffled scientists for half a century and ushered in a new era of protein research. Updated versions of these algorithms have since mapped all protein structures both known and unknown to science.

Next, generative AI—the technology behind OpenAI’s ChatGPT and Google’s Gemini—sparked a creative frenzy of designer proteins with an impressive range of activity. Some newly generated proteins regulated a hormone that kept calcium levels in check. Others led to artificial enzymes or proteins that could readily change their shape like transistors in electronic circuits.

By hallucinating a new world of protein structures, generative AI has the potential to dream up a generation of synthetic proteins to regulate our biology and health.

But there’s a problem. Designer protein AI models have tunnel vision: They are too focused on proteins.

When envisioning life’s molecular components, proteins, DNA, and fatty acids come to mind. But inside a cell, these structures are often held together by small molecules that mesh with surrounding components, together forming a functional bio-assembly.

One example is heme, a ring-like molecule that incorporates iron. Heme is the basis of hemoglobin in red blood cells, which shuttles oxygen throughout the body and grabs onto surrounding protein “hooks” using a variety of chemical bonds.

Unlike proteins or DNA, which can be modeled as a string of molecular “letters,” small molecules and their interactions are hard to capture. But they’re critical to biology’s complex molecular machines and can dramatically alter their functions.

Which is why, in their new study, the researchers aimed to broaden AI’s scope beyond proteins.

“We set out to develop a structure prediction method capable of generating 3D coordinates for all atoms” for a biological molecule, including proteins, DNA, and other modifications, the authors wrote in their paper.


Tag Team​

The team began by modifying a previous protein modeling AI to incorporate other molecules.

The AI works on three levels: The first analyzes a protein’s one-dimensional “letter” sequence, like words on a page. Next, a 2D map tracks how far each protein “word” is from another. Finally, 3D coordinates—a bit like GPS—map the overall structure of the protein.

Then comes the upgrade. To incorporate small molecule information into the model, the team added data about atomic sites and chemical connections into the first two layers.

In the third, they focused on chirality—that is, if a chemical’s structure is left or right-handed. Like our hands, chemicals can also have mirrored structures with vastly differing biological consequences. Like putting on gloves, only the correct “handedness” of a chemical can fit a given bio-assembly “glove.”
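
As a schematic of the description above (not the actual RoseTTAFold All-Atom code), the three tracks plus the added atom-level features might be laid out like this, with hypothetical sizes:

import numpy as np

# Schematic of the three-track representation described above, with
# hypothetical shapes; RoseTTAFold All-Atom's real features are richer.
n_residues = 120        # protein "words" (amino acids)
n_atoms = 24            # atoms of a bound small molecule, e.g. heme

# Track 1 (1D): the protein sequence plus atom-type tokens for the ligand.
sequence_track = np.zeros((n_residues + n_atoms,), dtype=np.int64)

# Track 2 (2D): pairwise features; for the ligand this includes the
# chemical bond graph (which atoms are covalently connected).
pair_track = np.zeros((n_residues + n_atoms, n_residues + n_atoms))

# Track 3 (3D): per-token coordinates the network refines into a structure.
coord_track = np.zeros((n_residues + n_atoms, 3))

# Chirality flags mark left- vs right-handed atom centers, since mirror
# images of the same molecule can behave very differently in biology.
chirality = np.zeros((n_atoms,), dtype=np.int8)  # e.g. -1, 0, +1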

RoseTTAFold All-Atom was then trained on multiple datasets with hundreds of thousands of data points describing proteins, small molecules, and their interactions. Eventually, it learned general properties of small molecules useful for building plausible protein assemblies. As a sanity check, the team also added a “confidence gauge” to identify high-quality predictions, those that lead to stable and functional bio-assemblies.

Unlike previous protein-only AI models, RoseTTAFold All-Atom “can model full biomolecular systems,” wrote the team.

In a series of tests, the upgraded model outperformed previous methods when learning to “dock” small molecules onto a given protein—a key component of drug discovery—by rapidly predicting interactions between proteins and non-protein molecules.


Brave New World​

Incorporating small molecules opens a whole new level of custom protein design.

As a proof of concept, the team meshed RoseTTAFold All-Atom with a generative AI model they had previously developed and designed protein partners for three different small molecules.

The first was digoxigenin, which is used to treat heart diseases but can have side effects. A protein that grabs onto it reduces toxicity. Even without prior knowledge of the molecule, the AI designed several protein binders that tempered digoxigenin levels when tested in cultured cells.

The AI also designed proteins that bind to heme, a small molecule critical for oxygen transfer in red blood cells, and bilin, which helps a variety of creatures absorb light.

Unlike previous methods, the team explained, the AI can “readily generate novel proteins” that grab onto small molecules without any expert knowledge.

It can also make highly accurate predictions about the strength of connections between proteins and small molecules at the atomic level, making it possible to rationally build a whole new universe of complex biomolecular structures.

“By empowering scientists everywhere to generate biomolecules with unprecedented precision, we’re opening the door to groundbreaking discoveries and practical applications that will shape the future of medicine, materials science, and beyond,” said Baker.

Image Credit: Ian C. Haydon
 