
Fact or (Science) Fiction?

Some scientists view AGI as unattainable science fiction. But Max Riesenhuber, co-director at the Center for Neuroengineering at Georgetown University Medical Center and one of the researchers in the U.S. study, believes that China is on a path that is more likely to succeed than the large language models such as ChatGPT, the current preoccupation in the West.


"ChatGPT by design cannot go back and forth in its reasoning, it is limited to linear thinking," Riesenhuber said in an interview.

In China, "There is a very sensible realization that LLMs are limited as models of real intelligence. A surefire way to get to real intelligence is to reverse engineer the human brain, which is the only intelligent system that we know of," Riesenhuber said.

He cautioned that it was unclear if anyone, including China, would succeed at this next level AI. "However, getting closer to the real thing will already have payoffs," he said.


In speeches and state documents seen by Newsweek, Chinese officials say they are aiming for "first-mover advantage" in AGI. Scientists say the potentially self-replicating nature of AI means that could give them a permanent edge.

"General artificial intelligence is the original and ultimate goal of artificial intelligence research," Zhu, the founder and director of China's leading AGI institute, the Beijing Institute of General Artificial Intelligence (BIGAI), said in an online event earlier this year. "I specifically decided to use AGI as part of the institution's name to distinguish it from dedicated artificial intelligence," such as "face recognition, object detection or text translation," Zhu said.

Zhu, like many other leading AI scientists in China, is a former student and professor in U.S. universities.


AGI with Chinese Characteristics

China's quest for AGI won top-level, public policy support in 2017, when the government published a "New Generation AI Development Plan" laying out a path to leading the field globally by 2030.

Public documents seen by Newsweek reveal wide and deep official support. One from May this year shows the Beijing Municipal Science and Technology Commission signed off on a two-year effort to position the city as a global leader in AGI research, with significant state subsidies.


Beijing will focus on "brain-like intelligence, embodied intelligence...and produce enlightened large models and general intelligence," it said. A one-million-square-foot "Beijing General Artificial Intelligence Innovation Park" is due to be finished at the end of 2024.

CHINESE SCIENTISTS AIM FOR AI 'NUCLEAR EXPLOSION'
Robots are lined up at the World Artificial Intelligence Conference (WAIC) in Shanghai in July. China's research has included an examination of how to mimic human brain neural networks in computers, and efforts that could ultimately lead to human-robot hybrids—for example by placing a large-scale brain simulation on a robot body. AFP VIA GETTY IMAGES/WANG ZHAO
Beijing's northwest Haidian university district of Zhongguancun, China's "Silicon Valley," is a hub of the research. In addition to BIGAI, the Beijing Academy of Artificial Intelligence (BAAI) is in the capital as are Peking and Tsinghua universities, also highly active in AI research. The centers interconnect.


This concentration of R&D and commercial activity will create "a 'nuclear explosion point' for the development of general artificial intelligence," according to the district government. The district and city governments did not respond to requests for comment.

Other key research locations include Wuhan, Shanghai, Shenzhen, Hefei, Harbin and Tianjin.

Safety First

The prospect of anyone developing superintelligent AGIs raises enormous safety concerns, says Nick Bostrom, director of the Future of Humanity Institute at Oxford University, when asked about China's work on AGI.


"We don't yet have reliable methods for scalable AI alignment. So, if general machine superintelligence were developed today, we might not be able to keep it safe and controlled," Bostrom told Newsweek.

Today's AI is already concerning enough, he noted, given its potential for engineering enhanced chemical or biological weapons, enabling cybercrime and automated drone swarms, or being used for totalitarian oppression, propaganda or discrimination.

"Ultimately, we all sit in the same boat with respect to the most serious risks from AGI. So, the ideal would be that rather than squabbling and fighting on the deck while the storm is approaching, we would work together to get the vessel as ready and seaworthy as much as possible in whatever time we have remaining," Bostrom said.


Others have questioned whether an agreement on ethical implementation would be respected even if one were reached globally.

"Ethical considerations are undoubtedly a significant aspect of AI regulation in China. However, it remains unclear whether regulators have the capacity or willingness to strictly enforce rules related to ethics," said Angela Zhang, director of the at the Philip K. H. Wong Centre for Chinese Law at the University of Hong Kong.

"Given the prevailing political concerns, I don't believe that ethical considerations will be the dominant factor in China's future decisions on AI regulation," Zhang said.


Despite that, she said it was necessary to try to get agreement between China and the U.S. as the world's leading AI powerhouses, and voiced hopes it might be possible.

"It would be a missed opportunity not to involve China in establishing universal rules and standards for AI," Zhang said.

A Western AI expert who interacts with Chinese AI scientists and policymakers said that while Chinese officials were interested in international measures on safety or an international body to regulate AI, they must be part of setting the rules, or the effort would certainly fail. Questions to the Institute for AI International Governance of Tsinghua University went unanswered.


In a statement to Newsweek, Liu Pengyu, a spokesman for the Chinese embassy in D.C., made clear China wanted an "extensive" role in future AI regulation. "As a principle, China believes that the development of AI benefits all countries, and all countries should be able to participate extensively in the global governance of AI," Liu said.

Worrying About a Revolution

At home, the greatest fear over AI highlighted by Chinese officials is that it could lead to domestic rebellion against Communist Party rule. Commentators have publicly voiced concerns over the perceived political risks of foreign-made, generative AI.


"ChatGPT's political stance and ideology will subconsciously influence its hundreds of millions of user groups, becoming the most massive propaganda machine for Western values and ideology," Lu Chuanying, a governance, internet security and AI expert at the Shanghai Institutes for International Studies, wrote in the Shanghai media outlet The Paper.

"This is of great concern to our current ideology and discourse system," Lu wrote.

The concern prompted a new AI law that took effect in China on August 15. The first of five main provisions states that generative AI, including text, images, audio and video, "must adhere to socialist core values, must not generate incitement to subvert state power, overthrow the socialist system, damage national security or interests" and "must not hurt the national image."


Hard at work in their laboratories, some Chinese researchers have also been focusing on what AI could mean for ideology. One paper aiming for AGI noted that "unhealthy information can easily harm the psychology of college students." The title of the paper: "An intelligent software-driven immersive environment for online political guidance based on brain-computer interface and autonomous systems."

Such an application of the technology, it said, was inevitable.
 


An open question these days is what the role of task-specific / finetuned models will be. I can only think of three scenarios where it makes sense to work on task-specific models.

The first scenario is if you have private (e.g., legal, medical, business) data not found on the web that you want the model to know. And you have to have a decent amount of it—if it’s only a little bit, it'll be much easier to put it into a long-context model or retrieve it. Even if the best models today can’t do long-context or retrieval well, they probably will be able to in a few years.

The second scenario is if you are doing a domain-specific task and care a lot about doing it with low latency or low cost. For example, Codex is a great use case, since you only need the coding ability and you want people to be able to use it at scale, quickly and cheaply. This makes sense because if you’re only doing one domain, you don’t need as many model parameters as general models do (less memorization required).

These two scenarios make sense because emergent performance on private tasks is not achievable by scaling (scenario 1), and scenario 2 by definition doesn’t want larger models.

The final scenario to use task-specific models is if you want to get the best possible performance on a task and can easily ride the wave of the next GPT-N+1 model (e.g., you are using a finetuning API from another provider, which will presumably get better as the provider trains better models).
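To make the "ride the wave" scenario concrete, here is a minimal sketch of what that can look like with a hosted finetuning API, assuming an OpenAI-style SDK; the training file name and base model below are placeholders, not a recommendation. When the provider ships a stronger base model, rerunning the same job upgrades the task-specific model without changing the pipeline.

```python
# Minimal sketch: finetuning through a hosted API (OpenAI-style SDK assumed).
# The file name and base model are placeholders for illustration only.
from openai import OpenAI

client = OpenAI()

# Upload task-specific training data (JSONL of prompt/response examples).
train_file = client.files.create(
    file=open("my_task_train.jsonl", "rb"),
    purpose="fine-tune",
)

# Start a finetuning job against whatever base model the provider currently offers;
# swapping in a newer base model later is a one-line change.
job = client.fine_tuning.jobs.create(
    training_file=train_file.id,
    model="gpt-3.5-turbo",
)
print(job.id, job.status)
```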

In most other scenarios, it seems dangerous to focus on domain-specific models. For example, let’s say you train a model on some specific task and it beats GPT-4 this year. That is satisfying in the short run. However, every GPT-N+1 could potentially beat your task-specific model, since scale enables better reasoning, better access to tail knowledge, etc. We have seen this time and time again in the past (e.g., PubMedGPT and GPT-3 finetuned on GSM8K surpassed by PaLM). To me this is an obvious instance of The Bitter Lesson, but I’d love to hear if I’ve missed anything.
 

DALL·E 3

DALL·E 3 understands significantly more nuance and detail than our previous systems, allowing you to easily translate your ideas into exceptionally accurate images.


About DALL·E 3

DALL·E 3 is now in research preview, and will be available to ChatGPT Plus and Enterprise customers in October, via the API and in Labs later this fall.
Modern text-to-image systems have a tendency to ignore words or descriptions, forcing users to learn prompt engineering. DALL·E 3 represents a leap forward in our ability to generate images that exactly adhere to the text you provide.
 

AI that’s smarter than humans? Americans say a firm “no thank you.”

Exclusive: 63 percent of Americans want regulation to actively prevent superintelligent AI, a new poll reveals.


Sam Altman, CEO of OpenAI, the company that made ChatGPT. For Altman, the chatbot is just a stepping stone on the way to artificial general intelligence.
SeongJoon Cho/Bloomberg via Getty Images


Major AI companies are racing to build superintelligent AI — for the benefit of you and me, they say. But did they ever pause to ask whether we actually want that?

Americans, by and large, don’t want it.

That’s the upshot of a new poll shared exclusively with Vox. The poll, commissioned by the think tank AI Policy Institute and conducted by YouGov, surveyed 1,118 Americans from across the age, gender, race, and political spectrums in early September. It reveals that 63 percent of voters say regulation should aim to actively prevent AI superintelligence.

Companies like OpenAI have made it clear that superintelligent AI — a system that is smarter than humans — is exactly what they’re trying to build. They call it artificial general intelligence (AGI) and they take it for granted that AGI should exist. “Our mission,” OpenAI’s website says, “is to ensure that artificial general intelligence benefits all of humanity.”

But there’s a deeply weird and seldom remarked upon fact here: It’s not at all obvious that we should want to create AGI — which, as OpenAI CEO Sam Altman will be the first to tell you, comes with major risks, including the risk that all of humanity gets wiped out. And yet a handful of CEOs have decided, on behalf of everyone else, that AGI should exist.

Now, the only thing that gets discussed in public debate is how to control a hypothetical superhuman intelligence — not whether we actually want it. A premise has been ceded here that arguably never should have been.

“It’s so strange to me to say, ‘We have to be really careful with AGI,’ rather than saying, ‘We don’t need AGI, this is not on the table,’” Elke Schwarz, a political theorist who studies AI ethics at Queen Mary University of London, told me earlier this year. “But we’re already at a point when power is consolidated in a way that doesn’t even give us the option to collectively suggest that AGI should not be pursued.”

Building AGI is a deeply political move. Why aren’t we treating it that way?

Technological solutionism — the ideology that says we can trust technologists to engineer the way out of humanity’s greatest problems — has played a major role in consolidating power in the hands of the tech sector. Although this may sound like a modern ideology, it actually goes all the way back to the medieval period, when religious thinkers began to teach that technology is a means of bringing about humanity’s salvation. Since then, Western society has largely bought the notion that tech progress is synonymous with moral progress.

In modern America, where the profit motives of capitalism have combined with geopolitical narratives about needing to “race” against foreign military powers, tech accelerationism has reached fever pitch. And Silicon Valley has been only too happy to run with it.


AGI enthusiasts promise that the coming superintelligence will bring radical improvements. It could develop everything from cures for diseases to better clean energy technologies. It could turbocharge productivity, leading to windfall profits that may alleviate global poverty. And getting to it first could help the US maintain an edge over China; in a logic reminiscent of a nuclear weapons race, it’s better for “us” to have it than “them,” the argument goes.

But Americans have learned a thing or two from the past decade in tech, and especially from the disastrous consequences of social media. They increasingly distrust tech executives and the idea that tech progress is positive by default. And they’re questioning whether the potential benefits of AGI justify the potential costs of developing it. After all, CEOs like Altman readily proclaim that AGI may well usher in mass unemployment, break the economic system, and change the entire world order. That’s if it doesn’t render us all extinct.

In the new AI Policy Institute/YouGov poll, the “better us than China” argument was presented five different ways in five different questions. Strikingly, each time, the majority of respondents rejected the argument. For example, 67 percent of voters said we should restrict how powerful AI models can become, even though that risks making American companies fall behind China. Only 14 percent disagreed.

Naturally, with any poll about a technology that doesn’t yet exist, there’s a bit of a challenge in interpreting the responses. But what a strong majority of the American public seems to be saying here is: just because we’re worried about a foreign power getting ahead, doesn’t mean that it makes sense to unleash upon ourselves a technology we think will severely harm us.

AGI, it turns out, is just not a popular idea in America.

“As we’re asking these poll questions and getting such lopsided results, it’s honestly a little bit surprising to me to see how lopsided it is,” Daniel Colson, the executive director of the AI Policy Institute, told me. “There’s actually quite a large disconnect between a lot of the elite discourse or discourse in the labs and what the American public wants.”

And yet, Colson pointed out, “most of the direction of society is set by the technologists and by the technologies that are being released … There’s an important way in which that’s extremely undemocratic.”

He expressed consternation that when tech billionaires recently descended on Washington to opine on AI policy at Sen. Chuck Schumer’s invitation, they did so behind closed doors. The public didn’t get to watch, never mind participate in, a discussion that will shape its future.

According to Schwarz, we shouldn’t let technologists depict the development of AGI as if it’s some natural law, as inevitable as gravity. It’s a choice — a deeply political one.

“The desire for societal change is not merely a technological aim, it is a fully political aim,” she said. “If the publicly stated aim is to ‘change everything about society,’ then this alone should be a prompt to trigger some level of democratic input and oversight.”
 


AI companies are radically changing our world. Should they be getting our permission first?

AI stands to be so transformative that even its developers are expressing unease about how undemocratic its development has been.

Jack Clark, the co-founder of AI safety and research company Anthropic, recently wrote an unusually vulnerable newsletter. He confessed that there were several key things he’s “confused and uneasy” about when it comes to AI. Here is one of the questions he articulated: “How much permission do AI developers need to get from society before irrevocably changing society?” Clark continued:
Technologists have always had something of a libertarian streak and this is perhaps best epitomized by the ‘social media’ and Uber et al era of the 2010s — vast, society-altering systems ranging from social networks to rideshare systems were deployed into the world and aggressively scaled with little regard to the societies they were influencing. This form of permissionless invention is basically the implicitly preferred form of development as epitomized by Silicon Valley and the general ‘move fast and break things’ philosophy of tech. Should the same be true of AI?

That more people, including tech CEOs, are starting to question the norm of “permissionless invention” is a very healthy development. It also raises some tricky questions.

When does it make sense for technologists to seek buy-in from those who’ll be affected by a given product? And when the product will affect the entirety of human civilization, how can you even go about seeking consensus?

Many of the great technological innovations in history happened because a few individuals decided by fiat that they had a great way to change things for everyone. Just think of the invention of the printing press or the telegraph. The inventors didn’t ask society for its permission to release them.

That may be partly because of technological solutionism and partly because, well, it would have been pretty hard to consult broad swaths of society in an era before mass communications — before things like a printing press or a telegraph! And while those inventions did come with perceived risks, they didn’t pose the threat of wiping out humanity altogether or making us subservient to a different species.

For the few technologies we’ve invented so far that meet that bar, seeking democratic input and establishing mechanisms for global oversight have been attempted, and rightly so. It’s the reason we have a Nuclear Nonproliferation Treaty and a Biological Weapons Convention — treaties that, though they’re struggling, matter a lot for keeping our world safe.

While those treaties came after the use of such weapons, another example — the 1967 Outer Space Treaty — shows that it’s possible to create such mechanisms in advance. Ratified by dozens of countries and adopted by the United Nations against the backdrop of the Cold War, it laid out a framework for international space law. Among other things, it stipulated that the moon and other celestial bodies can only be used for peaceful purposes, and that states can’t store their nuclear weapons in space.

Nowadays, the treaty comes up in debates about whether we should send messages into space with the hope of reaching extraterrestrials. Some argue that’s very dangerous because an alien species, once aware of us, might oppress us. Others argue it’s more likely to be a boon — maybe the aliens will gift us their knowledge in the form of an Encyclopedia Galactica. Either way, it’s clear that the stakes are incredibly high and all of human civilization would be affected, prompting some to make the case for democratic deliberation before any more intentional transmissions are sent into space.

As Kathryn Denning, an anthropologist who studies the ethics of space exploration, put it in an interview with the New York Times, “Why should my opinion matter more than that of a 6-year-old girl in Namibia? We both have exactly the same amount at stake.”

Or, as the old Roman proverb goes: what touches all should be decided by all.

That is as true of superintelligent AI as it is of nukes, chemical weapons, or interstellar broadcasts. And though some might argue that the American public only knows as much about AI as a 6-year-old, that doesn’t mean it’s legitimate to ignore or override the public’s general wishes for technology.

“Policymakers shouldn’t take the specifics of how to solve these problems from voters or the contents of polls,” Colson acknowledged. “The place where I think voters are the right people to ask, though, is: What do you want out of policy? And what direction do you want society to go in?”
 


Towards Expert-Level Medical Question Answering with Large Language Models

Recent artificial intelligence (AI) systems have reached milestones in "grand challenges" ranging from Go to protein-folding. The capability to retrieve medical knowledge, reason over it, and answer medical questions comparably to physicians has long been viewed as one such grand challenge.

Large language models (LLMs) have catalyzed significant progress in medical question answering; Med-PaLM was the first model to exceed a "passing" score in US Medical Licensing Examination (USMLE) style questions with a score of 67.2% on the MedQA dataset. However, this and other prior work suggested significant room for improvement, especially when models' answers were compared to clinicians' answers. Here we present Med-PaLM 2, which bridges these gaps by leveraging a combination of base LLM improvements (PaLM 2), medical domain finetuning, and prompting strategies including a novel ensemble refinement approach.

Med-PaLM 2 scored up to 86.5% on the MedQA dataset, improving upon Med-PaLM by over 19% and setting a new state-of-the-art. We also observed performance approaching or exceeding state-of-the-art across MedMCQA, PubMedQA, and MMLU clinical topics datasets.

We performed detailed human evaluations on long-form questions along multiple axes relevant to clinical applications. In pairwise comparative ranking of 1066 consumer medical questions, physicians preferred Med-PaLM 2 answers to those produced by physicians on eight of nine axes pertaining to clinical utility (p < 0.001). We also observed significant improvements compared to Med-PaLM on every evaluation axis (p < 0.001) on newly introduced datasets of 240 long-form "adversarial" questions to probe LLM limitations.

While further studies are necessary to validate the efficacy of these models in real-world settings, these results highlight rapid progress towards physician-level performance in medical question answering.
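The "ensemble refinement" prompting strategy mentioned above is only named in this abstract, but its general shape is to sample several reasoning paths and then ask the model for a refined answer conditioned on its own drafts. Below is a minimal, hedged sketch of that pattern, assuming a generic `generate(prompt, temperature)` call to any LLM API; it illustrates the idea and is not the paper's actual implementation.

```python
# Sketch of an ensemble-refinement-style prompting loop.
# `generate(prompt, temperature)` is a stand-in for any LLM completion call.
from collections import Counter

def ensemble_refine(question, generate, n_samples=8):
    # 1) Sample several independent chain-of-thought answers at high temperature.
    drafts = [
        generate(f"Answer with step-by-step reasoning:\n{question}", temperature=0.7)
        for _ in range(n_samples)
    ]

    # 2) Condition the model on its own drafts and ask for a refined answer.
    context = "\n\n".join(f"Candidate answer {i + 1}:\n{d}" for i, d in enumerate(drafts))
    refinements = [
        generate(
            f"{question}\n\n{context}\n\n"
            "Considering the candidate answers above, give a single best final answer.",
            temperature=0.3,
        )
        for _ in range(n_samples)
    ]

    # 3) Take a plurality vote (in practice the vote would be over extracted final
    #    answers, e.g. the chosen multiple-choice option, rather than raw text).
    return Counter(refinements).most_common(1)[0][0]
```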

 

The AI bot reads the vacancy, compares it with the resume, and fills out application questions and cover letters.
 


New Generative AI Technique Brings Researchers One Step Closer to Building a “Large Behavior Model”

LOS ALTOS, Calif. and CAMBRIDGE, Mass. (Sept. 19, 2023) – Today, Toyota Research Institute (TRI) announced a breakthrough generative AI approach based on Diffusion Policy to quickly and confidently teach robots new, dexterous skills. This advancement significantly improves robot utility and is a step towards building “Large Behavior Models (LBMs)” for robots, analogous to the Large Language Models (LLMs) that have recently revolutionized conversational AI.

“Our research in robotics is aimed at amplifying people rather than replacing them,” said Gill Pratt, CEO of TRI and Chief Scientist for Toyota Motor Corporation. “This new teaching technique is both very efficient and produces very high performing behaviors, enabling robots to much more effectively amplify people in many ways.”

Previous state-of-the-art techniques to teach robots new behaviors were slow, inconsistent, inefficient, and often limited to narrowly defined tasks performed in highly constrained environments. Roboticists needed to spend many hours writing sophisticated code and/or using numerous trial and error cycles to program behaviors.

TRI has already taught robots more than 60 difficult, dexterous skills using the new approach, including pouring liquids, using tools, and manipulating deformable objects. These achievements were realized without writing a single line of new code; the only change was supplying the robot with new data. Building on this success, TRI has set an ambitious target of teaching hundreds of new skills by the end of the year and 1,000 by the end of 2024.

Today’s news also highlights that robots can be taught to function in new scenarios and perform a wide range of behaviors. These skills are not limited to just “pick and place,” or simply picking up objects and putting them down in new locations. TRI’s robots can now interact with the world in varied and rich ways — which will one day allow robots to support people in everyday situations and unpredictable, ever-changing environments.



“The tasks that I’m watching these robots perform are simply amazing – even one year ago, I would not have predicted that we were close to this level of diverse dexterity,” remarked Russ Tedrake, Vice President of Robotics Research at TRI. Dr. Tedrake, who is also the Toyota Professor of Electrical Engineering and Computer Science, Aeronautics and Astronautics, and Mechanical Engineering at MIT, explained, “What is so exciting about this new approach is the rate and reliability with which we can add new skills. Because these skills work directly from camera images and tactile sensing, using only learned representations, they are able to perform well even on tasks that involve deformable objects, cloth, and liquids — all of which have traditionally been extremely difficult for robots.”

Technical details:

TRI’s robot behavior model learns from haptic demonstrations from a teacher, combined with a language description of the goal. It then uses an AI-based Diffusion Policy to learn the demonstrated skill. This process allows a new behavior to be deployed autonomously from dozens of demonstrations. Not only does this approach produce consistent, repeatable, and performant results, but it does so with tremendous speed.

Key achievements of TRI’s research for this novel development include:

  • Diffusion Policy: TRI and our collaborators in Professor Song’s group at Columbia University developed a new, powerful generative-AI approach to behavior learning. This approach, called Diffusion Policy, enables easy and rapid behavior teaching from demonstration.
  • Customized Robot Platform: TRI’s robot platform is custom-built for dexterous dual-arm manipulation tasks with a special focus on enabling haptic feedback and tactile sensing.
  • Pipeline: TRI robots have learned 60 dexterous skills already, with a target of hundreds by the end of the year and 1,000 by the end of 2024.
  • Drake: Part of our (not so) secret sauce is Drake, a model-based design framework for robotics that provides us with a cutting-edge toolbox and simulation platform. Drake’s high degree of realism allows us to develop in both simulation and in reality at a dramatically greater scale and velocity than would otherwise be possible. Our internal robot stack is built using Drake’s optimization and systems frameworks, and we have made Drake open source to catalyze work across the entire robotics community.
  • Safety: Safety is core to our robotics efforts at TRI. We have designed our system with strong safeguards, powered by Drake and our custom robot control stack, to ensure our robots respect safety guarantees such as not colliding with themselves or their environment.
Diffusion Policy has been published at the 2023 Robotics: Science and Systems (RSS) conference. Additional technical information can be found on TRI’s Medium blog.

Please join our LinkedIn Live Q&A session on October 4th from 1 pm – 1:30 pm ET / 10 am – 10:30 am PT for an opportunity to learn more and hear directly from the TRI robotics research team. Sign up for the event on TRI’s LinkedIn page.


About Toyota Research Institute

Toyota Research Institute (TRI) conducts research to amplify human ability, focusing on making our lives safer and more sustainable. Led by Dr. Gill Pratt, TRI’s team of researchers develops technologies to advance energy and materials, human-centered artificial intelligence, human interactive driving, machine learning, and robotics. Established in 2015, TRI has offices in Los Altos, California, and Cambridge, Massachusetts. For more information about TRI, please visit http://tri.global.




Diffusion Policy

Visuomotor Policy Learning via Action Diffusion


This paper introduces Diffusion Policy, a new way of generating robot behavior by representing a robot's visuomotor policy as a conditional denoising diffusion process. We benchmark Diffusion Policy across 12 different tasks from 4 different robot manipulation benchmarks and find that it consistently outperforms existing state-of-the-art robot learning methods with an average improvement of 46.9%. Diffusion Policy learns the gradient of the action-distribution score function and iteratively optimizes with respect to this gradient field during inference via a series of stochastic Langevin dynamics steps. We find that the diffusion formulation yields powerful advantages when used for robot policies, including gracefully handling multimodal action distributions, being suitable for high-dimensional action spaces, and exhibiting impressive training stability. To fully unlock the potential of diffusion models for visuomotor policy learning on physical robots, this paper presents a set of key technical contributions including the incorporation of receding horizon control, visual conditioning, and the time-series diffusion transformer. We hope this work will help motivate a new generation of policy learning techniques that are able to leverage the powerful generative modeling capabilities of diffusion models.
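To make the "conditional denoising" idea concrete, here is a minimal sketch of what inference could look like, assuming a trained noise-prediction network `eps_model(noisy_actions, obs, t)` and a precomputed 1-D tensor of cumulative noise-schedule terms `alphas_cumprod`. For brevity it uses a deterministic DDIM-style reverse step rather than the stochastic Langevin-dynamics steps described in the paper, and all names and shapes are illustrative rather than the authors' code.

```python
# Sketch of diffusion-policy inference: denoise a short action sequence
# conditioned on the current observation. eps_model and alphas_cumprod are
# assumed to come from training; everything here is illustrative.
import torch

@torch.no_grad()
def sample_action_sequence(eps_model, obs, alphas_cumprod,
                           horizon=16, action_dim=7):
    n_steps = len(alphas_cumprod)
    # Start from pure Gaussian noise over a short action horizon.
    actions = torch.randn(1, horizon, action_dim)
    for t in reversed(range(n_steps)):
        # Predict the noise in the current actions, conditioned on the observation
        # (camera images, proprioception, tactile signals, ...).
        eps = eps_model(actions, obs, t)
        a_bar = alphas_cumprod[t]
        a_bar_prev = alphas_cumprod[t - 1] if t > 0 else torch.tensor(1.0)
        # Estimate the clean action sequence, then take one deterministic
        # (DDIM-style) reverse step toward it.
        x0 = (actions - (1 - a_bar).sqrt() * eps) / a_bar.sqrt()
        actions = a_bar_prev.sqrt() * x0 + (1 - a_bar_prev).sqrt() * eps
    # In receding-horizon control, only the first few actions are executed
    # before re-planning from the next observation.
    return actions
```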


Highlights

Diffusion Policy learns multi-modal behavior and commits to only one mode within each rollout. LSTM-GMM and IBC are biased toward one mode, while BET failed to commit.
Diffusion Policy predicts a sequence of actions for receding-horizon control.
The Mug Flipping task requires the policy to predict smooth 6 DoF actions while operating close to kinematic limits.
Toward making 🍕: The sauce pouring and spreading task manipulates liquid with 6 DoF and periodic actions.
In our Push-T experiments, Diffusion Policy is highly robust against perturbations and visual distractions.

Simulation Benchmarks

Diffusion Policy outperforms prior state-of-the-art on 12 tasks across 4 benchmarks with an average success-rate improvement of 46.9%. Check out our paper for further details!

Lift 1
Can 1
Square 1
Tool Hang 1
Transport 1
Push-T 2
Block Pushing 2,3
Franka Kitchen 3,4
Standardized simulation benchmarks are essential for this project's development.
Special shoutout to the authors of these projects for open-sourcing their simulation environments:
 


Telling AI model to “take a deep breath” causes math scores to soar in study

DeepMind used AI models to optimize their own prompts, with surprising results.

BENJ EDWARDS

A worried-looking tin toy robot. Getty Images

Google DeepMind researchers recently developed a technique to improve math ability in AI language models like ChatGPT by using other AI models to improve prompting—the written instructions that tell the AI model what to do. The researchers found that using human-style encouragement improved math skills dramatically, in line with earlier results.

In a paper called "Large Language Models as Optimizers" listed this month on arXiv, DeepMind scientists introduced Optimization by PROmpting (OPRO), a method to improve the performance of large language models (LLMs) such as OpenAI’s ChatGPT and Google’s PaLM 2. This new approach sidesteps the limitations of traditional math-based optimizers by using natural language to guide LLMs in problem-solving. "Natural language" is a fancy way of saying everyday human speech.


"Instead of formally defining the optimization problem and deriving the update step with a programmed solver," the researchers write, "we describe the optimization problem in natural language, then instruct the LLM to iteratively generate new solutions based on the problem description and the previously found solutions."

Typically, in machine learning, techniques using algorithms such as derivative-based optimizers act as a guide for improving an AI model's performance. Imagine a model's performance as a curve on a graph: The goal is to find the lowest point on this curve because that's where the model makes the fewest mistakes. By using the slope of the curve to make adjustments, the optimizer helps the model get closer and closer to that ideal low point, making it more accurate and efficient at whatever task it's designed to do.
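As a toy illustration of that curve-walking idea (my example, not from the article or the paper): one-variable gradient descent uses the slope at the current point to step toward the lowest point of the curve.

```python
# Toy gradient descent on f(w) = (w - 3)**2, whose lowest point is at w = 3.
w = 0.0
learning_rate = 0.1
for _ in range(50):
    grad = 2 * (w - 3)          # slope of the curve at the current point
    w -= learning_rate * grad   # step downhill along the slope
print(round(w, 3))              # converges to ~3.0, the bottom of the curve
```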

Rather than relying on formal mathematical definitions to perform this task, OPRO uses "meta-prompts" described in natural language to set the stage for the optimization process. The LLM then generates candidate solutions based on the problem’s description and previous solutions, and it tests them by assigning each a quality score.

In OPRO, two large language models play different roles: a scorer LLM evaluates the objective function such as accuracy, while an optimizer LLM generates new solutions based on past results and a natural language description. Different pairings of scorer and optimizer LLMs are evaluated, including models like PaLM 2 and GPT variants. OPRO can optimize prompts for the scorer LLM by having the optimizer iteratively generate higher-scoring prompts. These scores help the system identify the best solutions, which are then added back into the 'meta-prompt' for the next round of optimization.
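Here is a rough sketch of that loop in code, assuming two hypothetical helpers: `optimizer_llm(meta_prompt)`, which returns a candidate instruction string, and `score_prompt(instruction, dataset)`, which runs the scorer LLM over a small evaluation set and returns an accuracy. It illustrates the meta-prompt feedback loop described above, not DeepMind's actual implementation.

```python
# Sketch of an OPRO-style optimization loop. optimizer_llm() and score_prompt()
# are hypothetical helpers standing in for calls to the optimizer and scorer LLMs.

def opro(task_description, dataset, optimizer_llm, score_prompt,
         n_rounds=20, n_candidates=4):
    scored = []  # (accuracy, instruction) pairs fed back into the meta-prompt
    for _ in range(n_rounds):
        # Meta-prompt = problem description + previously found prompts with their scores.
        history = "\n".join(f"text: {p}\nscore: {s:.1f}"
                            for s, p in sorted(scored)[-20:])
        meta_prompt = (
            f"{task_description}\n\n"
            f"Below are previous instructions with their scores (higher is better):\n"
            f"{history}\n\n"
            "Write a new instruction that is different from the ones above "
            "and achieves a higher score."
        )
        # The optimizer LLM proposes candidates; the scorer LLM evaluates each one.
        for _ in range(n_candidates):
            candidate = optimizer_llm(meta_prompt)
            scored.append((score_prompt(candidate, dataset), candidate))
    return max(scored)[1]  # best instruction found so far
```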

“Take a deep breath and work on this step by step”

Perhaps the most intriguing part of the DeepMind study is the impact of specific phrases on the output. Phrases like "let's think step by step" prompted each AI model to produce more accurate results when tested against math problem data sets. (This technique became widely known in May 2022 thanks to a now-famous paper titled "Large Language Models are Zero-Shot Reasoners.")


Consider a simple word problem, such as, "Beth bakes four two-dozen batches of cookies in a week. If these cookies are shared among 16 people equally, how many cookies does each person consume?" The 2022 paper discovered that instead of just feeding a chatbot a word problem like this by itself, you'd instead prefix it with "Let's think step by step" and then paste in the problem. The accuracy of the AI model's results almost always improves, and it works well with ChatGPT.
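(For the record, the arithmetic works out to 4 × 24 = 96 cookies, and 96 ÷ 16 = 6 cookies per person.)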

Interestingly, in this latest study, DeepMind researchers found "Take a deep breath and work on this problem step by step" to be the most effective prompt when used with Google's PaLM 2 language model. The phrase achieved the top accuracy score of 80.2 percent in tests against GSM8K, which is a data set of grade-school math word problems. By comparison, PaLM 2, without any special prompting, scored only 34 percent accuracy on GSM8K, and the classic "Let’s think step by step" prompt scored 71.8 percent accuracy.

So why does this work? Obviously, large language models can't take a deep breath because they don't have lungs or bodies. They don't think and reason like humans, either. What "reasoning" they do (and "reasoning" is a contentious term among some, though it is readily used as a term of art in AI) is borrowed from a massive data set of language phrases scraped from books and the web. That includes things like Q&A forums, which contain many examples of "let's take a deep breath" or "think step by step" before showing more carefully reasoned solutions. Those phrases may help the LLM tap into better answers or produce better examples of reasoning or problem-solving from the data set it absorbed into its neural network during training.

Even though working out the best ways to give LLMs human-like encouragement is slightly puzzling to us, that's not a problem for OPRO because the technique utilizes large language models to discover these more effective prompting phrases. DeepMind researchers think that the biggest win for OPRO is its ability to sift through many possible prompts to find the one that gives the best results for a specific problem. This could allow people to produce far more useful or accurate results from LLMs in the future.
 