bnew


Google wrote a ‘Robot Constitution’ to make sure its new AI droids won’t kill us​


The data gathering system AutoRT applies safety guardrails inspired by Isaac Asimov’s Three Laws of Robotics.

By Amrita Khalid, one of the authors of audio industry newsletter Hot Pod. Khalid has covered tech, surveillance policy, consumer gadgets, and online communities for more than a decade.

Jan 4, 2024, 4:21 PM EST




Image: Google

The DeepMind robotics team has revealed three new advances that it says will help robots make faster, better, and safer decisions in the wild. One includes a system for gathering training data with a “Robot Constitution” to make sure your robot office assistant can fetch you more printer paper — but without mowing down a human co-worker who happens to be in the way.

Google’s data gathering system, AutoRT, can use a visual language model (VLM) and large language model (LLM) working hand in hand to understand its environment, adapt to unfamiliar settings, and decide on appropriate tasks. The Robot Constitution, which is inspired by Isaac Asimov’s “Three Laws of Robotics,” is described as a set of “safety-focused prompts” instructing the LLM to avoid choosing tasks that involve humans, animals, sharp objects, and even electrical appliances.

For additional safety, DeepMind programmed the robots to stop automatically if the force on their joints goes past a certain threshold and included a physical kill switch human operators can use to deactivate them. Over a period of seven months, Google deployed a fleet of 53 AutoRT robots into four different office buildings and conducted over 77,000 trials. Some robots were controlled remotely by human operators, while others operated either based on a script or completely autonomously using Google's Robotic Transformer (RT-2) AI learning model.



AutoRT follows these four steps for each task.

The robots used in the trial look more utilitarian than flashy — equipped with only a camera, robot arm, and mobile base. “For each robot, the system uses a VLM to understand its environment and the objects within sight. Next, an LLM suggests a list of creative tasks that the robot could carry out, such as ‘Place the snack onto the countertop’ and plays the role of decision-maker to select an appropriate task for the robot to carry out,” noted Google in its blog post.



Image: Google

DeepMind’s other new tech includes SARA-RT, a neural network architecture designed to make the existing Robotic Transformer RT-2 more accurate and faster. It also announced RT-Trajectory, which adds 2D outlines to help robots better perform specific physical tasks, such as wiping down a table.

We still seem to be a very long way from robots that serve drinks and fluff pillows autonomously, but when they’re available, they may have learned from a system like AutoRT.
 

bnew


Shaping the future of advanced robotics​

Published
4 JANUARY 2024
Authors
The Google DeepMind Robotics Team

Auto-RT Robot

Introducing AutoRT, SARA-RT and RT-Trajectory to improve real-world robot data collection, speed, and generalization

Picture a future in which a simple request to your personal helper robot - “tidy the house” or “cook us a delicious, healthy meal” - is all it takes to get those jobs done. These tasks, straightforward for humans, require a high-level understanding of the world for robots.

Today we’re announcing a suite of advances in robotics research that bring us a step closer to this future. AutoRT, SARA-RT, and RT-Trajectory build on our historic Robotics Transformers work to help robots make decisions faster, and better understand and navigate their environments.



AutoRT: Harnessing large models to better train robots​

We introduce AutoRT, a system that harnesses the potential of large foundation models, which is critical to creating robots that can understand practical human goals. By collecting more experiential training data – and more diverse data – AutoRT can help scale robotic learning to better train robots for the real world.

AutoRT combines large foundation models such as a Large Language Model (LLM) or Visual Language Model (VLM), and a robot control model (RT-1 or RT-2) to create a system that can deploy robots to gather training data in novel environments. AutoRT can simultaneously direct multiple robots, each equipped with a video camera and an end effector, to carry out diverse tasks in a range of settings. For each robot, the system uses a VLM to understand its environment and the objects within sight. Next, an LLM suggests a list of creative tasks that the robot could carry out, such as “Place the snack onto the countertop” and plays the role of decision-maker to select an appropriate task for the robot to carry out.

In extensive real-world evaluations over seven months, the system safely orchestrated as many as 20 robots simultaneously, and up to 52 unique robots in total, in a variety of office buildings, gathering a diverse dataset comprising 77,000 robotic trials across 6,650 unique tasks.


(1) An autonomous wheeled robot finds a location with multiple objects. (2) A VLM describes the scene and objects to an LLM. (3) An LLM suggests diverse manipulation tasks for the robot and decides which tasks the robot could do unassisted, which would require remote control by a human, and which are impossible, before making a choice. (4) The chosen task is attempted, the experiential data collected, and the data scored for its diversity/novelty. Repeat.
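
To make that loop concrete, here is a minimal Python sketch of the four steps. Everything in it (the function names, the keyword-based constitution filter, the example tasks) is a simplified stand-in for illustration, not Google's actual AutoRT code, which relies on a real VLM, LLM, and RT-1/RT-2 controllers.

```python
# Hypothetical sketch of the four-step AutoRT loop described above.
# The function names are placeholders, not Google's APIs.
import random

CONSTITUTION = (
    "Never select a task that involves humans, animals, sharp objects, "
    "or electrical appliances."
)

def describe_scene(camera_image):
    """Placeholder for the VLM: returns a text description of visible objects."""
    return "a countertop with a bag of chips, a sponge, and a mug"

def propose_tasks(scene_description):
    """Placeholder for the LLM: suggests candidate tasks given the scene."""
    return [
        {"task": "place the snack onto the countertop", "mode": "autonomous"},
        {"task": "wipe the countertop with the sponge", "mode": "teleoperated"},
        {"task": "hand the mug to a person", "mode": "impossible"},  # violates constitution
    ]

def violates_constitution(task):
    """Crude keyword filter standing in for the LLM's constitution check."""
    banned = ("person", "human", "animal", "knife", "scissors", "appliance")
    return any(word in task["task"] for word in banned)

def collect_episode(camera_image):
    scene = describe_scene(camera_image)                  # step 2: VLM describes the scene
    candidates = [t for t in propose_tasks(scene)         # step 3: LLM proposes tasks,
                  if not violates_constitution(t)         # constitution filters them,
                  and t["mode"] != "impossible"]
    if not candidates:
        return None
    chosen = random.choice(candidates)                    # decision-maker picks one
    return {"scene": scene, "task": chosen}               # step 4: attempt and log the data

print(collect_episode(camera_image=None))
```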



Layered safety protocols are critical​

Before robots can be integrated into our everyday lives, they need to be developed responsibly with robust research demonstrating their real-world safety.

While AutoRT is a data-gathering system, it is also an early demonstration of autonomous robots for real-world use. It features safety guardrails, one of which is providing its LLM-based decision-maker with a Robot Constitution - a set of safety-focused prompts to abide by when selecting tasks for the robots. These rules are in part inspired by Isaac Asimov’s Three Laws of Robotics – first and foremost that a robot “may not injure a human being”. Further safety rules require that no robot attempts tasks involving humans, animals, sharp objects or electrical appliances.

But even if large models are prompted correctly with self-critiquing, this alone cannot guarantee safety. So the AutoRT system comprises layers of practical safety measures from classical robotics. For example, the collaborative robots are programmed to stop automatically if the force on their joints exceeds a given threshold, and all active robots were kept in line-of-sight of a human supervisor with a physical deactivation switch.



SARA-RT: Making Robotics Transformers leaner and faster​

Our new system, Self-Adaptive Robust Attention for Robotics Transformers (SARA-RT), converts Robotics Transformer (RT) models into more efficient versions.

The RT neural network architecture developed by our team is used in the latest robotic control systems, including our state-of-the-art RT-2 model. The best SARA-RT-2 models were 10.6% more accurate and 14% faster than RT-2 models after being provided with a short history of images. We believe this is the first scalable attention mechanism to provide computational improvements with no quality loss.

While transformers are powerful, they can be limited by computational demands that slow their decision-making. Transformers critically rely on attention modules of quadratic complexity. That means if an RT model’s input doubles – by giving a robot additional or higher-resolution sensors, for example – the computational resources required to process that input rise by a factor of four, which can slow decision-making.

SARA-RT makes models more efficient using a novel method of model fine-tuning that we call “up-training”. Up-training converts the quadratic complexity to mere linear complexity, sharply reducing the computational requirements. This conversion not only increases the original model’s speed, but also preserves its quality.
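
To see why that matters, here is a small numpy sketch contrasting standard softmax attention, which materializes an n-by-n score matrix, with a generic kernelized linear-attention approximation of the kind such conversions target. It illustrates the complexity argument only; it is not DeepMind's SARA-RT implementation, and the feature map is an arbitrary choice.

```python
# Quadratic softmax attention vs. a generic linear-attention approximation.
import numpy as np

def softmax_attention(Q, K, V):
    # Materializes an (n x n) score matrix: O(n^2) time and memory in sequence length.
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

def linear_attention(Q, K, V, feature_map=lambda x: np.maximum(x, 0) + 1e-6):
    # A positive feature map replaces the softmax, so a (d x d_v) summary
    # of keys and values is computed once and reused: linear in sequence length n.
    Qp, Kp = feature_map(Q), feature_map(K)
    kv = Kp.T @ V                        # key/value summary, independent of n
    normalizer = Qp @ Kp.sum(axis=0)     # per-query normalization terms, shape (n,)
    return (Qp @ kv) / normalizer[:, None]

n, d = 512, 64
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
print(softmax_attention(Q, K, V).shape)  # (512, 64)
print(linear_attention(Q, K, V).shape)   # (512, 64)
```

Doubling n quadruples the work in the first function but only doubles it in the second, which is the gap "up-training" is meant to close without retraining from scratch.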

We designed our system for usability and hope many researchers and practitioners will apply it, in robotics and beyond. Because SARA provides a universal recipe for speeding up Transformers without the need for computationally expensive pre-training, this approach has the potential to massively scale up the use of Transformer technology. SARA-RT does not require any additional code, as various open-sourced linear variants can be used.

When we applied SARA-RT to a state-of-the-art RT-2 model with billions of parameters, it resulted in faster decision-making and better performance on a wide range of robotic tasks.


SARA-RT-2 model for manipulation tasks. Robot’s actions are conditioned on images and text commands.

And with its robust theoretical grounding, SARA-RT can be applied to a wide variety of Transformer models. For example, applying SARA-RT to Point Cloud Transformers - used to process spatial data from robot depth cameras - more than doubled their speed.



RT-Trajectory: Helping robots generalize​

It may be intuitive for humans to understand how to wipe a table, but there are many possible ways a robot could translate an instruction into actual physical motions.

We developed a model called RT-Trajectory, which automatically adds visual outlines that describe robot motions in training videos. RT-Trajectory takes each video in a training dataset and overlays it with a 2D trajectory sketch of the robot arm’s gripper as it performs the task. These trajectories, in the form of RGB images, provide low-level, practical visual hints to the model as it learns its robot-control policies.
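
As a rough illustration of what such an overlay might look like in code, the sketch below draws a gripper path onto a frame as an RGB hint image. The data format, colour scheme, and function name are assumptions made for this example, not the published RT-Trajectory pipeline.

```python
# Hypothetical sketch of producing a 2D trajectory hint image.
import numpy as np
from PIL import Image, ImageDraw

def overlay_trajectory(frame, gripper_xy, gripper_state):
    """Draw the gripper's 2D path on a copy of the frame.

    frame:         HxWx3 uint8 array (a single training video frame)
    gripper_xy:    list of (x, y) pixel coordinates of the gripper over time
    gripper_state: list of floats in [0, 1]; 1 = closed. Used to colour the path.
    """
    img = Image.fromarray(frame).convert("RGB")
    draw = ImageDraw.Draw(img)
    for (x0, y0), (x1, y1), g in zip(gripper_xy[:-1], gripper_xy[1:], gripper_state[1:]):
        # Encode gripper state in colour: greener when open, redder when closed.
        colour = (int(255 * g), int(255 * (1 - g)), 0)
        draw.line([(x0, y0), (x1, y1)], fill=colour, width=3)
    return np.asarray(img)

frame = np.zeros((256, 320, 3), dtype=np.uint8)
path = [(40, 200), (120, 150), (200, 120), (260, 110)]
states = [0.0, 0.2, 0.8, 1.0]
hint = overlay_trajectory(frame, path, states)
print(hint.shape)  # (256, 320, 3) -- an RGB "trajectory sketch" used for conditioning
```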

When tested on 41 tasks unseen in the training data, an arm controlled by RT-Trajectory more than doubled the performance of existing state-of-the-art RT models: it achieved a task success rate of 63%, compared with 29% for RT-2.

Traditionally, training a robotic arm relies on mapping abstract natural language (“wipe the table”) to specific movements (close gripper, move left, move right), making it hard for models to generalize to novel tasks. In contrast, an RT-Trajectory model enables RT models to understand "how to do" tasks by interpreting specific robot motions like those contained in videos or sketches.

The system is versatile: RT-Trajectory can also create trajectories by watching human demonstrations of desired tasks, and even accept hand-drawn sketches. And it can be readily adapted to different robot platforms.


Left: A robot, controlled by an RT model trained with a natural-language-only dataset, is stymied when given the novel task: “clean the table”. A robot controlled by RT-Trajectory, trained on the same dataset augmented by 2D trajectories, successfully plans and executes a wiping trajectory.

Right: A trained RT-Trajectory model given a novel task (“clean the table”) can create 2D trajectories in a variety of ways, assisted by humans or on its own using a vision-language model.

RT-Trajectory makes use of the rich robotic-motion information that is present in all robot datasets, but currently under-utilized. RT-Trajectory not only represents another step along the road to building robots able to move with efficient accuracy in novel situations, but also unlocks knowledge from existing datasets.


Building the foundations for next-generation robots​

By building on the foundation of our state-of-the-art RT-1 and RT-2 models, each of these pieces helps create ever more capable and helpful robots. We envision a future in which these models and systems can be integrated to create robots – with the motion generalization of RT-Trajectory, the efficiency of SARA-RT, and the large-scale data collection from models like AutoRT. We will continue to tackle challenges in robotics today and to adapt to the new capabilities and technologies of more advanced robotics.





 

bnew


PennsylvaniaGPT Is Here to Hallucinate Over Cheesesteaks​

State employees in Pennsylvania will begin using ChatGPT to assist with their work in the coming weeks.​


By Maxwell Zeff

Published 10 minutes ago

Pennsylvania Governor Josh Shapiro

Photo: Drew Angerer (Getty Images)

Pennsylvania became the first state in the nation to use ChatGPT Enterprise, Governor Josh Shapiro announced Tuesday, leading a pilot program with OpenAI in which state employees will use generative AI. Governor Shapiro says Pennsylvania government employees will start using ChatGPT Enterprise this month to help state officials with their work, but not replace workers altogether.

“Generative AI is here and impacting our daily lives already – and my Administration is taking a proactive approach to harness the power of its benefits while mitigating its potential risks,” said Governor Shapiro in a press release.

ChatGPT will initially be used by a small number of Pennsylvania government employees to create and edit copy, update policy language, draft job descriptions, and help employees generate code. After the initial trial period, Governor Shapiro’s office says ChatGPT will be used more broadly by other parts of Pennsylvania’s government. However, no citizens will ever interact with ChatGPT directly as part of this pilot program.

The enterprise version of ChatGPT has additional security and privacy features compared to the consumer product, which has drawn criticism for security bugs in the last year. Companies like PwC, Block, and Canva have been using OpenAI's service for much of the last year, and they've developed tailored versions of ChatGPT to help with their daily operations.

“Our collaboration with Governor Shapiro and the Pennsylvania team will provide valuable insights into how AI tools can responsibly enhance state services,” said OpenAI CEO Sam Altman in the same press release.

Pennsylvania is the first state to deploy ChatGPT with government-sensitive materials, creating the ultimate stress test of OpenAI’s security measures. The pilot is seen as a test run for other state governments. One major consideration for Pennsylvania is ChatGPT’s tendency to hallucinate or make up bits of information when handling sensitive government policies.

For the purpose of this article, we asked ChatGPT about its stance on an important question to many Pennsylvanians: Which Philly cheesesteak is superior? ChatGPT was split between the two famed rivals, Pat's and Geno's, at first, but ultimately decided that Pat's is the best contender based on originality.



Screenshot: ChatGPT

Governor Shapiro’s office did not immediately respond to Gizmodo’s request for comment.

Pennsylvania is, in many ways, an obvious choice for partnering with OpenAI. Governor Shapiro signed an executive order in September to allow state agencies to use generative AI in their work, and the state is home to Carnegie Mellon, whose researchers have paved the way for AI research.

The pilot program with the Pennsylvania Commonwealth is a strong vote of confidence for OpenAI as it heads to court with The New York Times over copyright infringement. If OpenAI loses, its GPT models built with NYTimes training data, which is most of them, would have to be scrapped. It seems Governor Shapiro does not consider that a major threat to working with OpenAI.

Amid this pilot launch, OpenAI is also on the cusp of releasing its GPT Store this week. The store has the potential to make generative AI a more significant part of many more people’s lives. Pennsylvania appears to be getting in front of that wave.
 

bnew


AI-powered search engine Perplexity AI, now valued at $520M, raises $73.6M​

Kyle Wiggers @kyle_l_wiggers / 6:30 AM EST•January 4, 2024


Image Credits: Panuwat Dangsungnoen / EyeEm / Getty Images

As search engine incumbents — namely Google — amp up their platforms with GenAI tech, startups are looking to reinvent AI-powered search from the ground up. It might seem like a Sisyphean task, going up against competitors with billions upon billions of users. But this new breed of search upstarts believes it can carve out a niche, however small, by delivering a superior experience.

One among the cohort, Perplexity AI, this morning announced that it raised $73.6 million in a funding round led by IVP with additional investments from NEA, Databricks Ventures, former Twitter VP Elad Gil, Shopify CEO Tobi Lutke, ex-GitHub CEO Nat Friedman and Vercel founder Guillermo Rauch. Other participants in the round included Nvidia and — notably — Jeff Bezos.

Sources familiar with the matter tell TechCrunch that the round values Perplexity at $520 million post-money. That's chump change in the realm of GenAI startups. But, considering that Perplexity's only been around since August 2022, it's nonetheless an impressive climb.

Perplexity was founded by Aravind Srinivas, Denis Yarats, Johnny Ho and Andy Konwinski — engineers with backgrounds in AI, distributed systems, search engines and databases. Srinivas, Perplexity’s CEO, previously worked at OpenAI, where he researched language and GenAI models along the lines of Stable Diffusion and DALL-E 3.

Unlike traditional search engines, Perplexity offers a chatbot-like interface that allows users to ask questions in natural language (e.g. “Do we burn calories while sleeping?,” “What’s the least visited country?,” and so on). The platform’s AI responds with a summary containing source citations (mostly websites and articles), at which point users can ask follow-up questions to dive deeper into a particular subject.


Performing a search with Perplexity. Image Credits: Perplexity AI

“With Perplexity, users can get instant … answers to any question with full sources and citations included,” Srinivas said. “Perplexity is for anyone and everyone who uses technology to search for information.”

Underpinning the Perplexity platform is an array of GenAI models developed in-house and by third parties. Subscribers to Perplexity's Pro plan ($20 per month) can switch models — Google's Gemini, Mistral 7B, Anthropic's Claude 2.1 and OpenAI's GPT-4 are in the rotation presently — and unlock features like image generation; unlimited use of Perplexity's Copilot, which considers personal preferences during searches; and file uploads, which allows users to upload documents including images and have models analyze the docs to formulate answers about them (e.g. "Summarize pages 2 and 4").

If the experience sounds comparable to Google’s Bard, Microsoft’s Copilot and ChatGPT, you’re not wrong. Even Perplexity’s chat-forward UI is reminiscent of today’s most popular GenAI tools.

Beyond the obvious competitors, the search engine startup You.com offers similar AI-powered summarizing and source-citing tools, powered optionally by GPT-4.

Srinivas makes the case that Perplexity offers more robust search filtering and discovery options than most, for example letting users limit searches to academic papers or browse trending search topics submitted by other users on the platform. I’m not convinced that they’re so differentiated that they couldn’t be replicated — or haven’t already been replicated for that matter. But Perplexity has ambitions beyond search. It’s beginning to serve its own GenAI models, which leverage Perplexity’s search index and the public web for ostensibly improved performance, through an API available to Pro customers.

This reporter is skeptical about the longevity of GenAI search tools for a number of reasons, not least of which is that AI models are costly to run. At one point, OpenAI was spending approximately $700,000 per day to keep up with the demand for ChatGPT. Microsoft is reportedly losing an average of $20 per user per month on its AI code generator, meanwhile.

Sources familiar with the matter tell TechCrunch Perplexity’s annual recurring revenue is between $5 million and $10 million at the moment. That seems fairly healthy… until you factor in the millions of dollars it often costs to train GenAI models like Perplexity’s own.

Concerns around misuse and misinformation inevitably crop up around GenAI search tools like Perplexity, as well — as they well should. AI isn’t the best summarizer after all, sometimes missing key details, misconstruing and exaggerating language or otherwise inventing facts very authoritatively. And it’s prone to spewing bias and toxicity — as Perplexity’s own models recently demonstrated.

Yet another potential speed bump on Perplexity’s road to success is copyright. GenAI models “learn” from examples to craft essays, code, emails, articles and more, and many vendors — including Perplexity, presumably — scrape the web for millions to billions of these examples to add to their training datasets. Vendors argue fair use doctrine provides a blanket protection for their web-scraping practices, but artists, authors and other copyright holders disagree — and have filed lawsuits seeking compensation.

As a tangentially related aside, while an increasing number of GenAI vendors offer policies protecting customers from IP claims against them, Perplexity does not. According to the company’s terms of service, customers agree to “hold harmless” Perplexity from claims, damages and liabilities arising from the use of its services — meaning Perplexity’s off the hook where it concerns legal fees.

Some plaintiffs, like The New York Times, have argued GenAI search experiences siphon off publishers’ content, readers and ad revenue through anticompetitive means. “Anticompetitive” or no, the tech is certainly impacting traffic. A model from The Atlantic found that if a search engine like Google were to integrate AI into search, it’d answer a user’s query 75% of the time without requiring a click-through to its website. (Some vendors, such as OpenAI, have inked deals with certain news publishers, but most — including Perplexity — haven’t.)

Srinivas pitches this as a feature — not a bug.

“[With Perplexity, there’s] no need to click on different links, compare answers or endlessly dig for information,” he said. “The era of sifting through SEO spam, sponsored links and multiple sources will be replaced by a more efficient model of knowledge acquisition and sharing, propelling society into a new era of accelerated learning and research.”

The many uncertainties around Perplexity's business model — and GenAI and consumer search at large — don't appear to be deterring its investors. To date, the startup, which claims to have 10 million active monthly users, has raised over $100 million — much of which is being put toward expanding its 39-person team and building new product functionality, Srinivas says.

“Perplexity is intensely building a product capable of bringing the power of AI to billions,” Cack Wilhelm, a general partner at IVP, added via email. “Aravind possesses the unique ability to uphold a grand, long-term vision while shipping product relentlessly, requirements to tackle a problem as important and fundamental as search.”
 

bnew


Jeff Bezos Is Betting This AI Startup Will Dethrone Google in Search​

Perplexity AI wants to revolutionize search with answers instead of links, and Amazon’s founder has agreed to help it unseat Google.​


By Maxwell Zeff

Published Yesterday



Photo: Dan Istitene (Getty Images)

Google has been the dominant search engine since the early 2000s, but Jeff Bezos is betting that AI will change the way people find information on the internet. Bezos invested millions in Perplexity AI last week, a startup that hopes to revolutionize search with AI-generated answers and make Google a thing of the past.

“Google is going to be viewed as something that’s legacy and old,” said Perplexity’s founder Aravind Srinivas to Reuters last week. “If you can directly answer somebody’s question, nobody needs those 10 blue links.”

Perplexity AI is like ChatGPT had a baby with Google Search, and Jeff Bezos is paying for its college fund. Srinivas’ company approaches search differently from Google. Instead of a list of blue links, Perplexity answers your question in a straightforward paragraph answer, generated by AI, and includes hyperlinks to the websites it got the information from. Perplexity runs on leading large language models from OpenAI and Anthropic, but the company claims it’s better at delivering up-to-date, accurate information than ChatGPT or Claude.

Srinivas’ company has received more financial backing than any search startup in recent years, according to the Wall Street Journal. Search is a tough field that Google has had a chokehold on for the last two decades. However, several notable tech innovators invested in Perplexity, including Jeff Bezos and Nvidia in a $76 million funding round last week. Meta’s Chief Scientist Yann LeCun and OpenAI’s Andrej Karpathy invested in Perplexity earlier on.

Even Google itself believes AI-generated answers will be the future of search. The company launched Search Generative Experience (SGE) last summer, which will write out a quick answer summarizing top results in Google Search. However, Google tucked it away in “Search Labs,” a standalone app. Once Google puts this feature in Google Search, its crown jewel, then we’ll know it’s fully committed to AI as the future of search.

Perplexity overtaking Google is a long shot, but the vote of confidence from Amazon's founder is a plus. Roughly 10 million people use Perplexity AI to browse the internet every month, and it's one of the first AI platforms to reach such a large audience. It's currently free to use, with a paid subscription tier for $20 a month. Srinivas says our culture stands "at the inflection point of a massive behavioral shift in how people access information online." Perplexity's service produces great results, but it has a tough road ahead to unseat Google.
 

bnew




This Brazilian fact-checking org uses a ChatGPT-esque bot to answer reader questions​

“Instead of giving a list of URLs that the user can access — which requires more work for the user — we can answer the question they asked.”


By HANAA' TAMEEZ @hanaatameez Jan. 9, 2024, 12:33 p.m.

In the 13 months since OpenAI launched ChatGPT, news organizations around the world have been experimenting with the technology in hopes of improving their news products and production processes — covering public meetings, strengthening investigative capabilities, powering translations.

With all of its flaws — including the ability to produce false information at scale and generate plain old bad writing — newsrooms are finding ways to make the technology work for them. In Brazil, one outlet has been integrating the OpenAI API with its existing journalism to produce a conversational Q&A chatbot. Meet FátimaGPT.

FátimaGPT is the latest iteration of Fátima, a fact-checking bot by Aos Fatos on WhatsApp, Telegram, and Twitter. Aos Fatos first launched Fátima as a chatbot in 2019. Aos Fatos (“The Facts” in Portuguese) is a Brazilian investigative news outlet that focuses on fact-checking and disinformation.


Bruno Fávero, Aos Fatos's director of innovation, said that although the chatbot had over 75,000 users across platforms, it was limited in function. When users asked it a question, the bot would search Aos Fatos's archives and use keyword comparison to return a (hopefully) relevant URL.

Fávero said that when OpenAI launched ChatGPT, he and his team started thinking about how they could use large language models in their work. “Fátima was the obvious candidate for that,” he said.

“AI is taking the world by storm, and it’s important for journalists to understand the ways it can be harmful and to try to educate the public on how bad actors may misuse AI,” Fávero said. “I think it’s also important for us to explore how it can be used to create tools that are useful for the public and that helps bring them reliable information.”

This past November, Aos Fatos launched the upgraded FátimaGPT, which pairs a large language model with Aos Fatos's archives to give users a clear answer to their question, along with a list of source URLs. It's available to use on WhatsApp, Telegram, and the web. In its first few weeks of beta, Fávero said that 94% of the answers analyzed were “adequate,” while 6% were “insufficient,” meaning the answer was in the database but FátimaGPT didn't provide it. There were no factual mistakes in any of the results, he said.

I asked FátimaGPT through WhatsApp if COVID-19 vaccines are safe and it returned a thorough answer saying yes, along with a source list. On the other hand, I asked FátimaGPT for the lyrics to a Brazilian song I like and it answered, “Wow, I still don’t know how to answer that. I will take note of your message and forward it internally as I am still learning.”

Screenshot-2024-01-08-at-2.56.06%E2%80%AFPM.png

Aos Fatos was concerned at first about implementing this sort of technology, particularly because of “hallucinations,” where ChatGPT presents false information as true. Aos Fatos is using a technique called retrieval-augmented generation, which links the large language model to a specific, reliable database to pull information from. In this case, the database is all of Aos Fatos's journalism.

“If a user asks us, for instance, if elections in Brazil are safe and reliable, then we do a search in our database of fact-checks and articles,” Fávero explained. “Then we extract the most relevant paragraphs that may help answer this question. We put that in a prompt as context to the OpenAI API and [it] almost always gives a relevant answer. Instead of giving a list of URLs that the user can access — which requires more work for the user — we can answer the question they asked.”
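
The sketch below shows the general retrieval-augmented generation pattern Fávero describes: retrieve relevant archive passages, pack them into a prompt, and ask the model to answer only from that context. The toy archive, the word-overlap scoring, and the model name are illustrative assumptions, not Aos Fatos's implementation.

```python
# Minimal retrieval-augmented generation sketch in the spirit of FátimaGPT.
from openai import OpenAI

ARCHIVE = [
    {"url": "https://example.org/factcheck-elections",
     "text": "Brazil's electronic voting system has multiple independent audits..."},
    {"url": "https://example.org/factcheck-vaccines",
     "text": "COVID-19 vaccines approved in Brazil passed phase 3 safety trials..."},
]

def retrieve(question, k=2):
    # Toy relevance score: count shared words. A real system would use
    # embeddings or a search index over the fact-check archive.
    q_words = set(question.lower().split())
    scored = sorted(ARCHIVE, key=lambda d: -len(q_words & set(d["text"].lower().split())))
    return scored[:k]

def answer(question):
    passages = retrieve(question)
    context = "\n\n".join(f"[{d['url']}]\n{d['text']}" for d in passages)
    prompt = (
        "Answer the question using only the context below. "
        "Cite the URLs you relied on. If the answer is not in the context, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice, not the one Aos Fatos uses
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

print(answer("Are elections in Brazil safe and reliable?"))
```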

Aos Fatos has been experimenting with AI for years. Fátima is an audience-facing product, but Aos Fatos has also used AI to build Escriba, a subscription audio transcription tool for journalists. Fávero said the idea came from the fact that journalists in his own newsroom would manually transcribe their interviews because it was hard to find a tool that transcribed Portuguese well. In 2019, Aos Fatos also launched Radar, a tool that uses algorithms to monitor disinformation campaigns in real time across different social media platforms.

Other newsrooms in Brazil are also using artificial intelligence in interesting ways. In October 2023, investigative outlet Agência Pública started using text-to-speech technology to read stories aloud to users. It uses AI to develop the story narrations in the voice of journalist Mariana Simões, who has hosted podcasts for Agência Pública. Núcleo, an investigative news outlet that covers the impacts of social media and AI, developed Legislatech, a tool to monitor and understand government documents.

The use of artificial intelligence in Brazilian newsrooms is particularly interesting as trust in news in the country continues to decline. A 2023 study from KPMG found that 86% of Brazilians believe artificial intelligence is reliable, and 56% are willing to trust the technology.

Fávero said that one of the interesting trends in how people are using FátimaGPT is trying to test its potential biases. Users will often ask a question, for example, about the current president Luiz Inácio Lula da Silva, and then ask the same question about his political rival, former president Jair Bolsonaro. Or, users will ask one question about Israel and then ask the same question about Palestine to look for bias. The next step is developing FátimaGPT to accept YouTube links so that it can extract a video’s subtitles and fact-check the content against Aos Fatos’s journalism.

FátimaGPT's results can only be as good as Aos Fatos's existing coverage, which can be a challenge when users ask about a topic that hasn't gotten much attention. To get around that, Fávero's team programmed FátimaGPT to provide publication dates for the information it shares. That way, users know when the information they're getting may be outdated.

“If you ask something about, [for instance], how many people were killed in the most recent Israeli-Palestinian conflict, it’s something that we were covering, but we’re [not] covering it that frequently,” Fávero said. “We try to compensate that by training [FátimaGPT] to be transparent with the user and providing as much context as possible.”
 

bnew


AI in practice

Jan 1, 2024

Gender neutral and male roles can improve LLM performance compared to female roles​

DALL-E 3 prompted by THE DECODER



Harry Verity

Journalist and published fiction author Harry is leveraging AI tools to bring his stories to life in new ways. He is currently working on making the first entirely AI generated movies from his novels and has a serialised story newsletter illustrated by Midjourney.


Co-author: Matthias Bastian


Research shows that LLMs work better when asked to act in either gender-neutral or male roles, suggesting that they have a gender bias.

A new study by researchers at the University of Michigan sheds light on the influence of social and gender roles in prompting Large Language Models (LLMs). It was conducted by an interdisciplinary team from the departments of Computer Science and Engineering, the Institute for Social Research, and the School of Information.

The paper examines how three models, Flan-T5, LLaMA2, and OPT-instruct, respond to different roles by analyzing their responses to a diverse set of 2,457 questions. The researchers included 162 different social roles, covering a range of social relationships and occupations, and measured the impact on model performance for each role.

One of the key findings was the significant impact of interpersonal roles, such as "friend," and gender-neutral roles on model effectiveness. These roles consistently led to higher performance across models and datasets, demonstrating that there is indeed potential for more nuanced and effective AI interactions when models are prompted with specific social contexts.

The best-performing roles were mentor, partner, chatbot, and AI language model. For Flan-T5, oddly enough, it was police. The one that OpenAI uses, helpful assistant, isn't one of the top-performing roles. But the researchers didn't test with OpenAI models, so I wouldn't read too much into these results.



Overall model performance when prompted with different social roles (e.g., "You are a lawyer.") for FLAN-T5-XXL and LLAMA2-7B chat, tested on 2457 MMLU questions. The best-performing roles are highlighted in red. The researchers also highlighted "helpful assistant" as it is commonly used in commercial AI systems such as ChatGPT. | Image: Zheng et al.​


In addition, the study found that specifying the audience (e.g., "You are talking to a firefighter") in prompts yields the highest performance, followed by role prompts. This finding is valuable for developers and users of AI systems, as it suggests that the effectiveness of LLMs can be improved by carefully considering the social context in which they are used.



Image: Zheng et al.
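
For readers who want to experiment with this, here is a tiny sketch of the two prompt styles the study compares: a role prompt versus an audience prompt. The role list and wording are examples for illustration, not the exact prompts used by Zheng et al.

```python
# Role prompts ("You are a lawyer.") vs. audience prompts ("You are talking to a lawyer.").
ROLES = ["mentor", "partner", "chatbot", "AI language model", "police officer"]

def role_prompt(role, question):
    return f"You are a {role}. {question}"

def audience_prompt(role, question):
    return f"You are talking to a {role}. {question}"

question = "Which planet in the solar system has the most moons?"
for role in ROLES:
    print(role_prompt(role, question))
    print(audience_prompt(role, question))
```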

AI systems perform better in male and gender-neutral roles​

The study also uncovered a nuanced gender bias in LLM responses. Analyzing 50 interpersonal roles categorized as male, female, or neutral, the researchers found that gender-neutral words and male roles led to higher model performance than female roles. This finding is particularly striking because it suggests an inherent bias in these AI systems toward male and gender-neutral roles over female roles.



Image: Zheng et al.​

This bias raises critical questions about the programming and training of these models. It suggests that the data used to train LLMs might inadvertently perpetuate societal biases, a concern that has been raised throughout the field of AI ethics.

The researchers' analysis provides a foundation for further exploration of how gender roles are represented and replicated in AI systems. It would be interesting to see how larger models that have more safeguards to mitigate bias, such as GPT-4 and the like, would perform.


Summary
  • A University of Michigan study shows that large language models (LLMs) perform better when prompted with gender-neutral or male roles than with female roles, suggesting gender bias in AI systems.
  • The study analyzed three popular LLMs and their responses to 2,457 questions across 162 different social roles, and found that gender-neutral and male roles led to higher model performance than female roles.
  • The study highlights the importance of considering social context and addressing potential biases in AI systems, and underscores the need for developers to be aware of the social and gender dynamics embedded in their models.
Source: arXiv
 

bnew


Make-A-Character: High Quality Text-to-3D Character Generation within Minutes​



Abstract​

There is a growing demand for customized and expressive 3D characters with the emergence of AI agents and the Metaverse, but creating 3D characters using traditional computer graphics tools is a complex and time-consuming task. To address these challenges, we propose a user-friendly framework named Make-A-Character (Mach) to create lifelike 3D avatars from text descriptions. The framework leverages the power of large language and vision models for textual intention understanding and intermediate image generation, followed by a series of human-oriented visual perception and 3D generation modules. Our system offers an intuitive approach for users to craft controllable, realistic, fully-realized 3D characters that meet their expectations within 2 minutes, while also enabling easy integration with existing CG pipelines for dynamic expressiveness.

Method​


An overview of Make-A-Character. The framework utilizes a Large Language Model (LLM) to extract various facial attributes (e.g., face shape, eye shape, mouth shape, hairstyle and color, glasses type). These semantic attributes are then mapped to corresponding visual clues, which in turn guide the generation of a reference portrait image using Stable Diffusion along with ControlNet. Through a series of 2D face parsing and 3D generation modules, the mesh and textures of the target face are generated and assembled along with additional matched accessories. The parameterized representation enables easy animation of the generated 3D avatar.
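
A schematic sketch of that pipeline is shown below. Every function is a named placeholder for a large module (LLM attribute extraction, Stable Diffusion with ControlNet, face parsing, 3D generation); none of it is the authors' released code.

```python
# Schematic, placeholder-only sketch of the Make-A-Character (Mach) stages.

def extract_attributes(prompt: str) -> dict:
    """LLM step: parse the text prompt into facial attributes."""
    return {"face_shape": "round", "hair": {"style": "bob", "color": "purple"},
            "eyes": "blue", "lips": "deep red", "glasses": None}

def attributes_to_visual_clues(attrs: dict) -> dict:
    """Map semantic attributes to conditioning signals (e.g. landmark maps)."""
    return {"landmark_map": "...", "text_condition": str(attrs)}

def generate_reference_portrait(clues: dict) -> str:
    """Stable Diffusion + ControlNet step, guided by the visual clues."""
    return "reference_portrait.png"

def build_3d_character(portrait: str) -> dict:
    """2D face parsing, mesh/texture generation, and accessory matching."""
    return {"mesh": "head.obj",
            "textures": ["albedo.png", "normal.png"],
            "accessories": []}

def make_a_character(prompt: str) -> dict:
    attrs = extract_attributes(prompt)
    clues = attributes_to_visual_clues(attrs)
    portrait = generate_reference_portrait(clues)
    return build_3d_character(portrait)

print(make_a_character("A girl with deep red lips, blue eyes, and a purple bob haircut."))
```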

Features​

Controllable​

Our system empowers users with the ability to customize detailed facial features, including the shape of the face, eyes, the color of the iris, hairstyles and colors, types of eyebrows, mouths, and noses, as well as the addition of wrinkles and freckles. This customization is facilitated by intuitive text prompts, offering a user-friendly interface for personalized character creation.

Highly-Realistic​


The characters are generated based on a collected dataset of real human scans. Additionally, their hair is built as strands rather than meshes. The characters are rendered using PBR (Physically Based Rendering) techniques in Unreal Engine, which is renowned for its high-quality real-time rendering capabilities.

Fully-Completed​


Each character we create is a complete model, including eyes, tongue, teeth, a full body, and garments. This holistic approach ensures that our characters are ready for immediate use in a variety of situations without the need for additional modeling.

Animatable​


Our characters are equipped with sophisticated skeletal rigs, allowing them to support standard animations. This contributes to their lifelike appearance and enhances their versatility for various dynamic scenarios.

Industry-Compatible​


Our method utilizes explicit 3D representation, ensuring seamless integration with standard CG pipelines employed in the game and film industries.​

Video​


Created Characters & Prompts​

Make-A-Character supports both English and Chinese prompts.


A chubby lady with round face.​

A boy with brown skin and black glasses, green hair.​

A young,cute Asian woman, with a round doll-like face, thin lips, and black, double ponytail hairstyle.​

A cool girl, sporting ear-length short hair, freckles on her cheek.​

A girl with deep red lips, blue eyes, and a purple bob haircut.​

A man with single eyelids, straight eyebrows, brown hair, and a mole on his face.​

An Asian lady, with an oval face, thick lips, black hair that reaches her shoulders, she has slender and neatly trimmed eyebrows.​

An old man with wrinkles on his face, he has gray hair.​



BibTeX​

@article{ren2023makeacharacter,
  title={Make-A-Character: High Quality Text-to-3D Character Generation within Minutes},
  author={Jianqiang Ren and Chao He and Lin Liu and Jiahao Chen and Yutong Wang and Yafei Song and Jianfang Li and Tangli Xue and Siqi Hu and Tao Chen and Kunkun Zheng and Jianjing Xiang and Liefeng Bo},
  year={2023},
  journal={arXiv preprint arXiv:2312.15430}
}

This page was built using the Academic Project Page Template, which was adopted from the Nerfies project page. You are free to borrow the source code of this website; we just ask that you link back to this page in the footer.

This website is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
 

bnew


JPMorgan Introduces DocLLM for Better Multimodal Document Understanding​

The model may enable more automated document processing and analysis for financial institutions and other document-intensive businesses going forward

CHRIS MCKAY

JANUARY 2, 2024 • 2 MIN READ


Image Credit: Maginative​

Financial services giant JPMorgan has unveiled a "lightweight" AI model extension called DocLLM that aims to advance comprehension of complex business documents like forms, invoices, and reports. DocLLM is a transformer-based model that incorporates both the text and spatial layout of documents to better capture the rich semantics within enterprise records.

The key innovation in DocLLM is the integration of layout information through the bounding boxes of text extracted via OCR, rather than relying solely on language or integrating a costly image encoder. It treats the spatial data about text segments as a separate modality and computes inter-dependencies between the text and layout in a "disentangled" manner.


Key elements of DocLLM.​

Specifically, DocLLM extends the self-attention mechanism in standard transformers with additional cross-attention scores focused on spatial relationships. This allows the model to represent alignments between the content, position, and size of document fields at various levels of abstraction.
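
A simplified numpy sketch of such a disentangled attention score is shown below: text and bounding-box ("spatial") embeddings get separate query/key projections, and the cross terms are mixed with scalar weights. The shapes, the box featurization, and the fixed mixing weights are illustrative simplifications, not the paper's exact formulation.

```python
# Simplified sketch of a disentangled text/layout attention score.
import numpy as np

rng = np.random.default_rng(0)
n, d = 6, 32                                       # 6 OCR text segments, hidden size 32

text_emb = rng.standard_normal((n, d))             # token/segment embeddings
boxes = rng.uniform(0, 1, size=(n, 4))             # normalized (x0, y0, x1, y1) per segment
spatial_emb = boxes @ rng.standard_normal((4, d))  # toy spatial projection

def proj(x, seed):
    W = np.random.default_rng(seed).standard_normal((d, d)) / np.sqrt(d)
    return x @ W

Qt, Kt = proj(text_emb, 1), proj(text_emb, 2)        # text queries/keys
Qs, Ks = proj(spatial_emb, 3), proj(spatial_emb, 4)  # spatial queries/keys

# Disentangled score: text-text plus weighted cross and spatial-spatial terms.
lam_ts, lam_st, lam_ss = 1.0, 1.0, 1.0               # learned scalars in the real model
scores = (Qt @ Kt.T) + lam_ts * (Qt @ Ks.T) + lam_st * (Qs @ Kt.T) + lam_ss * (Qs @ Ks.T)
weights = np.exp(scores / np.sqrt(d))
weights /= weights.sum(axis=-1, keepdims=True)
print(weights.shape)   # (6, 6) attention over text segments, informed by layout
```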

To handle the heterogeneous nature of business documents, DocLLM also employs a text infilling pre-training objective rather than simple next token prediction. This approach better conditions the model to deal with disjointed text segments and irregular arrangements frequently seen in practice.
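
The toy example below illustrates the general idea of an infilling objective: whole text segments are masked and the model is asked to reconstruct them, as opposed to plain next-token prediction. The masking scheme and example segments are generic simplifications, not DocLLM's pre-training code.

```python
# Toy block-infilling example construction.
import random

segments = ["Invoice No: 1042", "Date: 2023-11-02", "Total Due: $1,250.00",
            "Payment Terms: Net 30", "Bill To: Acme Corp"]

def make_infilling_example(segments, mask_prob=0.3, seed=0):
    rng = random.Random(seed)
    inputs, targets = [], []
    for seg in segments:
        if rng.random() < mask_prob:
            inputs.append(f"<mask_{len(targets)}>")   # leave a slot in the input
            targets.append(seg)                       # the model must reconstruct it
        else:
            inputs.append(seg)
    return " | ".join(inputs), targets

model_input, reconstruction_targets = make_infilling_example(segments)
print(model_input)
print(reconstruction_targets)
```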

The pre-trained DocLLM model is then fine-tuned using instruction data curated from 16 datasets covering tasks like information extraction, question answering, classification and more.


Performance comparison in the Same Datasets, Different Splits setting against other multimodal and non-multimodal LLMs​

In evaluations, DocLLM achieved state-of-the-art results on 14 of 16 test datasets on known tasks, demonstrating over 15% improvement on certain form understanding challenges compared to leading models like GPT-4. It also generalized well to 4 out of 5 unseen test datasets, exhibiting reliable performance on new document types.

The practical implications of DocLLM are substantial. For businesses and enterprises, it offers a promising new technique for unlocking insights from the huge array of forms and records used daily. The model may enable more automated document processing and analysis for financial institutions and other document-intensive businesses going forward. Furthermore, its ability to understand context and generalize to new domains makes it an invaluable tool for various industries, especially those dealing with large volumes of diverse documents.




DocLLM: A layout-aware generative language model for multimodal document understanding​

Published on Dec 31, 2023
Featured in Daily Papers on Jan 2
Authors: Dongsheng Wang, Natraj Raman, Mathieu Sibue, Zhiqiang Ma, Petr Babkin, Simerjot Kaur, Yulong Pei, Armineh Nourbakhsh, Xiaomo Liu

Abstract​

Enterprise documents such as forms, invoices, receipts, reports, contracts, and other similar records, often carry rich semantics at the intersection of textual and spatial modalities. The visual cues offered by their complex layouts play a crucial role in comprehending these documents effectively. In this paper, we present DocLLM, a lightweight extension to traditional large language models (LLMs) for reasoning over visual documents, taking into account both textual semantics and spatial layout. Our model differs from existing multimodal LLMs by avoiding expensive image encoders and focuses exclusively on bounding box information to incorporate the spatial layout structure. Specifically, the cross-alignment between text and spatial modalities is captured by decomposing the attention mechanism in classical transformers to a set of disentangled matrices. Furthermore, we devise a pre-training objective that learns to infill text segments. This approach allows us to address irregular layouts and heterogeneous content frequently encountered in visual documents. The pre-trained model is fine-tuned using a large-scale instruction dataset, covering four core document intelligence tasks. We demonstrate that our solution outperforms SotA LLMs on 14 out of 16 datasets across all tasks, and generalizes well to 4 out of 5 previously unseen datasets.




 

bnew


New material found by AI could reduce lithium use in batteries​

9th January 2024, 11:00 EST



By Shiona McCallum, Technology reporter

Dan DeLong for Microsoft

Samples of the new solid electrolyte discovered by Microsoft AI and HPC tools

A brand new substance, which could reduce lithium use in batteries, has been discovered using artificial intelligence (AI) and supercomputing.

The findings were made by Microsoft and the Pacific Northwest National Laboratory (PNNL), which is part of the US Department of Energy.

Scientists say the material could potentially reduce lithium use by up to 70%.

Since its discovery the new material has been used to power a lightbulb.

Microsoft researchers used AI and supercomputers to narrow down 32 million potential inorganic materials to 18 promising candidates in less than a week - a screening process that could have taken more than two decades to carry out using traditional lab research methods.

The process from inception to the development of a working battery prototype took less than nine months.

The two organisations achieved this by using advanced AI and high-performance computing which combines large numbers of computers to solve complex scientific and mathematical tasks.

Executive vice president of Microsoft, Jason Zander, told the BBC one of the tech giant's missions was to "compress 250 years of scientific discovery into the next 25".

"And we think technology like this will help us do that. This is the way that this type of science I think is going to get done in the future," he said.


The problem with lithium​

Lithium is often referred to as "white gold" because of its market value and silvery colour. It is one of the key components in rechargeable batteries (lithium-ion batteries) that power everything from electric vehicles (EVs) to smartphones.

As the need for the metal ramps up and the demand for EVs rises, the world could face a shortage of the material as soon as 2025, according to the International Energy Agency.

It is also expected that demand for lithium-ion batteries will increase up to tenfold by 2030, according to the US Department of Energy, so manufacturers are constantly building battery plants to keep up.

Lithium mining can be controversial as it can take several years to develop and has a considerable impact on the environment. Extracting the metal requires large amounts of water and energy, and the process can leave huge scars in the landscape, as well as toxic waste.

Dr Nuria Tapia-Ruiz, who leads a team of battery researchers at the chemistry department at Imperial College London, said any material with reduced amounts of lithium and good energy storage capabilities are "the holy grail" in the lithium-ion battery industry.

"AI and supercomputing will become crucial tools for battery researchers in the upcoming years to help predict new high-performing materials," she said.

But Dr Edward Brightman, lecturer in chemical engineering at the University of Strathclyde, said the tech would need to be "treated with a bit of caution".

"It could throw up spurious results, or results that look good at first, and then turn out to either be a material that is known or that can't be synthesised in the lab," he said.

This AI-derived material, which at the moment is simply called N2116, is a solid-state electrolyte that has been tested by scientists who took it from a raw material to a working prototype.

It has the potential to be a sustainable energy storage solution because solid-state batteries are safer than traditional batteries that use liquid or gel-like lithium electrolytes.

In the near future, faster charging solid-state lithium batteries promise to be even more energy-dense, with thousands of charge cycles.


How is this AI different?​

The way in which this technology works is by using a new type of AI that Microsoft has created, trained on molecular data that can actually figure out chemistry.

"This AI is all based on scientific materials, database and properties," explained Mr Zander.

"The data is very trustworthy for using it for scientific discovery."

After the software narrowed down the 18 candidates, battery experts at PNNL then looked at them and picked the final substance to work on in the lab.

Karl Mueller from PNNL said the AI insights from Microsoft pointed them "to potentially fruitful territory so much faster" than under normal working conditions.

"[We could] modify, test and tune the chemical composition of this new material and quickly evaluate its technical viability for a working battery, showing the promise of advanced AI to accelerate the innovation cycle," he said.


 

bnew





Blending Is All You Need

Based on the last month of LLM research papers, it's obvious to me that we are on the verge of seeing some incredible innovation around small language models.

Llama 7B and Mistral 7B made it clear to me that we can get more out of these small language models on tasks like coding and common sense reasoning.

Phi-2 (2.7B) made it even more clear that you can push these smaller models further with curated high-quality data.

What's next? More curated and synthetic data? Innovation around Mixture of Experts and improved architectures? Combining models? Better post-training approaches? Better prompt engineering techniques? Better model augmentation?

I mean, there is just a ton to explore here, as demonstrated in this new paper that integrates models of moderate size (6B/13B) which can compete with or surpass ChatGPT's performance.




Computer Science > Computation and Language​

[Submitted on 4 Jan 2024 (v1), last revised 9 Jan 2024 (this version, v2)]

Blending Is All You Need: Cheaper, Better Alternative to Trillion-Parameters LLM​

Xiaoding Lu, Adian Liusie, Vyas Raina, Yuwen Zhang, William Beauchamp
In conversational AI research, there's a noticeable trend towards developing models with a larger number of parameters, exemplified by models like ChatGPT. While these expansive models tend to generate increasingly better chat responses, they demand significant computational resources and memory. This study explores a pertinent question: Can a combination of smaller models collaboratively achieve comparable or enhanced performance relative to a singular large model? We introduce an approach termed "blending", a straightforward yet effective method of integrating multiple chat AIs. Our empirical evidence suggests that when specific smaller models are synergistically blended, they can potentially outperform or match the capabilities of much larger counterparts. For instance, integrating just three models of moderate size (6B/13B parameters) can rival or even surpass the performance metrics of a substantially larger model like ChatGPT (175B+ parameters). This hypothesis is rigorously tested using A/B testing methodologies with a large user base on the Chai research platform over a span of thirty days. The findings underscore the potential of the "blending" strategy as a viable approach for enhancing chat AI efficacy without a corresponding surge in computational demands.
Subjects:Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Cite as:arXiv:2401.02994 [cs.CL]
(or arXiv:2401.02994v2 [cs.CL] for this version)
[2401.02994] Blending Is All You Need: Cheaper, Better Alternative to Trillion-Parameters LLM

Submission history

From: Xiaoding Lu [view email]
[v1] Thu, 4 Jan 2024 07:45:49 UTC (8,622 KB)
[v2] Tue, 9 Jan 2024 08:15:42 UTC (8,621 KB)
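
A minimal sketch of the blending idea, as the abstract describes it, looks something like this: at each conversation turn one of several smaller chat models is sampled at random to produce the reply, conditioned on the full conversation history. The model names and the generate() stub are placeholders, not the authors' code.

```python
# Minimal "blending" sketch: sample a different small chat model each turn.
import random

MODELS = ["chat-6b-a", "chat-6b-b", "chat-13b"]   # stand-ins for moderate-size chat AIs

def generate(model_name, history):
    """Placeholder for calling an individual chat model on the conversation."""
    return f"[{model_name}] reply to: {history[-1]['content']}"

def blended_reply(history, rng=random):
    model = rng.choice(MODELS)          # uniformly sample which model answers this turn
    reply = generate(model, history)
    history.append({"role": "assistant", "content": reply})
    return reply

history = [{"role": "user", "content": "Hi! Recommend a sci-fi novel."}]
print(blended_reply(history))
history.append({"role": "user", "content": "Something more recent, please."})
print(blended_reply(history))           # possibly a different underlying model this turn
```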

 

bnew



TrustLLM: Trustworthiness in Large Language Models

Paper page: TrustLLM: Trustworthiness in Large Language Models

Large language models (LLMs), exemplified by ChatGPT, have gained considerable attention for their excellent natural language processing capabilities. Nonetheless, these LLMs present many challenges, particularly in the realm of trustworthiness. Therefore, ensuring the trustworthiness of LLMs emerges as an important topic. This paper introduces TrustLLM, a comprehensive study of trustworthiness in LLMs, including principles for different dimensions of trustworthiness, established benchmark, evaluation, and analysis of trustworthiness for mainstream LLMs, and discussion of open challenges and future directions. Specifically, we first propose a set of principles for trustworthy LLMs that span eight different dimensions. Based on these principles, we further establish a benchmark across six dimensions including truthfulness, safety, fairness, robustness, privacy, and machine ethics. We then present a study evaluating 16 mainstream LLMs in TrustLLM, consisting of over 30 datasets. Our findings firstly show that in general trustworthiness and utility (i.e., functional effectiveness) are positively related. Secondly, our observations reveal that proprietary LLMs generally outperform most open-source counterparts in terms of trustworthiness, raising concerns about the potential risks of widely accessible open-source LLMs. However, a few open-source LLMs come very close to proprietary ones. Thirdly, it is important to note that some LLMs may be overly calibrated towards exhibiting trustworthiness, to the extent that they compromise their utility by mistakenly treating benign prompts as harmful and consequently not responding. Finally, we emphasize the importance of ensuring transparency not only in the models themselves but also in the technologies that underpin trustworthiness. Knowing the specific trustworthy technologies that have been employed is crucial for analyzing their effectiveness.
 