Former Google CEO Eric Schmidt Bets AI Will Shake Up Scientific Research

bnew


Former Google CEO Eric Schmidt Bets AI Will Shake Up Scientific Research​

Story by Jackie Davalos, Nate Lanxon and David Warren • 1h

(Bloomberg) -- Eric Schmidt is funding a nonprofit that’s focused on building an artificial intelligence-powered assistant for the laboratory, with the lofty goal of overhauling the scientific research process, according to interviews with the former Google CEO and officials at the new venture.

The nonprofit, Future House, plans to develop AI tools that can analyze and summarize research papers as well as respond to scientific questions using large language models — the same technology that supports popular AI chatbots. But Future House also intends to go a step further.

The “AI scientist,” as Future House refers to it, will one day be able to sift through thousands of scientific papers and independently compose hypotheses at greater speed and scale than humans, Chief Executive Officer Sam Rodriques said on the latest episode of the Bloomberg Originals series AI IRL, his most extensive comments to date on the company.

A growing number of businesses and investors are focusing on AI’s potential applications in science, including uncovering new medicines and therapies. While Future House aims to make breakthroughs of its own, it believes the scientific process itself can be transformed by having AI generate a hypothesis, conduct experiments and reach conclusions — even though some existing AI tools have been prone to errors and bias.

Rodriques acknowledged the risks of AI being applied in science. “It’s not just inaccuracy that you need to worry about,” he said. There are also concerns that “people can use them to come up with weapons and things like that.” Future House will “have an obligation” to make sure there are safeguards in place, he added.


Eric Schmidt © Bloomberg
In an interview, Schmidt said early-stage scientific research “is not moving fast enough today.” Schmidt helped shape the idea behind Future House and was inspired by his time at Xerox’s Palo Alto Research Center, which developed ethernet, laser printing and other innovations.

“It was a place where you got these people in their late 20s and early 30s, gave them independence and all the resources they needed, and they would invent things at a pace that you didn't get anywhere else,” Schmidt said. “What I really want is to create new environments like what PARC used to be, where outstanding young researchers can pursue their best ideas.”

Schmidt has an estimated net worth of $24.5 billion, according to the Bloomberg Billionaires Index. He’s funneled some of that fortune into philanthropic efforts like Schmidt Futures, an initiative that funds science and technology entrepreneurs. In recent months, he’s emerged as an influential voice on AI policy in Washington.

Rodriques, a biotechnology inventor who studied at the Massachusetts Institute of Technology, said Schmidt will fund Future House for its first five years. He estimated that the non-profit will spend about $20 million by the end of 2024. After that, “it will depend on how we grow and what we need,” he said, adding that a substantial portion of that cash will go to hiring talent and setting up what’s called a “wet” laboratory, a space designed to test chemicals and other biological matter. While Schmidt is providing most of the upfront capital, Future House is also in talks with other philanthropic backers, Rodriques said.

“The key thing about Future House is that we are getting together this biology talent and this AI talent in a way that you don't get in other places,” Schmidt said.

One of the first hires is Andrew White, the nonprofit’s head of science, who was most recently an associate professor of chemical engineering at the University of Rochester. “I think most scientists probably read five papers a week. Imagine what's going to happen when you have systems that can process all 10,000 papers that are coming out every day,” White said. “In some fields, the limiting factor is not the equipment. It's not really the cost. It's the ability of humans to come up with the next experiment.”

Future House will start with biology but its system will eventually be applicable to other scientific fields, White said.

With his financial backing, Schmidt believes Future House will be able to prioritize research rather than racing to make money. “I think getting the incentives right is especially important right now, when there’s a very high expectation that progress in AI will lead to products in the short term, which is leading a lot of the big AI research centers to focus very much on commercialization over research,” Schmidt said.


Watch the full episode of AI IRL now or catch up on all previous episodes.
 

bnew

We're already seeing AI will literally lie if it can't get an answer. Can't trust it for research.

used AI to explain why they can't "lie" (unless instructed/trained to...):

AI (Artificial Intelligence) and LLMs (Large Language Models) are computational models that learn from vast amounts of data to make predictions or generate responses. Unlike humans, who have the ability to deliberately conceal the truth or intentionally mislead others through lying, AI/LLMs cannot lie because they do not have beliefs, intentions, or personal experiences. They don't possess a sense of deceitfulness, manipulation, or any moral compass that would allow them to fabricate false information. Instead, their output is always based on the training data provided to them.

If you ask an AI/LLM a question outside its knowledge domain, it will either say "I don't know" or provide a response based on guesswork or patterns learned during training, which may not be accurate or true. In summary, AI/LLMs cannot lie as they lack subjective experiences and the cognitive capacity to deceive; however, they might occasionally produce incorrect or misleading results due to limitations inherent in their learning algorithms.

On the other hand, AI/LLMs sometimes exhibit behavior resembling hallucinations when presented with inputs beyond their trained dataset. This phenomenon arises due to the model's extrapolation abilities, where it generates new outputs by combining previously learned features, even if those combinations aren't present in real-world scenarios. The resultant output might seem plausible but isn't necessarily grounded in reality, since it's purely driven by mathematical calculations rather than empirical evidence. Therefore, while AI/LLMs excel at various tasks such as language translation, summarization, and recommendation systems, we must still approach their output with caution and verify its accuracy whenever possible.

 

Geek Nasty


bnew

Maybe “lie” is the wrong word since it implies deceit, but if you ask AI to solve a problem and it fills in the gaps with garbage data or analysis that’s the problem.

yeah it hallucinates. hallucinations can be reduced with better prompts, proper grammar, and guided instructions on how to respond.

hallucinations aren't all bad; in many cases I find they give creative responses.
 

Vandelay

Maybe “lie” is the wrong word since it implies deceit, but if you ask AI to solve a problem and it fills in the gaps with garbage data or analysis that’s the problem.
Can't the logic strings be pulled to see how it generates results?

This would be helpful to have those backups so they could be like digital citations when creating documents.
 

bnew

Can't the logic strings be pulled to see how it generates results?

This would be helpful to have those backups so they could be like digital citations when creating documents.



We still don't really understand what large language models are​

The world has happily embraced large language models such as ChatGPT, but even researchers working in AI don't fully understand the systems they work on, finds Alex Wilkins

By Alex Wilkins

4 October 2023
Webpages of ChatGPT, OpenAI's chatbot, and Google seen on smartphones. Tada Images/Shutterstock

SILICON Valley’s feverish embrace of large language models (LLMs) shows no sign of letting up. Google is integrating its chatbot Bard into every one of its services, while OpenAI is imbuing its own offering, ChatGPT, with new senses, such as the ability to “see” and “speak”, envisaging a new kind of personal assistant. But deep mysteries remain about how these tools function: what is really going on behind their shiny interfaces, which tasks are they truly good at and how might they fail? Should we really be betting the house on technology with so many unknowns?

There are still large debates about what, exactly, these complex programs are doing. In February, sci-fi author Ted Chiang wrote a viral piece suggesting LLMs like ChatGPT could be compared to compression algorithms, which allow images or music to be squeezed into a JPEG or MP3 to save space. Except here, Chiang said, the LLMs were effectively compressing the entire internet, like a “blurry JPEG of the web”. The analogy received a mixed reception from researchers: some praised it for its insight, and others accused it of oversimplification.

It turns out there is a deep connection between LLMs and compression, as shown by a recent paper from a team at Google DeepMind, but you would have to be immersed in academia to know it. These tools, the researchers showed, do compression in the same way as JPEGs and MP3s, as Chiang suggested – they are shrinking the data into something more compact. But they also showed that compression algorithms can work the other way too, like LLMs, predicting the next word or number in a sequence. For instance, if you give the JPEG algorithm half of an image, it can predict what pixel would come next better than random noise.
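To make the compressor-as-predictor direction concrete, here is a minimal sketch (an illustration, not code from the DeepMind paper) using Python's standard zlib: score each candidate continuation by how many extra bytes it costs to compress, and treat the cheapest one as the prediction.

```python
import zlib

def compressed_size(text: str) -> int:
    """Length in bytes of the zlib-compressed text."""
    return len(zlib.compress(text.encode("utf-8"), 9))

def predict_next_word(context: str, candidates: list[str]) -> str:
    """Score each candidate by the extra bytes it costs to compress
    context + candidate; the cheapest continuation is the 'prediction'."""
    base = compressed_size(context)
    costs = {word: compressed_size(context + " " + word) - base for word in candidates}
    return min(costs, key=costs.get)

if __name__ == "__main__":
    context = "the cat sat on the mat. " * 8 + "the cat sat on the"
    print(predict_next_word(context, ["mat", "dog", "keyboard", "moon"]))
    # With a context this repetitive, zlib usually favours "mat": that continuation
    # is just another back-reference to a phrase it has already seen many times.
```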

This work was met with surprise even from AI researchers, for some because they hadn’t come across the idea, and for others because they thought it was so obvious. This may seem like an obscure academic warren that I have fallen down, but it highlights an important problem.

Many researchers working in AI don’t fully understand the systems they work on, for reasons of both fundamental mystery and for how relatively young the field is. If researchers at a top AI lab are still unearthing new insights, then should we be trusting these models with so much responsibility so quickly?

The nature of LLMs and how their actions are interpreted is only part of the mystery. While OpenAI will happily claim that GPT-4 “exhibits human-level performance on various professional and academic benchmarks”, it is still unclear exactly how the system performs with tasks it hasn’t seen before.

On their surface, as most AI scientists will tell you, LLMs are next-word prediction machines. By just trying to find the next most likely word in a sequence, they appear to display the power to reason like a human. But recent work from researchers at Princeton University suggests many cases of what appears to be reasoning are much less exciting and more like what these models were designed to do: next-word prediction.

For instance, when they asked GPT-4 to multiply a number by 1.8 and add 32, it got the answer right about half the time, but when those numbers are tweaked even slightly, it never gets the answer correct. That is because the first formula is the conversion of centigrade to Fahrenheit. GPT-4 can answer this correctly because it has seen that pattern many times, but when it comes to abstracting and applying this logic to similar problems that it has never seen, something even school kids are able to do, it fails.
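For reference, the formula in question is the Celsius-to-Fahrenheit conversion, F = 1.8C + 32. A hedged sketch of the kind of probe described (a paraphrase, not the Princeton team's exact prompts): compare the heavily seen formula against a near-identical variant that is unlikely to appear in training data.

```python
def affine(x: float, a: float, b: float) -> float:
    """Compute y = a*x + b; with a=1.8, b=32 this is Celsius -> Fahrenheit."""
    return a * x + b

# The familiar, heavily seen pattern: 100 C -> 212 F.
print(affine(100, 1.8, 32))  # 212.0

# A near-identical but rarely seen variant (hypothetical probe): the reasoning
# is the same, only the constants change, yet the article reports that models
# which merely pattern-match the temperature formula stumble on cases like this.
print(affine(100, 1.9, 31))  # 221.0
```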

For this reason, researchers warn that we should be cautious about using LLMs for problems they are unlikely to have seen before. But the millions of people that use tools like ChatGPT every day aren’t aware of this imbalance in its problem-solving abilities, and why should they be? There are no warnings about this on OpenAI’s website, which just states that “ChatGPT may produce inaccurate information about people, places, or facts”.

This also hints that OpenAI’s suggestion of “human-level performance” on benchmarks might be less impressive than it first seems. If these benchmarks are made mainly of high-probability events, then the LLMs’ general problem-solving abilities might be worse than they first appear. The Princeton authors suggest we might need to rethink how we assess LLMs and design tests that take into account how these models actually work.

Of course, these tools are still useful – many tedious tasks are high-probability, frequently occurring problems. But if we do integrate LLMs into every aspect of our lives, then it would serve us, and the tools’ creators, well to spend more time thinking about how they work and might fail.
 

Vandelay


"We still don't really understand what large language models are" (article quoted in full above)
I'm late, but this is silly. They literally can't pull the calculation and the data set used? Or whatever algorithm is used to compile? I just don't get why we're so eager...I get why we are; but it's infinitely dumb to flip the switch on something you don't even understand.
 

bnew

I'm late, but this is silly. They literally can't pull the calculation and the data set used? Or whatever algorithm is used to compile? I just don't get why we're so eager...I get why we are; but it's infinitely dumb to flip the switch on something you don't even understand.

there is no logic string or thought process to retrieve. the benefits of LLMs have been immense so far and are poised to be even greater. people will continue using them even though their innards aren't fully understood.
 

Professor Emeritus

Can't the logic strings be pulled to see how it generates results?

This would be helpful to have those backups so they could be like digital citations when creating documents.

I'm late, but this is silly. They literally can't pull the calculation and the data set used? Or whatever algorithm is used to compile? I just don't get why we're so eager...I get why we are; but it's infinitely dumb to flip the switch on something you don't even understand.


@bnew is right, you really can't with complex neural networks. They're not programmed via some formulaic "logic string"; the complexity is far beyond that. Even if you were able to eventually decode the structure they create, the complexity of the process is so far beyond normal comprehension that you can't just "see" how it generates results.

The following thread is a good primer. To pull out one easy-to-understand fact, in GPT-3 every word is represented by a string of 12,288 different numbers. EVERY fukking word. And that's not even the most complex part of the process. Imagine trying to read code where you have to look at the 12,288 different dimensions of each individual word.
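To give a sense of that scale, here is a minimal NumPy sketch (illustrative numbers, not OpenAI's actual weights): every token ID indexes one row of a huge learned table, and that 12,288-number row is what the rest of the network actually works with.

```python
import numpy as np

VOCAB_SIZE = 50_257   # GPT-3's tokenizer vocabulary size
EMBED_DIM = 12_288    # width of each token's embedding vector in GPT-3 (davinci)

# Size of the full embedding table in float32, computed without allocating it.
print(VOCAB_SIZE * EMBED_DIM * 4 / 1e9, "GB for the token-embedding table alone")

# Tiny stand-in table to show the mechanics: token ID -> one 12,288-number row.
rng = np.random.default_rng(0)
mini_table = rng.standard_normal((8, EMBED_DIM), dtype=np.float32)  # pretend vocab of 8

token_ids = [3, 5, 1]        # hypothetical IDs for a three-token phrase
vectors = mini_table[token_ids]
print(vectors.shape)         # (3, 12288): three tokens, 12,288 numbers each
```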

 

Vandelay

@bnew is right, you really can't with complex neural networks. They're not programmed via some formulaic "logic string"; the complexity is far beyond that. Even if you were able to eventually decode the structure they create, the complexity of the process is so far beyond normal comprehension that you can't just "see" how it generates results.

The following thread is a good primer. To pull out one easy-to-understand fact, in GPT-3 every word is represented by a string of 12,288 different numbers. EVERY fukking word. And that's not even the most complex part of the process. Imagine trying to read code where you have to look at the 12,288 different dimensions of each individual word.

I'm not saying he's wrong. My concern is LLMs will eventually be used as a shorthand reference to create media based on real and factual documents.

We know it "hallucinates", which I originally thought was down to logic or programming errors. I don't think it's feasible to fact-check every piece of media we consume, so I don't think it's ethical to put out a piece of software when we can't check the logic it used to create the document. It's like in 6th grade when we had to show our work. You didn't get credit or partial credit if you couldn't show how you got the answer.

Like it's not only unethical in my opinion, but it's honestly bizarre that companies are rushing to get to market and then compound the confusion by not even making it clear that the media being created was created by an LLM.

Again, I don't want the software banned, but I can't shake the feeling we're unleashing a Moloch Trap on the world.
 

Professor Emeritus

I'm not saying he's wrong. My concern is LLMs will eventually be used as a shorthand reference to create media based on real and factual documents. …


Yes, I completely agree.
 

bnew

I'm not saying he's wrong. My concern is LLMs will eventually be used as a shorthand reference to create media based on real and factual documents. …

the vast majority of people are using software they don't understand. many people don't even know what a browser is even though they might use it every day. you'd be surprised at how many youtube viewers believe youtube is some sanctioned platform for information akin to television and trust the content on there without question. I really don't see LLMs that differently when taking all that into account. I do see a lot more companies inserting disclaimers before users start chatting with chatbots tho. some companies incorporating it as customer service will gladly leave users in the dark about it.

also, many methods are still being researched and developed to reduce hallucinations. i personally think they should be harnessed somehow because they do produce interesting output.


Chain-of-Verification Reduces Hallucination in Large Language Models​

Shehzaad Dhuliawala, Mojtaba Komeili, Jing Xu, Roberta Raileanu, Xian Li, Asli Celikyilmaz, Jason Weston

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Cite as: arXiv:2309.11495 [cs.CL]
[v1] Wed, 20 Sep 2023 17:50:55 UTC; [v2] Mon, 25 Sep 2023 15:25:49 UTC



AI research — Oct 12, 2023

Meta shows how to reduce hallucinations in ChatGPT & Co with prompt engineering​

By Maximilian Schreiner, THE DECODER

When ChatGPT & Co. have to check their answers themselves, they make fewer mistakes, according to a new study by Meta.
ChatGPT and other language models repeatedly reproduce incorrect information - even when they have learned the correct information. There are several approaches to reducing hallucination. Researchers at Meta AI now present Chain-of-Verification (CoVe), a prompt-based method that significantly reduces this problem.

New method relies on self-verification of the language model​

With CoVe, the chatbot first responds to a prompt such as "Name some politicians who were born in New York." Based on this output, which often already contains errors, the language model then generates questions to verify the statements, such as "Where was Donald Trump born?"

CoVe relies on separately prompted verification questions. | Image: Meta AI


These "verification questions" are then executed as a new prompt, independent of the first input, to prevent the possible adoption of incorrect information from the first output. The language model then checks its first response against the separately collected facts. All testing was done with Llama 65B.

Chain-of-verification significantly reduces hallucinations in language models​

The team shows that answers to individual questions contain significantly fewer errors, allowing CoVe to significantly improve the final output to a prompt. For list-based questions, such as the politician example, CoVe can more than double accuracy, significantly reducing the error rate.

For more complex question-and-answer scenarios, the method still yields a 23 percent improvement, and even for long-form content, CoVe increases factual accuracy by 28 percent. However, with longer content, the team also needs to check the verification answers for inconsistencies.

In their tests, the Meta team can also show that instruction tuning and chain-of-thought prompting do not reduce hallucinations, so Llama 65B with CoVe beats the newer, instruction-tuned model Llama 2. In longer content, the model with CoVe also outperforms ChatGPT and PerplexityAI, which can even collect external facts for its generations. CoVe works entirely with knowledge stored in the model.

In the future, however, the method could be improved by external knowledge, e.g. by allowing the language model to answer verification questions by accessing an external database.
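As a rough sketch of the loop described above (a reconstruction of the pattern, not Meta's actual prompts), the method can be wired against any ask(prompt) -> str wrapper around a chat model; the important detail is that the verification questions are answered in fresh prompts, so the model cannot simply repeat its first draft.

```python
from typing import Callable

def chain_of_verification(question: str, ask: Callable[[str], str]) -> str:
    """Sketch of the CoVe prompting pattern:
    draft -> plan verification questions -> answer them independently -> revise."""
    # 1. Baseline draft (may contain hallucinations).
    draft = ask(question)

    # 2. Ask the model to plan fact-checking questions about its own draft.
    plan = ask(
        "List short verification questions, one per line, that would check "
        f"the factual claims in this answer:\n{draft}"
    )
    verification_questions = [q.strip() for q in plan.splitlines() if q.strip()]

    # 3. Answer each verification question in a *fresh* prompt, without showing
    #    the draft, so errors in the draft are not copied forward.
    verified = [(q, ask(q)) for q in verification_questions]

    # 4. Revise the draft in light of the independently gathered answers.
    evidence = "\n".join(f"Q: {q}\nA: {a}" for q, a in verified)
    return ask(
        f"Original question: {question}\n"
        f"Draft answer:\n{draft}\n"
        f"Verification Q&A:\n{evidence}\n"
        "Rewrite the draft so it is consistent with the verification answers, "
        "removing anything they do not support."
    )

# Example wiring (echo stub just to show the call shape; swap in a real model API).
if __name__ == "__main__":
    print(chain_of_verification("Name some politicians born in New York.",
                                ask=lambda p: f"[model output for: {p[:40]}...]"))
```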

Summary
  • Meta AI has developed a new method called Chain-of-Verification (CoVe) that significantly reduces misinformation from language models such as ChatGPT.
  • CoVe works by having the chatbot generate verification questions based on its initial response, and then execute them independently of the original input to prevent the acquisition of false information. The language model then compares the original input with the separately collected facts.
  • The method has been shown to more than double accuracy for list-based questions and improves factual accuracy by 28 %, even for long content. In the future, CoVe could be improved by integrating external knowledge, such as accessing an external database to answer verification questions.



LLMs in the wild....

Microsoft AI inserted a distasteful poll into a news report about a woman’s death​


The Guardian says the ‘Insights from AI’ poll showed up next to a story about a young woman’s death syndicated on MSN, asking readers to vote on how they thought she died.​

By Wes Davis, a weekend editor who covers the latest in tech and entertainment. He has written news, reviews, and more as a tech journalist since 2020.

Oct 31, 2023, 12:24 PM EDT

Microsoft logo. Illustration: The Verge

More than three years after Microsoft gutted its news divisions and replaced their work with AI and algorithmic automation, the content generated by its systems continues to contain grave errors that human involvement could, or should, have stopped. Today, The Guardian accused the company of damaging its reputation with a poll labeled “Insights from AI” that appeared in Microsoft Start next to a Guardian story about a woman’s death, asking readers to vote on how she died.

The Guardian wrote that though the poll was removed, the damage had already been done. The poll asked readers to vote on whether a woman took her own life, was murdered, or died by accident. Five-day-old comments on the story indicate readers were upset, and some clearly believe the story’s authors were responsible.

We asked Microsoft via email whether the poll was AI-generated and how it was missed by its moderation, and Microsoft general manager Kit Thambiratnam replied:


The Verge obtained a screenshot of the poll from The Guardian.


A screenshot sent by The Guardian shows the poll, which is clearly labeled “Insights from AI.” Screenshot: The Guardian

In August, a seemingly AI-generated Microsoft Start travel guide recommended visiting the Ottawa Food Bank in Ottawa, Canada, “on an empty stomach.” Microsoft senior director Jeff Jones claimed the story wasn’t made with generative AI but “through a combination of algorithmic techniques with human review.”



The Guardian says that Anna Bateson, Guardian Media Group’s chief executive, wrote in a letter to Microsoft president Brad Smith that the “clearly inappropriate” AI-generated poll had caused “significant reputational damage” to both the outlet and its journalists. She added that it outlined “the important role that a strong copyright framework plays” in giving journalists the ability to determine how their work is presented. She asked that Microsoft make assurances that it will seek the outlet’s approval before using “experimental AI technology on or alongside” its journalism and that Microsoft will always make it clear when it’s used AI to do so.

The Guardian provided The Verge with a copy of the letter.

Update October 31st, 2023, 12:40PM ET: Embedded The Guardian’s letter to Microsoft.

Update October 31st, 2023, 6:35PM ET: Added a statement from Microsoft.

Correction October 31st, 2023, 6:35PM ET: A previous version of this article stated that the poll was tagged as “Insights by AI.” In fact, the tag read, “Insights from AI.” We regret the error.
 

Vandelay

the vast majority of people are using software they don't understand. …

This is a false equivalency, reductive, and it actually reinforces my point. The vast majority of software users are using Microsoft Office, the Adobe suite, and the like, and that work often is, or at least can be, reviewed by people at the organization it's being used for, or by people who have a functional knowledge of the media being created. With LLMs we can't understand the logic, can't independently verify whether the output is authentic, and it's being widely distributed on platforms that aren't accountable to anyone other than themselves if false or inflammatory information is consumed en masse.

A large amount of LLM output is used to create or recreate media that is real, factual, or inspired by real things, to create wildly fantastical media, or to create deliberate mis- and disinformation for a wide variety of reasons, be it comedy or malevolence. That's extremely problematic to me.
 