Meta’s new AI lets people make chatbots. They’re using it for sex.
From X-rated chats to cancer research, “open-source” models are challenging tech giants’ control over the AI revolution — to the promise or peril of society
By
Pranshu Verma
and
Will Oremus
June 26, 2023 at 6:00 a.m. EDT
Allie is an 18-year old with long brown hair who boasts “tons of sexual experience.” Because she “lives for attention,” she’ll “share details of her escapades” with anyone for free.
But Allie is fake, an
artificial intelligence chatbot created for sexual play — which sometimes carries out graphic rape and abuse fantasies.
Tech is not your friend. We are. Sign up for The Tech Friend newsletter.
While firms like OpenAI, Microsoft and Google rigorously train their AI models to avoid a host of taboos, including overly intimate conversations, Allie was built using open-source technology — code that’s freely available to the public and has no such restrictions. Based on a model created by Meta, called LLaMA, Allie is part of a rising tide of specialized AI products anyone can build, from writing tools to chatbots to data analysis applications.
Advocates see open-source AI as a way around corporate control, a boon to entrepreneurs, academics, artists and activists who can experiment freely with transformative technology.
“The overall argument for open-source is that it accelerates innovation in AI,” said Robert Nishihara, CEO and co-founder of the start-up Anyscale, which helps companies run open-source AI models.
A curious person’s guide to artificial intelligence
Anyscale’s clients use AI models to discover new pharmaceuticals, reduce the use of pesticides in farming, and identify fraudulent goods sold online, he said. Those applications would be pricier and more difficult, if not impossible, if they relied on the handful of products offered by the largest AI firms.
Yet that same freedom could also be
exploited by bad actors. Open-source models have been used to create
artificial child pornography using images of real children as source material. Critics worry it could also enable fraud, cyber hacking and sophisticated propaganda campaigns.
Earlier this month, a pair of U.S. senators, Richard Blumenthal (D-Conn.) and Josh Hawley (R-Mo.) sent a
letter to Meta CEO Mark Zuckerberg warning that the release of LLaMA might lead to “its misuse in spam, fraud, malware, privacy violations, harassment, and other wrongdoing and harms.” They asked what steps Meta was taking to prevent such abuse.
Allie’s creator, who spoke on the condition of anonymity for fear of harming his professional reputation, said commercial chatbots such as
Replika and ChatGPT are “heavily censored” and can’t offer the type of sexual conversations he desires. With open-source alternatives, many based on Meta’s LLaMA model, the man said he can build his own, uninhibited conversation partners.
“It’s rare to have the opportunity to experiment with ‘state of the art’ in any field,” he said in an interview.
Allie’s creator argued that open-source technology benefits society by allowing people to build products that cater to their preferences without corporate guardrails.
“I think it’s good to have a safe outlet to explore,” he said. “Can’t really think of anything safer than a text-based role-play against a computer, with no humans actually involved.”
On YouTube, influencers offer tutorials on how to build “uncensored” chatbots. Some are based on a modified version of LLaMA, called Alpaca AI, which Stanford University researchers released in March, only to remove it a week later over concerns of cost and “
the inadequacies of our content filters.”
Nisha Deo, a spokeswoman for Meta, said the particular model referenced in the YouTube videos, called GPT-4 x Alpaca, “was obtained and made public outside of our approval process.” Representatives from Stanford did not return a request for comment.
AI-generated child sex images spawn new nightmare for the web
Open-source AI models, and the creative applications that build on them, are often published on Hugging Face, a platform for sharing and discussing AI and data science projects.
During a Thursday House science committee hearing, Clem Delangue, Hugging Face’s CEO, urged Congress to consider legislation supporting and incentivizing open-source models, which he argued are “extremely aligned with American values.”
In an interview after the hearing, Delangue acknowledged that open-source tools can be abused. He noted a model intentionally trained on toxic content
, GPT-*****, that Hugging Face had removed. But he said he believes open-source approaches allow for both greater innovation and more transparency and inclusivity than corporate-controlled models.
“I would argue that actually most of the harm today is done by black boxes,” Delangue said, referring to AI systems whose inner workings are opaque, “rather than open-source.”
Hugging Face’s rules don’t prohibit AI projects that produce sexually explicit outputs. But they do prohibit sexual content that involves minors, or that is “used or created for harassment, bullying, or without explicit consent of the people represented.” Earlier this month, the New York-based company published an
update to its content policies, emphasizing “consent” as a “core value” guiding how people can use the platform.
As Google and OpenAI have grown
more secretive about their most powerful AI models, Meta has emerged as a surprising corporate champion of open-source AI. In February it
released LLaMA, a language model that’s less powerful than GPT-4, but more customizable and cheaper to run. Meta initially withheld key parts of the model’s code and planned to limit access to authorized researchers. But by early March those parts, known as the model’s “weights,” had
leaked onto public forums, making LLaMA freely accessible to all.
“Open source is a positive force to advance technology,” Meta’s Deo said. “That’s why we shared LLaMA with members of the research community to help us evaluate, make improvements and iterate together.”
Since then, LLaMA has become perhaps the most popular open-source model for technologists looking to develop their own AI applications, Nishihara said. But it’s not the only one. In April, the software firm Databricks released an open-source model called Dolly 2.0. And last month, a team based in Abu Dhabi released an open-source model called Falcon that rivals LLaMA in performance.
Marzyeh Ghassemi, an assistant professor of computer science at MIT, said she’s an advocate for open-source language models, but with limits.
Ghassemi said it’s important to make the architecture behind powerful chatbots public, because that allows people to scrutinize how they’re built. For example, if a medical chatbot was created on open-source technology, she said, researchers could see if the data it’s trained on incorporated sensitive patient information, something that would not be possible on chatbots using closed software.
But she acknowledges this openness comes with risk. If people can easily modify language models, they can quickly create chatbots and image makers that churn out disinformation, hate speech and inappropriate material of high quality.
Ghassemi said there should be regulations governing who can modify these products, such as a certifying or credentialing process.
“Like we license people to be able to use a car,” she said, “we need to think about similar framings [for people] … to actually create, improve, audit, edit these open-trained language models.”
Some leaders at companies like Google, which keeps its chatbot Bard under lock and key, see open-source software as an existential threat to their business, because the large language models that are available to the public are becoming nearly as proficient as theirs.
“We aren’t positioned to win this [AI] arms race and neither is OpenAI,” a Google engineer
wrote in a memo posted by the tech site Semianalysis in May. “I’m talking, of course, about open source. Plainly put, they are lapping us … While our models still hold a slight edge in terms of quality, the gap is closing astonishingly quickly.”
The debate over whether AI will destroy us is dividing Silicon Valley
Nathan Benaich, a general partner at Air Street Capital, a London-based venture investing firm focused on AI, noted that many of the tech industry’s greatest advances over the decades have been made possible by open-source technologies — including today’s AI language models.
“If there’s only a few companies” building the most powerful AI models, “they’re only going to be targeting the biggest-use cases,” Benaich said, adding that the diversity of inquiry is an overall boon for society.
Gary Marcus, a cognitive scientist who testified to Congress on AI regulation in May, countered that accelerating AI innovation might not be a good thing, considering the risks the technology could pose to society.
“We don’t open-source nuclear weapons,” Marcus said. “Current AI is still pretty limited, but things might change.”