The A.I Megathread (LLM , GPT , Development)

Complexion · Mar 7, 2023

Michael's Black Son said:
If they don’t get the fúck outta here with this Dollar Tree “Baby Be Mine” sounding ass shít with that zombie cover art.

Song sounds like some rejected Japanese City Pop with that weak Stevie Wonder harmonica during the break.

Sorry this ain’t even remotely close to MJ and real fans would know within 10 seconds. With that said, the fukkery potential for AI is real and is now. Scary times.

Of course. The OG was written by Temperton for Thriller, but MJ chose Baby Be Mine instead, so it was recorded by The Manhattans:

Considering this was done with a rudimentary AI and text-to-speech, I'd be willing to bet that if it were redone with more processing, and an actual singer articulating the vocals with the MJ AI draped over the top, it would become a lot harder to tell real from fake. Spoke about the implications a while back:

Would You Listen to a 1:1 Clone of Your Favorite Dead Artist?

The whole FN Meka digital culture vulture thing, dead-singer holograms and the future inspired me to write an article. At two thousand words it's probably a bit much to cut and paste on The Coli, but I'll drop some cliffs because I want to know what you think about the implications of...

www.thecoli.com

bnew · Mar 7, 2023

bnew said:
Microsoft’s new AI can simulate anyone’s voice with 3 seconds of audio

Text-to-speech model can preserve speaker’s emotional tone and acoustic environment.

arstechnica.com

VALL-E

bnew · Mar 7, 2023

https://archive.is/2b2Wf

bnew · Mar 7, 2023

https://archive.is/snix7

Adeptus Astartes · Mar 7, 2023

The Dark Age of Technology is upon us, brothers.

fukk around and wind up dependent on Men of Iron, brehs.

bnew · Mar 7, 2023

https://archive.is/x0LP0

bnew · Mar 7, 2023

Came across an opinion piece that used an A.I.-generated image instead of commissioning art from a cartoonist.

Photo: Roxanne Cooper/MidJourney

Trump has entered his fat Elvis stage: MSNBC host

Filling in for Lawrence O'Donnell on Monday, Washington Post columnist Jonathan Capehart discussed Donald Trump's speech at CPAC over the weekend that pledged, "I am your retribution." "The crowd cheered, even if he wasn't playing to a full house," Capehart mocked of the small crowd size at the...

www.rawstory.com

https://archive.is/dLIAn

bnew · Mar 7, 2023

https://www.vice.com/en/article/xgwqgw/facebooks-powerful-large-language-model-leaks-online-*****-llama

Facebook's Powerful Large Language Model Leaks Online

The leaked language model was posted to *****. The model was previously only given to approved researchers, government organizations, and members of civil society.

bit.ly

By Joseph Cox
March 7, 2023, 12:08pm

Facebook’s large language model, which is usually only available to approved researchers, government officials, or members of civil society, has now leaked online for anyone to download. The leaked language model was shared on *****, where a member uploaded a torrent file for Facebook’s tool, known as LLaMa (Large Language Model Meta AI), last week.

This marks the first time a major tech firm's proprietary AI model has leaked to the public. To date, firms like Google, Microsoft, and OpenAI have kept their newest models private, only accessible via consumer interfaces or an API, ostensibly to control instances of misuse. ***** members claim to be running LLaMa on their own machines, but the exact implications of this leak are not yet clear.

In a statement to Motherboard, Meta did not deny the LLaMa leak, and stood by its approach of sharing the models among researchers.

“It’s Meta's goal to share state-of-the-art AI models with members of the research community to help us evaluate and improve those models. LLaMA was shared for research purposes, consistent with how we have shared previous large language models. While the model is not accessible to all, and some have tried to circumvent the approval process, we believe the current release strategy allows us to balance responsibility and openness,” a Meta spokesperson wrote in an email.

Like other AI models including OpenAI's GPT-3, LLaMa is built on a massive collection of pieces of words, or “tokens.” From here, LLaMa can then take an input of words, and predict the next word to recursively generate more text, Meta explains in a blog post from February. LLaMa has multiple versions of different sizes, with LLaMa 65B and LLaMa 33B being trained on 1.4 trillion tokens. According to the LLaMA model card, the model was trained on datasets scraped from Wikipedia, books, academic papers from ArXiv, GitHub, Stack Exchange, and other sites.
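The "take an input, predict the next word, feed it back in" loop described above can be illustrated with a toy. This is a minimal sketch, not LLaMa: it assumes a whitespace tokenizer and a bigram frequency table (the helpers `train_bigram` and `generate` are hypothetical) standing in for a real neural network and subword tokens, and it uses greedy decoding. Only the autoregressive loop is the point.

```python
from collections import Counter, defaultdict


def train_bigram(corpus: list[str]) -> dict[str, Counter]:
    """Count which token follows which; a toy stand-in for a trained model."""
    counts: dict[str, Counter] = defaultdict(Counter)
    for text in corpus:
        tokens = text.split()  # whitespace "tokenizer"; real LLMs use subword tokens
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts


def generate(model: dict[str, Counter], prompt: str, max_new_tokens: int = 5) -> str:
    """Greedy autoregressive loop: predict one token, append it, repeat."""
    tokens = prompt.split()
    for _ in range(max_new_tokens):
        followers = model.get(tokens[-1])  # .get avoids inserting keys into the defaultdict
        if not followers:
            break  # nothing ever followed this token in the corpus
        next_token, _ = followers.most_common(1)[0]
        tokens.append(next_token)  # the prediction becomes part of the next input
    return " ".join(tokens)


corpus = [
    "the model predicts the next word",
    "the next word is appended to the input",
]
model = train_bigram(corpus)
print(generate(model, "the", max_new_tokens=4))
```

With this toy corpus the loop walks the → next → word → is → appended and prints "the next word is appended". A real model replaces the frequency table with a network scoring every token in its vocabulary, but the generation loop has the same shape.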

In that same February blog post, Meta says that it is releasing LLaMa under a noncommercial license focused on research use cases to “maintain integrity and prevent misuse.”

“Access to the model will be granted on a case-by-case basis to academic researchers; those affiliated with organizations in government, civil society, and academia; and industry research laboratories around the world,” the post reads. Those protections have now been circumvented with the public release of LLaMa.

“We believe that the entire AI community—academic researchers, civil society, policymakers, and industry—must work together to develop clear guidelines around responsible AI in general and responsible large language models in particular. We look forward to seeing what the community can learn—and eventually build—using LLaMA,” Meta’s blog post adds.

Meanwhile, Meta appears to be filing takedown requests of the model online to control its spread.

Clement Delangue, CEO of open source AI firm Hugging Face, posted a staff update from GitHub regarding a user's LLaMa repository. "Company Meta Platforms, Inc has requested a takedown of this published model characterizing it as an unauthorized distribution of Meta Properties that constitutes a copyright infringement or improper/unauthorized use," the notice said.

Delangue cautioned users against uploading LLaMa weights to the internet. The flagged user's GitHub repository is currently offline.

bnew · Mar 7, 2023

219GB
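For a rough sense of where a number like 219GB could come from, here is a back-of-envelope sketch. It assumes the torrent holds all four LLaMA sizes stored as 16-bit (2-byte) weights and uses the approximate parameter counts from Meta's LLaMA paper (6.7B, 13.0B, 32.5B, 65.2B); tokenizer files, format overhead and the GB vs GiB distinction are ignored, so treat any match as suggestive, not confirmed.

```python
# Approximate parameter counts (billions) per LLaMA size, per Meta's LLaMA paper.
PARAMS_BILLIONS = {"7B": 6.7, "13B": 13.0, "33B": 32.5, "65B": 65.2}
BYTES_PER_PARAM_FP16 = 2  # assumption: weights stored as 16-bit floats


def checkpoint_size_gib(params_billions: float, bytes_per_param: int = BYTES_PER_PARAM_FP16) -> float:
    """Raw weight storage for one model in GiB (units of 2**30 bytes)."""
    return params_billions * 1e9 * bytes_per_param / 2**30


for name, p in PARAMS_BILLIONS.items():
    print(f"LLaMA-{name}: ~{checkpoint_size_gib(p):.1f} GiB")

total = sum(checkpoint_size_gib(p) for p in PARAMS_BILLIONS.values())
print(f"All four: ~{total:.0f} GiB")
```

The total comes to roughly 219 GiB, which lines up with the posted size if those assumptions hold.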

bnew · Mar 7, 2023

GitHub Users Want to Sue Microsoft For Training an AI Tool With Their Code

“Copilot” was trained using billions of lines of open-source code hosted on sites like GitHub. The people who wrote the code are not happy.

www.vice.com

By Janus Rose
NEW YORK, US

October 18, 2022, 2:02pm

Open-source coders are investigating a potential class-action lawsuit against Microsoft after the company used their publicly-available code to train its latest AI tool.

On a website launched to spearhead an investigation of the company, programmer and lawyer Matthew Butterick writes that he has assembled a team of class-action litigators to lead a suit opposing the tool, called GitHub Copilot.

Microsoft, which bought the collaborative coding platform GitHub in 2018, was met with suspicion from open-source communities when it launched Copilot back in June. The tool is an extension for Microsoft’s Visual Studio coding environment that uses prediction algorithms to auto-complete lines of code. This is done using an AI model called Codex, which was created and trained by OpenAI using data scraped from code repositories on the open web.

Microsoft has stated that the tool was “trained on tens of millions of public repositories” of code, including those on GitHub, and that it “believe[s] that is an instance of transformative fair use.”

Obviously, some open-source coders disagree.

“Like Neo plugged into the Matrix, or a cow on a farm, Copilot wants to convert us into nothing more than producers of a resource to be extracted,” Butterick wrote on his website. “Even the cows get food & shelter out of the deal. Copilot contributes nothing to our individual projects. And nothing to open source broadly.”

Some programmers have even noticed that Copilot seems to copy their code in its resulting outputs. On Twitter, open-source users have documented examples of the software spitting out lines of code that are strikingly similar to the ones in their own repositories.

https://archive.is/mJaKO

GitHub has stated that the training data taken from public repositories “is not intended to be included verbatim in Codex outputs,” and claims that “the vast majority of output (>99%) does not match training data,” according to the company’s internal analysis.
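What "matches training data" means depends on how it is measured, and the article does not say how GitHub measured it. As one illustrative sketch (not GitHub's method), an output could be counted as a match if it shares any run of `n` consecutive tokens with a training file. The regex lexer, the helper names and the threshold `n = 8` are all assumptions chosen for the example.

```python
import re


def lex(code: str) -> list[str]:
    """Crude lexer: identifiers/numbers, or single punctuation characters."""
    return re.findall(r"\w+|[^\w\s]", code)


def ngrams(tokens: list[str], n: int) -> set[tuple[str, ...]]:
    """All runs of n consecutive tokens."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def matches_training_data(output: str, training_files: list[str], n: int = 8) -> bool:
    """True if the output shares at least one n-token run with any training file."""
    out_grams = ngrams(lex(output), n)
    return any(out_grams & ngrams(lex(src), n) for src in training_files)


def match_rate(outputs: list[str], training_files: list[str], n: int = 8) -> float:
    """Fraction of outputs flagged as matches; '>99% do not match' means this is < 0.01."""
    if not outputs:
        return 0.0
    return sum(matches_training_data(o, training_files, n) for o in outputs) / len(outputs)


training = ["def add(a, b):\n    return a + b\n"]
outputs = ["def add(a, b):\n    return a + b", "def mul(x, y): return x * y"]
print(match_rate(outputs, training))
```

The first output copies the training snippet and is flagged; the second has the same shape with different names and is not, so the rate here is 0.5. Choosing `n` is the crux: a small `n` flags common boilerplate, while a large `n` misses lightly renamed copies, which is one reason such percentages are hard to compare.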

Microsoft essentially puts the legal onus on the end user to ensure that code Copilot spits out doesn't violate any intellectual property laws, but Butterick writes that this is merely a smokescreen and GitHub Copilot in practice acts as a "selfish" interface to open-source communities that hijacks their expertise while offering nothing in return. As the Joseph Saveri Law Firm—the firm Butterick is working with on the investigation—put it, "It appears Microsoft is profiting from others' work by disregarding the conditions of the underlying open-source licenses and other legal requirements."

Microsoft and GitHub could not be reached for comment.

While open-source code is generally free to use and adapt, open source software licenses require anyone who utilizes the code to credit its original source. Naturally, this becomes practically impossible when you are scraping billions of lines of code to train an AI model—and hugely problematic when the resulting product is being sold by a massive corporation like Microsoft. As Butterick writes, "How can Copilot users comply with the license if they don’t even know it exists?"

The controversy is yet another chapter in an ongoing debate over the ethics of training AI using artwork, music, and other data scraped from the open web without permission from its creators. Some artists have begun publicly criticizing image-generating AI like DALL-E and Midjourney, which charge users for access to powerful algorithms that were trained on their original work—oftentimes gaining the ability to produce new works that mimic their style, usually with specific instructions to copy a particular artist.

In the past, human-created works that build on or adapt previous works have been A-OK, and are labeled “fair use” or “transformative” under U.S. copyright law. But as Butterick notes on his website, that principle has never been tested when it comes to works created by AI systems that are trained on other works collected en masse from the web.

Butterick seems intent on finding out, and is encouraging potential plaintiffs to contact his legal team in preparation for a potential class-action suit opposing Copilot.

“We needn’t delve into Microsoft’s very checkered history with open source to see Copilot for what it is: a parasite,” he writes on the website. “The legality of Copilot must be tested before the damage to open source becomes irreparable.”

bnew · Mar 8, 2023

4K version

https://www.artstation.com/artwork/b5QzLd

bnew · Mar 8, 2023

DuckDuckGo’s new Wikipedia summary bot: “We fully expect it to make mistakes” [Updated]

DuckAssist provides an AI-powered Wikipedia summary, hoping to avoid hallucinations.

arstechnica.com

Wikipedia + AI = truth? DuckDuckGo hopes so with new answerbot

BENJ EDWARDS - 3/8/2023, 2:50 PM

An AI-generated image of a cyborg duck.

Not to be left out of the rush to integrate generative AI into search, on Wednesday DuckDuckGo announced DuckAssist, an AI-powered factual summary service powered by technology from Anthropic and OpenAI. It is available for free today as a wide beta test for users of DuckDuckGo’s browser extensions and browsing apps. Because DuckAssist is powered by an AI model, the company admits that it might make stuff up, but it hopes that will happen rarely.

Here's how it works: If a DuckDuckGo user searches a question that can be answered by Wikipedia, DuckAssist may appear and use AI natural language technology to generate a brief summary of what it finds in Wikipedia, with source links listed below. The summary appears above DuckDuckGo's regular search results in a special box.

The company positions DuckAssist as a new form of "Instant Answer"—a feature that prevents users from having to dig through web search results to find quick information on topics like news, maps, and weather. Instead, the search engine presents the Instant Answer results above the usual list of websites.
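The flow described in the two paragraphs above (find a Wikipedia answer, summarize it, show it with source links above the regular results) can be sketched as follows. This is a toy under stated assumptions, not DuckDuckGo's code: `WIKI_INDEX` is a hypothetical one-entry stand-in for its index, the article text is a made-up excerpt, and `first_sentence` is a placeholder where a real system would call an OpenAI or Anthropic model.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class AnswerBox:
    """What would be shown above the regular results: a summary plus source links."""
    summary: str
    sources: list = field(default_factory=list)


# Hypothetical stand-in for an index of Wikipedia articles: title -> (url, toy excerpt).
WIKI_INDEX = {
    "hot yoga": ("https://en.wikipedia.org/wiki/Hot_yoga",
                 "Hot yoga is a form of yoga done in a heated room. Class styles vary."),
}


def first_sentence(text: str) -> str:
    """Placeholder summarizer; a real system would send the text to a language model."""
    return text.split(". ")[0].rstrip(".") + "."


def answer(query: str, summarize: Callable[[str], str] = first_sentence) -> Optional[AnswerBox]:
    """Return an answer box if an indexed article covers the query, else None."""
    q = query.lower()
    for title, (url, text) in WIKI_INDEX.items():
        if title in q:
            return AnswerBox(summary=summarize(text), sources=[url])
    return None  # no instant answer: only the regular search results are shown


print(answer("what is hot yoga?"))
print(answer("weather tomorrow"))
```

Passing the summarizer in as a parameter keeps the retrieval step separate from whichever model does the summarizing, which mirrors the article's point that the answers come from a fixed set of sources rather than from the model's own memory.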

A demonstration screenshot of DuckAssist in action.

DuckDuckGo does not say which large language model (LLM) or models it uses to generate DuckAssist, although some form of OpenAI API seems likely. Ars Technica has reached out to DuckDuckGo representatives for clarification. But DuckDuckGo CEO Gabriel Weinberg explains how it utilizes sourcing in a company blog post:

DuckAssist answers questions by scanning a specific set of sources—for now that’s usually Wikipedia, and occasionally related sites like Britannica—using DuckDuckGo’s active indexing. Because we’re using natural language technology from OpenAI and Anthropic to summarize what we find in Wikipedia, these answers should be more directly responsive to your actual question than traditional search results or other Instant Answers.

Since DuckDuckGo's main selling point is privacy, the company says that DuckAssist is "anonymous" and emphasizes that it does not share search or browsing history with anyone. "We also keep your search and browsing history anonymous to our search content partners," Weinberg writes, "in this case, OpenAI and Anthropic, used for summarizing the Wikipedia sentences we identify."

If DuckDuckGo is using OpenAI's GPT-3 or ChatGPT API, one might worry that the site could potentially send each user's query to OpenAI every time it gets invoked. But reading between the lines, it appears that only the Wikipedia article (or excerpt of one) gets sent to OpenAI for summarization, not the user's search itself. We have reached out to DuckDuckGo for clarification on this point as well.
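If that reading is right, the query only needs to be used locally to choose which Wikipedia sentences to forward. Here is a minimal sketch of that split, assuming simple word-overlap scoring. `select_sentences`, `build_summary_request` and the request fields are hypothetical, and DuckDuckGo has not said how it actually picks sentences.

```python
import re


def words(text: str) -> set[str]:
    return set(re.findall(r"\w+", text.lower()))


def select_sentences(query: str, article: str, k: int = 2) -> list[str]:
    """Runs locally: keep the k article sentences sharing the most words with the query."""
    query_words = words(query)
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", article) if s.strip()]
    # sorted() stays stable with reverse=True, so ties keep their article order
    ranked = sorted(sentences, key=lambda s: len(query_words & words(s)), reverse=True)
    return ranked[:k]


def build_summary_request(query: str, article: str, k: int = 2) -> dict:
    """Hypothetical body for an outside summarizer: the excerpt only, no query or user data."""
    excerpt = select_sentences(query, article, k)
    return {"task": "summarize", "text": " ".join(excerpt)}


article = ("Hot yoga is yoga done in a heated room. "
           "Classes are popular in many cities. The room is usually humid.")
query = "why is hot yoga popular"
request = build_summary_request(query, article)
print(request)
```

Here the two best-matching sentences are forwarded and the query itself never appears in the request. The trade-off is that the summarizer no longer sees the question, so it can only condense the excerpt it is given.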

DuckDuckGo calls DuckAssist "the first in a series of generative AI-assisted features we hope to roll out in the coming months." If the launch goes well—and nobody breaks it with adversarial prompts—DuckDuckGo plans to roll out the feature to all search users "in the coming weeks."

DuckDuckGo: Risk of hallucinations “greatly diminished”

Demonstration video: https://cdn.arstechnica.net/wp-content/uploads/2023/03/DuckAssist-Hot-Yoga.mp4

As we've previously covered on Ars, LLMs have a tendency to produce convincing erroneous results, which AI researchers call "hallucinations" as a term of art in the AI field. Hallucinations can be hard to spot unless you know the material being referenced, and they come about partially because GPT-style LLMs from OpenAI do not distinguish between fact and fiction in their datasets. Additionally, the models can make false inferences based on data that is otherwise accurate.

On this point, DuckDuckGo hopes to avoid hallucinations by leaning heavily on Wikipedia as a source: "by asking DuckAssist to only summarize information from Wikipedia and related sources," Weinberg writes, "the probability that it will “hallucinate”—that is, just make something up—is greatly diminished."

While relying on a quality source of information may reduce errors from false information in the AI's dataset, it may not reduce false inferences. And DuckDuckGo puts the burden of fact-checking on the user, providing a source link below the AI-generated result that can be used to examine its accuracy. But it won't be perfect, and CEO Weinberg admits it: "Nonetheless, DuckAssist won’t generate accurate answers all of the time. We fully expect it to make mistakes."
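The difference between false information and false inferences shows up even in a simple automated check. Below is a naive sketch (not something DuckDuckGo is known to use) that flags summary sentences whose content words mostly do not appear in the source; the stopword list and the 0.8 threshold are arbitrary assumptions.

```python
import re

# Arbitrary small stopword list for the example.
STOPWORDS = {"a", "an", "the", "is", "are", "was", "were", "it", "of", "in", "to", "and", "as"}


def content_words(text: str) -> set[str]:
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}


def unsupported_sentences(summary: str, source: str, min_support: float = 0.8) -> list[str]:
    """Return summary sentences whose content words are mostly absent from the source."""
    source_words = content_words(source)
    flagged = []
    for sentence in re.split(r"(?<=[.!?])\s+", summary.strip()):
        words = content_words(sentence)
        if words and len(words & source_words) / len(words) < min_support:
            flagged.append(sentence)
    return flagged


source = "DuckAssist summarizes Wikipedia articles. It launched as a beta."
print(unsupported_sentences("DuckAssist summarizes Wikipedia. It was built in 1990.", source))
print(unsupported_sentences("Wikipedia launched DuckAssist.", source))
```

The invented date is flagged because none of its words are in the source. The second summary is also wrong, but every word in it comes from the source, so it passes. That is the kind of false inference a trustworthy source alone does not prevent, and it is why the source link still matters.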

As more firms deploy LLM technology that can easily misinform, it may take some time and widespread use before companies and customers decide what level of hallucination is tolerable in an AI-powered product that is designed to factually inform people.

Complexion · Mar 9, 2023

What if an AI could interpret your imagination, turning images in your mind's eye into reality? While that sounds like a detail in a cyberpunk novel, researchers have now accomplished exactly this, according to a recently published paper.

Researchers found that they could reconstruct high-resolution and highly accurate images from brain activity by using the popular Stable Diffusion image generation model, as outlined in a paper published in December. The authors wrote that unlike previous studies, they didn’t need to train or fine-tune the AI models to create these images.

Researchers Use AI to Generate Images Based on People's Brain Activity

Researchers found that they could reconstruct high-resolution images from brain activity.

www.vice.com

1) Close your eyes and imagine a RED STAR. 2) Open this thread.

Not everyone's inner world is the same as your own, as this experiment will show. There is a very interesting place we can go once we've established where most people sit on the spectrum of mental perception, so let's get it.

www.thecoli.com

bnew · Mar 10, 2023

https://archive.is/KN8sA

Prismer

A Vision-Language Model with Multi-Modal Experts

shikun.io

bnew · Mar 10, 2023

https://archive.is/shqog

https://archive.is/P79QA
