
Researcher builds anti-Russia AI disinformation machine for $400​

AI makes it cheap and easy to create propaganda at scale.​

WILL KNIGHT, WIRED.COM - 8/30/2023, 9:51 AM

[Illustration: James Marshall; Getty Images]

In May, Sputnik International, a state-owned Russian media outlet, posted a series of tweets lambasting US foreign policy and attacking the Biden administration. Each prompted a curt but well-crafted rebuttal from an account called CounterCloud, sometimes including a link to a relevant news or opinion article. It generated similar responses to tweets by the Russian embassy and Chinese news outlets criticizing the US.

Russian criticism of the US is far from unusual, but CounterCloud’s material pushing back was: The tweets, the articles, and even the journalists and news sites were crafted entirely by artificial intelligence algorithms, according to the person behind the project, who goes by the name Nea Paw and says it is designed to highlight the danger of mass-produced AI disinformation. Paw did not post the CounterCloud tweets and articles publicly but provided them to WIRED and also produced a video outlining the project.

Paw claims to be a cybersecurity professional who prefers anonymity because some people may believe the project to be irresponsible. The CounterCloud campaign pushing back on Russian messaging was created using OpenAI’s text generation technology, like that behind ChatGPT, and other easily accessible AI tools for generating photographs and illustrations, Paw says, for a total cost of about $400.

Paw says the project shows that widely available generative AI tools make it much easier to create sophisticated information campaigns pushing state-backed propaganda.

“I don't think there is a silver bullet for this, much in the same way there is no silver bullet for phishing attacks, spam, or social engineering,” Paw says in an email. Mitigations are possible, such as educating users to be watchful for manipulative AI-generated content, making generative AI systems try to block misuse, or equipping browsers with AI-detection tools. “But I think none of these things are really elegant or cheap or particularly effective,” Paw says.

In recent years, disinformation researchers have warned that AI language models could be used to craft highly personalized propaganda campaigns, and to power social media accounts that interact with users in sophisticated ways.
Renee DiResta, technical research manager for the Stanford Internet Observatory, which tracks information campaigns, says the articles and journalist profiles generated as part of the CounterCloud project are fairly convincing.
“In addition to government actors, social media management agencies and mercenaries who offer influence operations services will no doubt pick up these tools and incorporate them into their workflows,” DiResta says. Getting fake content widely distributed and shared is challenging, but this can be done by paying influential users to share it, she adds.

Some evidence of AI-powered online disinformation campaigns has surfaced already. Academic researchers recently uncovered a crude, crypto-pushing botnet apparently powered by ChatGPT. The team said the discovery suggests that the AI behind the chatbot is likely already being used for more sophisticated information campaigns.

Legitimate political campaigns have also turned to AI ahead of the 2024 US presidential election. In April, the Republican National Committee produced a video attacking Joe Biden that included fake, AI-generated images. And in June, a social media account associated with Ron DeSantis included AI-generated images in a video meant to discredit Donald Trump. The Federal Election Commission has said it may limit the use of deepfakes in political ads.
Micah Musser, a researcher who has studied the disinformation potential of AI language models, expects mainstream political campaigns to try using language models to generate promotional content, fund-raising emails, or attack ads. “It's a totally shaky period right now where it's not really clear what the norms are,” he says.

A lot of AI-generated text remains fairly generic and easy to spot, Musser says. But having humans finesse AI-generated content pushing disinformation could be highly effective, and almost impossible to stop using automated filters, he says.

The CEO of OpenAI, Sam Altman, said in a tweet last month that he is concerned that his company’s artificial intelligence could be used to create tailored, automated disinformation on a massive scale.

When OpenAI first made its text generation technology available via an API, it banned any political usage. This March, however, the company updated its policy to prohibit only usage aimed at mass-producing messaging for particular demographics. A recent Washington Post article suggests that GPT does not itself block the generation of such material.

Kim Malfacini, head of product policy at OpenAI, says the company is exploring how its text-generation technology is being used for political ends. People are not yet used to assuming that content they see may be AI-generated, she says. “It’s likely that the use of AI tools across any number of industries will only grow, and society will update to that,” Malfacini says. “But at the moment I think folks are still in the process of updating.”

Since a host of similar AI tools are now widely available, including open source models that can be built on with few restrictions, voters should get wise to the use of AI in politics sooner rather than later.
This story originally appeared on wired.com.
 


Falcon 180B​

Falcon 180B is a super-powerful language model with 180 billion parameters, trained on 3.5 trillion tokens. It's currently at the top of the Hugging Face Open LLM Leaderboard for pre-trained models and is available for both research and commercial use.


This model performs exceptionally well in various tasks like reasoning, coding, proficiency, and knowledge tests, even beating competitors like Meta's LLaMA 2.


Among closed source models, it ranks just behind OpenAI's GPT-4, and performs on par with Google's PaLM 2 Large, which powers Bard, despite being half that model's size.


The download of Falcon 180B is subject to our Terms & Conditions and Acceptable Use Policy



Falcon-180B Demo​

Chat with Falcon-180B-Chat, brainstorm ideas, discuss your holiday plans, and more!

✨ This demo is powered by Falcon-180B and finetuned on a mixture of Ultrachat, Platypus and Airoboros. Falcon-180B is a state-of-the-art large language model built by the Technology Innovation Institute in Abu Dhabi. It is trained on 3.5 trillion tokens (including RefinedWeb) and available under the Falcon-180B TII License. It currently holds the 🥇 1st place on the 🤗 Open LLM leaderboard for a pretrained model.

🧪 This is only a first experimental preview: we intend to provide increasingly capable versions of Falcon in the future, based on improved datasets and RLHF/RLAIF.

👀 Learn more about Falcon LLM: falconllm.tii.ae

➡️Intended Use: this demo is intended to showcase an early finetuning of Falcon-180B, to illustrate the impact (and limitations) of finetuning on a dataset of conversations and instructions. We encourage the community to further build upon the base model, and to create even better instruct/chat versions!

⚠️ Limitations: the model can and will produce factually incorrect information, hallucinating facts and actions. As it has not undergone any advanced tuning/alignment, it can produce problematic outputs, especially if prompted to do so. Finally, this demo is limited to a session length of about 1,000 words.





Protip:
I was able to increase the token count to 6080 without receiving an error after input.
Code generation results improve if you adjust "Top P"; I got good results using 0.5 and 0.8, depending on the prompt. Test the same prompt with different settings.
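
For anyone who wants to experiment with the same "Top P" effect outside the hosted demo, here is one rough way to do it with the Hugging Face transformers generate API. It is only a sketch: the smaller tiiuae/falcon-7b checkpoint and the example prompt are assumptions made for practicality, since running Falcon-180B locally requires multi-GPU hardware.

```python
# Rough sketch: comparing top_p (nucleus sampling) settings with a Falcon model.
# Assumptions: the smaller tiiuae/falcon-7b checkpoint is used for practicality
# (swap in tiiuae/falcon-180B only if you have the hardware); requires accelerate,
# and older transformers releases may additionally need trust_remote_code=True.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tiiuae/falcon-7b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "Write a Python function that merges two sorted lists."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

for top_p in (0.5, 0.8):  # test the same prompt with different settings
    output = model.generate(
        **inputs,
        max_new_tokens=200,
        do_sample=True,  # top_p only takes effect when sampling is enabled
        top_p=top_p,
    )
    print(f"--- top_p={top_p} ---")
    print(tokenizer.decode(output[0], skip_special_tokens=True))
```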
 

Anthropic’s Claude AI chatbot gets a paid plan for heavy users​


Claude Pro costs $20 per month in the US or £18 per month in the UK.​

By Emma Roth, a news writer who covers the streaming wars, consumer tech, crypto, social media, and much more. Previously, she was a writer and editor at MUO.

Sep 7, 2023, 10:55 AM EDT

[Illustration by Alex Castro / The Verge]

Anthropic, the AI company backed by Google, has launched a paid version of its Claude chatbot in the US and UK. Priced at $20 (or £18) per month, the new Claude Pro option offers priority access when the bot is busy, early access to new features, and the ability to send more messages.

The main draw is that you’ll get five times more usage with Claude Pro when compared to the free tier, which means you can send more messages in a shorter period of time. Anthropic says the typical user will get at least 100 messages every eight hours depending on Claude’s capacity. The company says it will warn you when you have 10 messages remaining, with its limits resetting every eight hours.

Anthropic’s new Claude Pro offering puts it on track to compete with OpenAI’s $20 per month ChatGPT Plus plan. Poe, a Quora-owned hub for AI chatbots, also offers a $20 per month paid plan, and Microsoft recently launched an enterprise version of Bing Chat with increased privacy for businesses.

Anthropic first launched Claude in March, marketing it as a bot that’s “easier to converse with” and “less likely to produce harmful outputs.” While the bot was initially only available within Slack or to businesses, Anthropic launched Claude 2 to users in the US and UK in July.
 

Open Interpreter ChatGPT Code Interpreter You Can Run LOCALLY!




Let language models run code on your computer.
An open-source, locally running implementation of OpenAI's Code Interpreter.


Open Interpreter lets LLMs run code (Python, JavaScript, Shell, and more) locally. You can chat with Open Interpreter through a ChatGPT-like interface in your terminal by running $ interpreter after installing.

This provides a natural-language interface to your computer's general-purpose capabilities:

  • Create and edit photos, videos, PDFs, etc.
  • Control a Chrome browser to perform research
  • Plot, clean, and analyze large datasets
  • ...etc.
⚠️ Note: You'll be asked to approve code before it's run.
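
Beyond the terminal command, the project also exposes a Python entry point. The snippet below is a minimal sketch of a scripted session, assuming the package installs as open-interpreter and that the interpreter module's chat and auto_run attributes behave as described in the project's README at the time of writing.

```python
# Minimal sketch of driving Open Interpreter from Python instead of the terminal.
# Assumes `pip install open-interpreter` and an OpenAI API key in the environment.
import interpreter

# Keep the default safety behavior: ask for approval before any generated code runs.
interpreter.auto_run = False

# The model may propose shell or Python code in response; you approve or reject
# each block before it executes on your machine.
interpreter.chat("Summarize the CSV files in the current directory and plot their row counts.")
```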
 


Releasing Persimmon-8B​

September 7, 2023 — Erich Elsen, Augustus Odena, Maxwell Nye, Sağnak Taşırlar, Tri Dao, Curtis Hawthorne, Deepak Moparthi, Arushi Somani

We’re open-sourcing Persimmon-8B, the most powerful fully permissively-licensed language model with <10 billion parameters.


We’re excited to open-source Persimmon-8B, the best fully permissively-licensed model in the 8B class. The code and weights are here.

At Adept, we’re working towards an AI agent that can help people do anything they need to do on a computer. We’re not in the business of shipping isolated language models (LMs)—this was an early output of the model scaling program that will support our products.

Over the last year, we’ve been amazed by how smart small models are becoming, and we wanted to give the community access to an even better 8B LM to build on for any use case, with an open Apache license and publicly accessible weights. The 8B size is a sweet spot for most users without access to large-scale compute—they can be finetuned on a single GPU, run at a decent speed on modern MacBooks, and may even fit on mobile devices.

Persimmon-8B has several nice properties:
  1. This is the most capable open-source, fully permissive model with fewer than 10 billion parameters. We are releasing it under an Apache license for maximum flexibility.
  2. We trained it from scratch using a context size of 16K. Many LM use cases are context-bound; our model has 4 times the context size of LLaMA2 and 8 times that of GPT-3, MPT, etc.
  3. Our base model exceeds other ~8B models and matches LLaMA2 performance despite having been trained on only 0.37x as much data as LLaMA2.
  4. The model has 70k unused embeddings for multimodal extensions, and has sparse activations.
  5. The inference code we’re releasing along with the model is unique—it combines the speed of C++ implementations (e.g. FasterTransformer) with the flexibility of naive Python inference.

We’re excited to see how the community takes advantage of these capabilities not present in other open source language models, and we hope this model spurs even greater innovation!

Because this is a raw model release, we have not added further finetuning, postprocessing or sampling strategies to control for toxic outputs.

A more realistic way of doing evals​

Determining the quality of a language model is still as much art as science. Model quality is not an absolute metric and depends on how the language model will be used. In most use cases, we expect language models to generate text. However, a common methodology for evaluating language models doesn’t actually ask them to generate any text at all. Consider the following multiple choice question from the common HellaSwag eval set. The goal is to pick which of the four answers best continues the “question.”

A woman is outside with a bucket and a dog. The dog is running around trying to avoid a bath. She…
a) rinses the bucket off with soap and blow dries the dog’s head.
b) uses a hose to keep it from getting soapy.
c) gets the dog wet, then it runs away again.
d) gets into a bathtub with the dog.


One way to evaluate the model is to simply ask it to answer the question and then see which choice it makes – (a), (b), (c), or (d). This mimics the experience of how people actually interact with language models – they ask questions and expect answers. This is analogous to e.g. HELM.

A more common practice in ML is instead to use the implicit probabilities that the language model assigns to each choice. For option (a) above, we calculate the probability of “rinses” given the previous sentences, and then probability of “the” given the previous sentences plus “rinses,“ and so on. We then multiply all these probabilities together, giving the probability of the entire sequence for option (a). We do this for all four choices (optionally adding length normalization to account for different length sequences) and select the option with the highest sequence probability. This is a fine way to measure the intrinsic knowledge of a language model, but a poor way to understand what actually interacting with it is like.

Since we care about interacting with language models, we do all of our evals with the former technique–we directly generate answers from the model. We’re releasing the prompts we use so that others can reproduce these numbers.
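
For readers who want to see the difference concretely, the sketch below implements the second, more common technique (scoring each choice by its sequence log-probability, with optional length normalization) for the HellaSwag-style example above. It uses a generic Hugging Face causal LM; the gpt2 placeholder model and the exact scoring details are illustrative assumptions, not Adept's released eval code, which instead generates answers directly.

```python
# Sketch of multiple-choice scoring by sequence log-probability (the "implicit
# probabilities" method described above). This is NOT Adept's eval code; "gpt2"
# is only a placeholder for any Hugging Face causal LM.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

context = ("A woman is outside with a bucket and a dog. The dog is running around "
           "trying to avoid a bath. She")
choices = [
    " rinses the bucket off with soap and blow dries the dog's head.",
    " uses a hose to keep it from getting soapy.",
    " gets the dog wet, then it runs away again.",
    " gets into a bathtub with the dog.",
]

def choice_logprob(context: str, choice: str, length_normalize: bool = True) -> float:
    # Assumes tokenization of context + choice splits cleanly at the boundary,
    # which holds for typical BPE tokenizers when the choice starts with a space.
    ctx_len = tokenizer(context, return_tensors="pt").input_ids.shape[1]
    full_ids = tokenizer(context + choice, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids).logits
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)  # predicts tokens 1..T-1
    targets = full_ids[0, 1:]
    token_lp = log_probs[torch.arange(targets.numel()), targets]
    cont_lp = token_lp[ctx_len - 1:]                        # choice tokens only
    total = cont_lp.sum().item()
    return total / cont_lp.numel() if length_normalize else total

scores = [choice_logprob(context, c) for c in choices]
print("picked option:", "abcd"[scores.index(max(scores))])
```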

Results​

We compared Persimmon-8B to the current most powerful model in its size range, Llama 2, and to MPT 7B Instruct. Our instruction-fine-tuned model, Persimmon-8B-FT, is the strongest performing model on all but one of the metrics. Our base model, Persimmon-8B-Base, performs comparably to Llama 2, despite having seen only 37% as much training data.
Eval Task      | MPT 7B Instruct (1-Shot) | Llama 2 Base 7B (1-Shot) | Persimmon-8B-Base (1-Shot) | Persimmon-8B-FT (1-Shot)
MMLU           | 27.6                     | 36.6                     | 36.5                       | 41.2
Winogrande     | 49.1                     | 51.1                     | 51.4                       | 54.6
Arc Easy       | 32.5                     | 53.7                     | 48.1                       | 64.0
Arc Challenge  | 28.8                     | 43.8                     | 34.5                       | 46.8
TriviaQA       | 33.9                     | 36.6                     | 24.3                       | 17.2
HumanEval      | 12.8                     | 0 / 12.2 [1]             | 18.9                       | 20.7

Model Details​

Persimmon-8B is a standard decoder-only transformer with several architecture modifications.

We use the squared ReLU activation function [2]. We use rotary positional encodings; our internal experiments found them superior to Alibi. We add layernorm to the Q and K embeddings before they enter the attention calculation.
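
To make two of these modifications concrete, here is a small PyTorch sketch of a squared-ReLU MLP and of layernorm applied to the per-head Q and K projections before attention. The module layout and dimensions are illustrative assumptions, not the released Persimmon code (rotary embeddings are omitted for brevity).

```python
# Illustrative sketch (not the released Persimmon code) of two of the
# modifications described above: squared-ReLU MLP and Q/K layernorm.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SquaredReLUMLP(nn.Module):
    def __init__(self, hidden: int = 4096, expansion: int = 4):
        super().__init__()
        self.up = nn.Linear(hidden, expansion * hidden)
        self.down = nn.Linear(expansion * hidden, hidden)

    def forward(self, x):
        # Squared ReLU: relu(x)^2, which tends to produce very sparse activations.
        return self.down(F.relu(self.up(x)) ** 2)

class QKNormAttention(nn.Module):
    def __init__(self, hidden: int = 4096, heads: int = 64):
        super().__init__()
        self.heads, self.head_dim = heads, hidden // heads
        self.qkv = nn.Linear(hidden, 3 * hidden)
        self.out = nn.Linear(hidden, hidden)
        # Layernorm applied to Q and K (per head) before the attention calculation.
        self.q_norm = nn.LayerNorm(self.head_dim)
        self.k_norm = nn.LayerNorm(self.head_dim)

    def forward(self, x):
        b, t, _ = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        shape = (b, t, self.heads, self.head_dim)
        q = self.q_norm(q.view(shape)).transpose(1, 2)
        k = self.k_norm(k.view(shape)).transpose(1, 2)
        v = v.view(shape).transpose(1, 2)
        # Rotary position embeddings would be applied to q and k here (omitted).
        attn = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        return self.out(attn.transpose(1, 2).reshape(b, t, -1))
```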

The checkpoint we are releasing has approximately 9.3B parameters. In order to make pipelining during training more efficient, we chose to decouple the input and output embeddings. Doing this does not increase the capacity of the model–it is purely a systems optimization to avoid all-reducing the gradients for the (very large) embeddings across potentially slow communication links. In terms of inference cost, the model is equivalent to an 8B parameter model with coupled input/output embeddings.

Furthermore, in a space-constrained environment, the 70k unused embeddings (corresponding to reserved tokens) could be removed from the input/output embedding matrices. This would reduce the model size by approximately 570M parameters.
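
As a rough sanity check on that figure: using the 4096 hidden size listed in the table below, 70,000 unused rows in each of the decoupled input and output embedding matrices account for about 2 × 70,000 × 4,096 ≈ 573 million parameters, which lines up with the ~570M savings quoted above.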

We train the model from start to finish with a sequence length of 16K on 737B tokens uniformly sampled from a much larger dataset, which is a mix of text (~75%) and code (~25%).
 



Training natively on such long sequences from start to finish is made possible by our development of an improved version of FlashAttention (GitHub). We also modified the base for the rotary calculations to allow for full position resolution at this longer length. This contrasts with all other open source models, which use a sequence length of at most 4096 for the majority of training. We use a vocabulary of 262k tokens, built using a unigram sentencepiece model.
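
As a rough illustration of what "modifying the base for the rotary calculations" means in practice, the sketch below shows a standard rotary embedding where the base (the usual default of 10,000) is a tunable parameter; raising it stretches the rotation wavelengths so positions remain distinguishable at a 16K context. The numbers and structure are illustrative assumptions, not the released Persimmon code.

```python
# Illustrative sketch (not Persimmon's code): rotary position embeddings with a
# configurable base. Larger bases give longer wavelengths, which helps when the
# model is trained on longer sequences such as 16K tokens.
import torch

def rotary_cos_sin(seq_len: int, head_dim: int, base: float = 10000.0):
    # One inverse frequency per pair of dimensions.
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))
    positions = torch.arange(seq_len).float()
    angles = torch.outer(positions, inv_freq)  # (seq_len, head_dim // 2)
    return angles.cos(), angles.sin()

def apply_rotary(x, cos, sin):
    # x: (..., seq_len, head_dim); rotate each consecutive pair of dimensions.
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

# A base larger than the usual 10,000 stretches wavelengths for a 16K context.
cos, sin = rotary_cos_sin(seq_len=16384, head_dim=64, base=25000.0)
q = torch.randn(1, 64, 16384, 64)  # (batch, heads, seq, head_dim)
q_rot = apply_rotary(q, cos, sin)
```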

We’ve included a table with important model information below:
Attribute           | Value
Hidden Size         | 4096
Heads               | 64
Layers              | 36
Batch Size          | 120
Sequence Length     | 16384
Training Iterations | 375,000
Tokens Seen         | 737 Billion

Flexible and Fast Inference​

We’re also releasing fast inference code for this model; with a short prompt, we can sample ~56 tokens per second on a single 80GB A100 GPU [3]. While most optimized inference code is complicated and brittle, we’ve managed to make ours flexible without sacrificing speed. We can define models in PyTorch, run inference with minimal changes, and still be faster than FasterTransformer.

There are two main things that slow down traditional inference implementations:
  1. First, both the Python runtime and CUDA kernel dispatch incur per-operation overheads.
  2. Second, failing to fuse operations means we spend time writing to memory and then reading back again the same values; while this overhead might go unnoticed during training (which is compute bound), inference is usually bottlenecked by memory bandwidth.

The standard practice for achieving fast inference is to rewrite the entire model inference loop in C++, as in FasterTransformer, and call out to special fused kernels in CUDA. But this means that any changes to the model require painfully reimplementing every feature twice: once in Python / PyTorch in the training code and again in C++ in the inference codebase. We found this process too cumbersome and error prone to iterate quickly on the model.

We wanted a strategy that would fix both of these slowdowns without maintaining a separate C++ codebase.
  1. To handle operator fusion, we’ve extracted one of the attention kernels [4] from NVIDIA’s FasterTransformer repo. During Python inference, we simply replace the attention operation with a call to this kernel. Because our architecture modifications don’t touch the core attention operation, this highly complex kernel can remain unmodified.
  2. To handle the per-operator overheads, we use CUDA graphs to capture and replay the forward pass. We’ve also implemented this in a way that works with tensor parallelism, which lets us easily use multiple GPUs for inference.
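
As a rough illustration of the second point, the snippet below shows the generic PyTorch pattern for capturing a forward pass into a CUDA graph and replaying it with fresh inputs. It is a minimal sketch of the technique using a stand-in module, not Adept's inference code, and it ignores tensor parallelism.

```python
# Minimal sketch of CUDA-graph capture/replay for a decoding step in PyTorch.
# This illustrates the general technique, not Adept's released inference code.
# Requires a CUDA-capable GPU.
import torch

model = torch.nn.Linear(4096, 4096).cuda().eval()  # stand-in for one decode step

# Static tensors: the captured graph always reads/writes these same buffers.
static_input = torch.zeros(1, 4096, device="cuda")
with torch.no_grad():
    # Warm-up on a side stream before capture, as recommended by PyTorch.
    s = torch.cuda.Stream()
    s.wait_stream(torch.cuda.current_stream())
    with torch.cuda.stream(s):
        for _ in range(3):
            model(static_input)
    torch.cuda.current_stream().wait_stream(s)

    graph = torch.cuda.CUDAGraph()
    with torch.cuda.graph(graph):
        static_output = model(static_input)

# At inference time: copy new data into the static buffer and replay the graph,
# paying kernel-launch overhead once instead of once per operation.
new_token_hidden = torch.randn(1, 4096, device="cuda")
static_input.copy_(new_token_hidden)
graph.replay()
result = static_output.clone()
```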

This strategy gives us the best of both worlds: we can write model code in only one place while still doing inference faster than FasterTransformer. We really hope this accelerates the exciting applications that folks in the community can build.

This is just the first small release in a series of things we’re excited to put out this fall and winter. Enjoy!

Footnotes​

  1. The Llama 2 base model did not produce valid code in our eval runs, so we additionally report the value from the Llama 2 paper.
  2. In contrast to the more standard SwiGLU and GeLU activations, the squared ReLU often results in output activations consisting of 90+% zeros. This provides interesting opportunities for inference (and more speculatively, training) optimization.
  3. Note that because our vocabulary is larger than that of LLaMA and MPT, the actual inference speed in terms of characters is likely comparatively higher.
  4. The decoder_masked_multihead_attention kernel, in particular.
 


Exclusive: ChatGPT traffic slips again for third month in a row​

By Anna Tong
September 7, 2023, 2:30 PM EDT
Updated a day ago

[OpenAI and ChatGPT logos are seen in this illustration taken February 3, 2023. REUTERS/Dado Ruvic/Illustration]
Sept 7 (Reuters) - OpenAI's ChatGPT, the wildly popular artificial intelligence chatbot launched in November, saw monthly website visits decline for the third month in a row in August, though there are signs the decline is coming to an end, according to analytics firm Similarweb.

Worldwide desktop and mobile website visits to the ChatGPT website decreased by 3.2% to 1.43 billion in August, following approximately 10% drops from each of the previous two months. The amount of time visitors spent on the website has also been declining monthly since March, from an average of 8.7 minutes on site to 7 minutes on site in August.

But August worldwide unique visitors ticked up to 180.5 million users from 180 million.

School coming back into session in September may help ChatGPT's traffic and usage, and some schools have begun to embrace it. U.S. ChatGPT traffic in August rose slightly, in concert with American schools being back in session.

"Students seeking homework help appears to be part of the story: the percentage of younger users of the website dropped over the summer and is now starting to bounce back," said David F. Carr of Similarweb, who regularly tracks ChatGPT and its competitors.

ChatGPT set off a frenzied use of generative AI in daily tasks from editing to coding and reached 100 million monthly active users in January, two months after its launch. Generative AI technology uses past data to create new content, for instance to write essays or poems.

Until Meta’s Threads launched, ChatGPT was the fastest-growing consumer application ever, and it is now one of the top 30 websites in the world.

A few ChatGPT competitors, including Google's (GOOGL.O) Bard chatbot, have been launched this year. Microsoft's search engine Bing also provides a chatbot powered by OpenAI for free.

OpenAI also released the ChatGPT app for iOS in May, which could sap some traffic from its website. ChatGPT is free to use but also offers a premium subscription for $20 a month.

Besides ChatGPT, OpenAI makes money by selling access to its AI models for developers and enterprises directly and through a partnership with Microsoft, which invested over $10 billion into the company.
 