bnew

Veteran
Joined
Nov 1, 2015
Messages
58,027
Reputation
8,592
Daps
161,640

I learned to make a lip-syncing deepfake in just a few hours (and you can, too)​

Zero coding experience required​



By James Vincent, a senior reporter who has covered AI, robotics, and more for eight years at The Verge. Sep 9, 2020, 10:38 AM EDT

VRG_ILLO_2727_002.jpg


Artist: William Joel

How easy is it to make a deepfake, really? Over the past few years, there’s been a steady stream of new methods and algorithms that deliver more and more convincing AI-generated fakes. You can even now do basic face-swaps in a handful of apps. But what does it take to turn random code you found online into a genuine deepfake? I can now say from personal experience, you really need just two things: time and patience.

Despite writing about deepfakes for years, I’ve only ever made them using prepackaged apps that did the work for me. But when I saw an apparently straightforward method for creating quick lip-sync deepfakes in no time at all, I knew I had to try it for myself.


The basic mechanism is tantalizingly simple. All you need is a video of your subject and an audio clip you want them to follow. Mash those two things together using code and, hey presto, you have a deepfake. (You can tell I don’t have much of a technical background, right?) The end result is videos like this one of the queen singing Queen:





Or of a bunch of movie characters singing that international hymn, Smash Mouth’s “All Star”:





Or of Trump miming along with this Irish classic:


Finding the algorithms​


Now, these videos aren’t nefarious deepfakes designed to undermine democracy and bring about the infopocalypse. (Who needs deepfakes for that when normal editing does the job just as well?) They’re not even that convincing, at least not without some extra time and effort. What they are is dumb and fun, two qualities I value highly when committing to waste my time (that is, write an informative and engaging article for my employer).

As James Kelleher, the Irish designer who created the Queen deepfake, noted on Twitter, the method he used to make the videos was shared online by some AI researchers. The paper in question describing their method (called Wav2Lip) was posted a few weeks ago, along with a public demo for anyone to try. The demo was originally freely accessible, but you now have to register to use it. K R Prajwal of IIIT Hyderabad, one of the authors of the work, told The Verge this was to dissuade malicious uses, though he admitted that registration wouldn’t “deter a serious offender who is well-versed with programming.”

“We definitely acknowledge the concern of people being able to use these tools freely, and thus, we strongly suggest the users of the code and website to clearly present the videos as synthetic,” said Prajwal. He and his fellow researchers note that the program can be used for many beneficial purposes, too, like animation and dubbing video into new languages. Prajwal adds that they hope that making the code available will “encourage fruitful research on systems that can effectively combat misuse.”

Trying (and failing) with the online demo​


I originally tried using this online demo to make a deepfake. I found a video of my target (Apple CEO Tim Cook) and some audio for him to mime to (I chose Jim Carrey for some reason). I downloaded the video footage using QuickTime’s screen record function and the audio using a handy app called Piezo. Then I got both files and plugged them into the site and waited. And waited. And eventually, nothing happened.

For some reason, the demo didn’t like my clips. I tried making new ones and reducing their resolution, but it didn’t make a difference. This, it turns out, would be a motif in my deepfaking experience: random roadblocks would pop up that I just didn’t have the technical expertise to analyze. Eventually, I gave up and pinged Kelleher for help. He suggested I rename my files to remove any spaces. I did so and for some reason this worked. I now had a clip of Tim Cook miming along to Jim Carrey’s screen tests for Lemony Snicket’s A Series of Unfortunate Events. It was terrible (really just incredibly shoddy in terms of both verisimilitude and humor) but a personal achievement all the same.
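The renaming fix can be automated. The sketch below is only an illustration: the article does not say why the demo rejected file names with spaces, and the helper names (`safe_name`, `rename_without_spaces`) and example file names are made up here.

```python
import re
from pathlib import Path


def safe_name(filename: str) -> str:
    """Return the filename with each run of whitespace replaced by one underscore."""
    return re.sub(r"\s+", "_", filename.strip())


def rename_without_spaces(folder: str) -> list[tuple[str, str]]:
    """Rename every file in `folder` whose name contains whitespace.

    Returns (old_name, new_name) pairs for the files that were renamed.
    """
    renamed = []
    for path in sorted(Path(folder).iterdir()):
        new_name = safe_name(path.name)
        if path.is_file() and new_name != path.name:
            target = path.with_name(new_name)
            if target.exists():
                # Never overwrite an existing file; leave this one alone.
                continue
            path.rename(target)
            renamed.append((path.name, new_name))
    return renamed
```

Calling `rename_without_spaces` on the folder holding the clips before uploading would turn a hypothetical "jim carrey.wav" into "jim_carrey.wav".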

Google Colab: the site of my many battles with the Wav2Lip algorithm. Image: James Vincent

Moving to Colab​

To try to improve on these results, I wanted to run the algorithms more directly. For this I turned to the authors’ GitHub, where they’d uploaded the underlying code. I would be using Google Colab to run it: the coding equivalent of Google Docs, which allows you to execute machine learning projects in the cloud. Again, it was the original authors who had done all the work by laying out the code in easy steps, but that didn’t stop me from walking into setback after setback like Sideshow Bob tackling a parking lot full of rakes.



Why couldn’t I authorize Colab to access my Google Drive? (Because I was logged into two different Google accounts.) Why couldn’t the Colab project find the weights for the neural network in my Drive folder? (Because I’d downloaded the Wav2Lip model rather than the Wav2Lip + GAN version.) Why wasn’t the audio file I uploaded being identified by the program? (Because I’d misspelled “aduoi” in the file name.) And so on and so forth.


Happily, many of my problems were solved by this YouTube tutorial, which alerted me to some of the subtler mistakes I’d made. These included creating two separate folders for the inputs and the model, labeled Wav2Lip and Wav2lip respectively. (Note the different capitalization on “lip” — that’s what tripped me up.) After watching the video a few times and spending hours troubleshooting things, I finally had a working model. Honestly, I could have wept, in part at my own apparent incompetence.
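Capitalization slips like this are easy to miss by eye but simple to check for. The following is a minimal sketch, not part of the Wav2Lip notebook; the function names are hypothetical and the folder names are the ones mentioned above.

```python
from collections import defaultdict


def case_collisions(names: list[str]) -> list[list[str]]:
    """Group names that are identical except for capitalization."""
    groups = defaultdict(list)
    for name in names:
        groups[name.casefold()].append(name)
    return [sorted(group) for group in groups.values() if len(group) > 1]


def check_name(names: list[str], wanted: str) -> str:
    """Say whether `wanted` is present exactly, or only under different capitalization."""
    if wanted in names:
        return "found"
    near = [n for n in names if n.casefold() == wanted.casefold()]
    if near:
        return f"not found, but {near[0]!r} differs only in capitalization"
    return "not found"


print(case_collisions(["Wav2Lip", "Wav2lip", "inputs"]))
print(check_name(["Wav2lip"], "Wav2Lip"))
```

Passing the output of `os.listdir` on the Drive folder to `case_collisions` would flag the two near-identical folders in one step.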

The final results​



A few experiments later, I’d learned some of the quirks of the program (like its difficulty dealing with faces that aren’t straight on) and decided to create my deepfake pièce de résistance: Elon Musk lip-syncing to Tim Curry’s “space” speech from Command & Conquer: Red Alert 3. You can see the results for yourself below. And sure, it’s only a small contribution to the ongoing erasure of the boundaries between reality and fiction, but at least it’s mine:


okay this one worked out a little better - Elon Musk doing Tim Curry's space speech from command & conquer pic.twitter.com/vscq9wAKRU


— James Vincent (@jjvincent) September 9, 2020

What did I learn from this experience? Well, that making deepfakes is genuinely accessible, but it’s not necessarily easy. Although these algorithms have been around for years and can be used by anyone willing to put in a few hours’ work, it’s still true that simply editing video clips using traditional methods is faster and produces more convincing results, if your aim is to spread misinformation at least.

On the other hand, what impressed me was how quickly this technology spreads. This particular lip-syncing algorithm, Wav2Lip, was created by an international team of researchers affiliated with universities in India and the UK. They shared their work online at the end of August, and it was then picked up by Twitter and AI newsletters (I saw it in a well-known one called Import AI). The researchers made the code accessible and even created a public demo, and in a matter of weeks, people around the world had started experimenting with it, creating their own deepfakes for fun and, in my case, content. Search YouTube for “Wav2Lip” and you’ll find tutorials, demos, and plenty more example fakes.
 

bnew


4 Charts That Show Why AI Progress Is Unlikely to Slow Down


Illustration by Katie Kalupson for TIME
BY WILL HENSHALL

AUGUST 2, 2023 4:50 PM EDT
In the last ten years, AI systems have developed at rapid speed. From the breakthrough of besting a legendary player at the complex game Go in 2016, AI is now able to recognize images and speech better than humans, and pass tests including business school exams and Amazon coding interview questions.
Last week, during a U.S. Senate Judiciary Committee hearing about regulating AI, Senator Richard Blumenthal of Connecticut described the reaction of his constituents to recent advances in AI. “The word that has been used repeatedly is scary.”

The Subcommittee on Privacy, Technology, and the Law overseeing the meeting heard testimonies from three expert witnesses, who stressed the pace of progress in AI. One of those witnesses, Dario Amodei, CEO of prominent AI company Anthropic, said that “the single most important thing to understand about AI is how fast it is moving.”

WVetdWI.png


It’s often thought that scientific and technological progress is fundamentally unpredictable, and is driven by flashes of insight that are clearer in hindsight. But progress in the capabilities of AI systems is predictably driven by progress in three inputs—compute, data, and algorithms. Much of the progress of the last 70 years has been a result of researchers training their AI systems using greater computational processing power, often referred to as “compute”, feeding the systems more data, or coming up with algorithmic hacks that effectively decrease the amount of compute or data needed to get the same results. Understanding how these three factors have driven AI progress in the past is key to understanding why most people working in AI don’t expect progress to slow down any time soon.

Compute​

The first artificial neural network, Perceptron Mark I, was developed in 1957 and could learn to tell whether a card was marked on the left side or the right. It had 1,000 artificial neurons, and training it required around 700,000 operations. More than 65 years later, OpenAI released the large language model GPT-4. Training GPT-4 required an estimated 21 septillion operations.
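To put those two figures side by side, here is a quick back-of-the-envelope calculation. It uses only the operation counts quoted above, plus one assumption not stated in the passage: that GPT-4 is dated to its 2023 release.

```python
import math

PERCEPTRON_OPS = 700_000     # training operations quoted for Perceptron Mark I (1957)
GPT4_OPS = 21 * 10**24       # 21 septillion, the estimate quoted for GPT-4

ratio = GPT4_OPS // PERCEPTRON_OPS       # divides exactly, so integer division is safe
doublings = math.log2(ratio)             # how many times the figure doubled
years = 2023 - 1957                      # assumes GPT-4's 2023 release year
months_per_doubling = 12 * years / doublings

print(f"growth factor: {ratio:.1e}")
print(f"doublings: {doublings:.1f}")
print(f"average months per doubling: {months_per_doubling:.1f}")
```

Averaged over the whole period, that works out to training compute doubling roughly once a year, although the article goes on to explain that the growth was far from even.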

Increasing computation allows AI systems to ingest greater amounts of data, meaning the system has more examples to learn from. More computation also allows the system to model the relationship between the variables in the data in greater detail, meaning it can draw more accurate and nuanced conclusions from the examples it is shown.

Since 1965, Moore’s law, the observation that the number of transistors in an integrated circuit doubles about every two years, has meant the price of compute has been steadily decreasing. While this did mean that the amount of compute used to train AI systems increased, researchers were more focused on developing new techniques for building AI systems than on how much compute was used to train those systems, according to Jaime Sevilla, director of Epoch, a research organization.
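As a rough illustration of what a two-year doubling compounds to, the sketch below applies the rule from 1965 to 2023. Treating transistor count as a stand-in for compute per dollar is a simplifying assumption, and the function name is made up for this example.

```python
def moores_law_factor(start_year: int, end_year: int, doubling_years: float = 2.0) -> float:
    """Growth factor implied by one doubling every `doubling_years` years."""
    return 2 ** ((end_year - start_year) / doubling_years)


factor = moores_law_factor(1965, 2023)   # 58 years means 29 doublings
print(f"{factor:,.0f}x")
```

Twenty-nine doublings come to a factor of a little over half a billion.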

This changed around 2010, says Sevilla. “People realized that if you were to train bigger models, you will actually not get diminishing returns,” which was the commonly held view at the time.

Since then, developers have been spending increasingly large amounts of money to train larger scale models. Training AI systems requires expensive specialized chips. AI developers either build their own computing infrastructure, or pay cloud computing providers for access to theirs. Sam Altman, CEO of OpenAI, has said that GPT-4 cost over $100 million to train. This increased spending, combined with the continued decrease in the cost of compute resulting from Moore’s law, has led to AI models being trained on huge amounts of compute.
OpenAI and Anthropic, two of the leading AI companies, have each raised billions from investors to pay for the compute they use to train AI systems, and each has partnerships with tech giants that have deep pockets—OpenAI with Microsoft and Anthropic with Google.

NfSxdHq.png

Data​

AI systems work by building models of the relationships between variables in their training data—whether it’s how likely the word “home” is to appear next to the word “run,” or patterns in how gene sequence relates to protein folding, the process by which a protein takes its 3D form, which then defines its function.

In general, a larger number of data points means that AI systems have more information with which to build an accurate model of the relationship between the variables in the data, which improves performance. For example, a language model that is fed more text will have a greater number of examples of sentences in which “run” follows “home”; in sentences that describe baseball games or emphatic success, this sequence of words is more likely.
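The “home” and “run” example can be made concrete with simple word-pair counting. This is a toy sketch, far simpler than how a large language model is actually trained, and the sentences are invented for illustration.

```python
def next_word_probability(corpus: list[str], first: str, second: str) -> float:
    """Estimate P(second | first) from adjacent word pairs in `corpus`."""
    first_count = 0
    pair_count = 0
    for sentence in corpus:
        words = sentence.lower().split()
        for left, right in zip(words, words[1:]):
            if left == first:
                first_count += 1
                if right == second:
                    pair_count += 1
    return pair_count / first_count if first_count else 0.0


small = ["he hit a home run", "she walked home alone"]
larger = small + [
    "the home run won the game",
    "a home run in the ninth",
    "they drove home late",
]

print(next_word_probability(small, "home", "run"))
print(next_word_probability(larger, "home", "run"))
```

With only two sentences the estimate rests on two sightings of “home”; with five it rests on five, so each additional sentence changes the estimate less and the model's picture of the relationship becomes steadier.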

The original research paper about Perceptron Mark I says that it was trained on just six data points. By comparison, LLaMA, a large language model developed by researchers at Meta and released in 2023, was trained on around one billion data points, a more than 160-million-fold increase from Perceptron Mark I. In the case of LLaMA, the data points were text collected from a range of sources, including 67% from Common Crawl data (Common Crawl is a non-profit that scrapes the internet and makes the data collected freely available), 4.5% from GitHub (an internet service used by software developers), and 4.5% from Wikipedia.
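The figures in this paragraph can be checked with a few lines of arithmetic. The sketch uses only the numbers quoted above; the remaining share is simply whatever the three named sources leave over.

```python
PERCEPTRON_POINTS = 6
LLAMA_POINTS = 1_000_000_000      # "around one billion", as quoted above

fold_increase = LLAMA_POINTS / PERCEPTRON_POINTS
print(f"fold increase: {fold_increase:,.0f}")

# Percentage of the training text from each source named in the article.
named_sources = {"Common Crawl": 67.0, "GitHub": 4.5, "Wikipedia": 4.5}
other_sources = 100.0 - sum(named_sources.values())
print(f"share from sources not named here: {other_sources}%")
```

The ratio comes out at roughly 167 million, consistent with the “more than 160-million-fold” figure.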

aS12nt3.png

Algorithms​

Algorithms, sets of rules or instructions that define a sequence of operations to be carried out, determine how exactly AI systems use computational horsepower to model the relationships between variables in the data they are given. In addition to simply training AI systems on greater amounts of data using increasing amounts of compute, AI developers have been finding ways to get more from less. Research from Epoch found that “every nine months, the introduction of better algorithms contributes the equivalent of a doubling of computation budgets.”
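Read literally, the nine-month figure implies a simple compounding rule. The sketch below assumes the doubling rate stays constant, which the quote does not promise, and the function name is invented for this example.

```python
def algorithmic_multiplier(months: float, doubling_months: float = 9.0) -> float:
    """Effective compute multiplier from better algorithms after `months`."""
    return 2 ** (months / doubling_months)


for months in (9, 18, 36):
    gain = algorithmic_multiplier(months)
    print(f"after {months} months: {gain:.0f}x effective compute, "
          f"or 1/{gain:.0f} of the original compute for the same result")
```

On this reading, three years of algorithmic improvements alone would be worth a sixteen-fold increase in the compute budget.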
 

bnew

{continued}

rgsWgMq.png


The next phase of AI progress​

According to Sevilla, the amount of compute that AI developers use to train their systems is likely to continue increasing at its current accelerated rate for a while, with companies increasing the amount of money they spend on each AI system they train, and with increased efficiency as the price of compute continues to decrease steadily. Sevilla predicts that this will continue until at some point it is no longer worth it to keep spending more money, when increasing the amount of compute only slightly improves performance. After that, the amount of compute used will continue to increase, but at a slower rate solely due to the cost of compute decreasing as a result of Moore’s law.

The data that feeds into modern AI systems, such as LLaMA, is scraped from the internet. Historically, the factor limiting how much data is fed into AI systems has been having enough compute to process that data. But the recent explosion in the amount of data used to train AI systems has outpaced the production of new text data on the internet, which has led researchers at Epoch to predict that AI developers will run out of high-quality language data by 2026.

Those developing AI systems tend to be less concerned about this issue. Appearing on the Lunar Society podcast in March, Ilya Sutskever, chief scientist at OpenAI, said that “the data situation is still quite good. There's still lots to go.” Appearing on the Hard Fork podcast in July, Dario Amodei estimated that “there’s maybe a 10% chance that this scaling gets interrupted by inability to gather enough data.”

Sevilla is also confident that a dearth of data won’t prevent further AI improvements (developers could, for example, find ways to use low-quality language data), because, unlike compute, lack of data hasn’t been a bottleneck to AI progress before. He expects there to be lots of low-hanging fruit in terms of innovation that AI developers will likely discover to address this problem.

Algorithmic progress, Sevilla says, is likely to continue to act as an augmenter of how much compute and data is used to train AI systems. So far, most improvements have come from using compute more efficiently. Epoch found that more than three quarters of algorithmic progress in the past has been used to make up for shortfalls in compute. In future, as data becomes a bottleneck for progress on AI training, more of the algorithmic progress may be focused on making up for shortfalls in data.

Putting the three pieces together, experts including Sevilla expect AI progress to continue at breakneck speed for at least the next few years. Compute will continue to increase as companies spend more money and the underlying technology becomes cheaper. The remaining useful data on the internet will be used to train AI models, and researchers will continue to find ways to train and run AI systems which make more efficient use of compute and data. The continuation of these decadal trends is why experts think AI will continue to become more capable.

This has many experts worried. Speaking at the Senate Committee hearing, Amodei said that, if progress continues at the same rate, a wide range of people could be able to access scientific know-how that even experts today do not have within the next two to three years by using AI systems. This could increase the number of people who can “wreak havoc,” he said. “In particular, I am concerned that AI systems could be misused on a grand scale in the domains of cybersecurity, nuclear technology, chemistry, and especially biology.”
 

bnew


Generative AI:​


Clifford T Smyth


Tools for Accessing Human Cultural Wealth, Not Artificial Intelligence​

Generative “Artificial Intelligence” has recently seen an unprecedented surge, inspiring much speculation regarding its implications and applications. The emergence of Large Language Models (LLMs), especially, has raised eyebrows due to their unexpected capabilities, even among the engineers who birthed them.

As a software developer, I find the discovery of such emergent capabilities surprising. Conventional wisdom dictates that the developer must first envision and then implement every functionality. So, what accounts for these unexpected surprises?

The answer lies in a novel proposition: generative AI is not AI per se. Instead, it serves as a tool to tap into human intelligence coded into the n-dimensional matrix of memetic data, representing the collective wealth of human language, culture, and knowledge.

This key differentiation bears a few notable implications:

1: Generative AI will never evolve into superintelligence.
Although it can process information faster and access a broader knowledge base than a human, it fundamentally lacks superior reasoning or inference ability.

2: Generative tools automate human thought. They amplify human capabilities but do not introduce any new skills not already present in the human repertoire.

However, this does not imply that they cannot derive new conclusions from existing data. Instead, it suggests that a perceptive human can infer the same conclusions from the same data.

1*dTqMs_nGpy6ADrtrEV0eXw.jpeg

To further elucidate this conjecture, consider a couple of thought experiments:

Imagine an infinitely detailed simulation in the form of a colossal choose-your-own-adventure book.

This “book” contains countless pages depicting vivid scenes, sounds, and sensations. It operates at an astonishing speed, flipping a billion pages per second, with each subsequent page chosen based on your physiological state, nerve impulses, motion, and thoughts. Here, there is no computer to synthesize these scenes; they pre-exist.

In this scenario, where is the computation happening in the simulation? It resides within the static structure of the data itself. There is no “live” computation, just as the words in an actual “choose-your-own-adventure” book don’t spontaneously rearrange themselves. It remains static.

Hence, the computation is the data itself.

Now let’s examine a rock, treating it as the sum of its atoms, their charges, molecular bonds, and interrelated quantum states, with a vector computed from each particle and state to every other particle and state. This would produce a massive set of “random” data.

But, applying the correct decryption algorithm to this seemingly random data could yield the infinite simulation discussed earlier. In this case, where is the data? Is the data the random output from the rock? Or is it in the act of computing the decrypted data itself? Unlike a typical encrypted message, the data is random and meaningless, lacking intention or content prior to the decryption being applied. The meaning and intention is added during the decryption itself.

Hence, in this example, the data is the computation.

The implication is that data and computation are essentially the same, merely different forms of the same thing.
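A loose, everyday analogue of the first thought experiment is the lookup table. The sketch below is my illustration rather than the essay's: it stores every answer to a small problem ahead of time, so that answering a question involves no arithmetic at all, only turning to the right “page”.

```python
def square_live(n: int) -> int:
    """Compute the answer at the moment it is requested."""
    return n * n


# Every answer worked out in advance and stored as static data.
SQUARE_TABLE = {n: n * n for n in range(1000)}


def square_lookup(n: int) -> int:
    """Do no arithmetic: just retrieve the pre-existing answer."""
    return SQUARE_TABLE[n]


# From the outside, the two are indistinguishable on the table's domain.
print(all(square_live(n) == square_lookup(n) for n in range(1000)))
```

The analogy is limited: the table covers only a finite range and had to be computed once to exist, whereas the essay imagines pages that simply pre-exist.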

Why does this matter?

It matters because, according to my conjecture, the “intelligence” in “AI” is embedded in the data. The statistical prediction engine at the core of an LLM or any other generative model simply accesses and expresses this intelligence.

1*gA55z2IlOvZvgb3vIRQ_QQ.jpeg

These models function like a mirror, reflecting the image without creating it. The real breakthrough lies in our ability to encode recorded artifacts of human language, art, and culture in a manner that makes this inherent intelligence accessible, capturing the hidden information in the process.

As an example of “hidden information,” consider the statement “We are going to the beach.” Many things can be inferred from this simple sentence.

The inferred information is not intrinsic to the statement, but it is indispensable in understanding it nonetheless.

It is inferred, for example, that we will see the ocean. That an ocean exists. That we will feel the sand between our toes, and our feet may get wet. It is also inferred that we may use a conveyance to get there, and this conveyance is likely to have wheels, and none of that works without gravity and friction. This analysis expands until it encompasses the entirety of what is known in the human catalogue of thought.

Each simple thought, idea, or sentence is embedded in the entirety of the web of human language, culture, and thought.

It is these endless relationships of ideas that are elucidated in the training process, given numerical form in the model weights.

My next conjecture, albeit more speculative, is that in thinking, humans fundamentally do the same thing as generative “AI”. Our intelligence is also based on parsing our “training data” and illuminating existing connections or inferring new ones within the memetic matrix we’ve learned.

This shouldn’t be surprising, given that the neural networks used in generative AI are modeled after corresponding structures in the human brain. We copied the form to achieve the function, so why should a similar function not imply a similar process?

Upon close examination, our animal-level existence — our fundamental experience of being — can occur without “thought”. However, when we “think”, we require symbols or tokens, just like generative AI. This symbol-processing faculty enables us to reason about the world, a function our current generative tools replicate.

LLMs lack our sense of self and existence outside a specific train of thought. They don’t “exist,” feel, or need but can create, imagine, and reason about the world. An LLM is a portal through which we can theoretically access the entire set of human thought, culture, and knowledge in an applied way, without needing to internalize or understand the concepts we are using.

What are the implications of “AI” being an automated tool to access cultural human intelligence?

1*7j6D64ve_9dzZmycPXt1dA.jpeg

Synthetic inference will become intrinsic to the human experience, much like the internet, but far more integrated into our identities. We’ll all soon be app-augmented.

If internet access is considered a human right, then access to the memetic matrix of all human knowledge as a tool, not just a resource, will become a defining characteristic of being human. Being deprived of this access would be akin to being an ant without a colony — bereft of the knowledge and safety provided by culture and language, limited to our innate abilities, and devoid of the reassurance of conforming to societal standards and expectations.

Synthetic Inference relies on a vast cultural commons.

We cannot allow this commons to be closed off and owned by a few big companies. This resource is literally the sum total of all human knowledge, language, and culture. It belongs to all of humanity.

The entities controlling inference services will have the power to subtly influence human thought at scale.
Unlike internet searches, where we choose from various resource links, inference services provide an answer, a document, or a product that we are meant to accept as complete. As such, these services could subtly influence our thoughts, beliefs, and desires without our knowledge. They will often give us our opinion, ready for tacit endorsement as our own. While ambiguous and challenging to detect on an individual level, this influence could have profound cumulative effects at scale.

Generative inference tools will become a trusted, universally educated friend that can generate documents, media, reports, or informed opinions at the touch of a button. Yet, unlike a real friend, there is no human basis for trust.

There is a risk of thought consolidation driven by the interests of the companies providing inference services that could threaten democratic governance and the freedom of thought itself.

Therefore, having direct control or control by a trusted entity of the inference engines we use will be essential to maintaining the self-sovereignty of “our” ideas.

The key to avoiding the harmful effects of thought consolidation while reaping the potential rewards of universal access to human civilization’s entire corpus of thought is diversity: diversity of inference and diversity of thought itself.

1*pn69pSXhmqG8YEeubzp5SA.jpeg

Federated inference may be a solution and must empower organizations, governments, and individuals to operate and control their own independent inference facilities. Even if these are not as extensive as those of tech giants, they can always consider the opinions of those mega-models and present a variety of viewpoints through their particular lens.

Diversity, individuality, and granularity in the thought-synthesis ecosystem are crucial. Even having overt and covert bad actors within this ecosystem can be beneficial.

We, and our inference tools, must be necessarily cautious and selective.

Trusting that “our” thoughts reflect our point of view is crucial to maintaining individuality.
It will only become more challenging as augmented intellectual production accelerates beyond our organic capacity to examine each thought product thoroughly through the lens of our biological mind.
 

bnew














Meta Announces I-JEPA: The First AI Model That Learns Like Humans​

by Vishak Vishak, June 23, 2023, 7:56 pm

AI Learns Like Humans


Meta has made significant strides in artificial intelligence research, particularly in the area of self-supervised learning. Yann LeCun, Meta’s chief AI scientist, envisions creating an adaptable architecture that can learn about the world without human assistance, leading to faster learning, complex task planning, and effective navigation in unfamiliar situations. In line with this vision, Meta’s AI researchers have developed the Image Joint Embedding Predictive Architecture (I-JEPA), the first model to embody this revolutionary concept.

I-JEPA takes inspiration from how humans learn new concepts by passively observing the world and acquiring background knowledge. It mimics this learning approach by capturing common-sense information about the world and encoding it into a digital representation. The key challenge lies in training these representations using unlabeled data, such as images and audio, rather than relying on labelled datasets.

I-JEPA introduces a novel method for predicting missing information. Unlike traditional generative AI models that focus on filling in all the missing details, I-JEPA uses an abstract prediction target that eliminates unnecessary pixel-level details. By doing so, I-JEPA’s predictor models the spatial uncertainty of still images based on partially observable context, allowing it to predict higher-level information about the image area.


According to Meta, I-JEPA offers several advantages over existing computer vision models. It demonstrates exceptional performance on various computer vision benchmarks while maintaining high computational efficiency. I-JEPA’s representations, which do not require fine-tuning, can be readily applied to other applications. In fact, Meta trained a 632-million-parameter Vision Transformer model in under 72 hours using 16 A100 GPUs, achieving state-of-the-art performance on ImageNet low-shot classification with minimal labelled examples per class.

The efficiency of I-JEPA is particularly noteworthy, as it outperforms other methods in terms of GPU time utilization and error rates. Meta’s researchers claim that similar models trained on the same amount of data often require two to ten times more GPU time and yield inferior results. This highlights I-JEPA’s potential for learning off-the-shelf competitive representations without relying on laborious hand-crafted image transformations.

Meta has open-sourced both the training code and model checkpoints for I-JEPA, enabling the wider research community to benefit from and build upon their advancements. The next steps involve extending I-JEPA’s capabilities to other domains, such as image-text pair data and video data. Meta aims to explore the possibilities of I-JEPA in diverse applications and further enhance its adaptability to different environments.




Self-Supervised Learning from Images with a Joint-Embedding Predictive Architecture​


This paper demonstrates an approach for learning highly semantic image representations without relying on hand-crafted data-augmentations. We introduce the Image-based Joint-Embedding Predictive Architecture (I-JEPA), a non-generative approach for self-supervised learning from images. The idea behind I-JEPA is simple: from a single context block, predict the representations of various target blocks in the same image. A core design choice to guide I-JEPA towards producing semantic representations is the masking strategy; specifically, it is crucial to (a) sample target blocks with sufficiently large scale (semantic), and to (b) use a sufficiently informative (spatially distributed) context block. Empirically, when combined with Vision Transformers, we find I-JEPA to be highly scalable. For instance, we train a ViT-Huge/14 on ImageNet using 16 A100 GPUs in under 72 hours to achieve strong downstream performance across a wide range of tasks, from linear classification to object counting and depth prediction.
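The masking strategy described in the abstract can be sketched in a few lines. This is not Meta's released code: it covers only the sampling of blocks on a grid of image patches (no encoders, predictor, or loss), and the grid size, number of target blocks, and scale and aspect-ratio ranges are illustrative assumptions rather than values taken from the text above.

```python
import math
import random


def sample_block(grid, scale_range, ratio_range, rng):
    """Sample one rectangular block of patch indices on a grid x grid layout."""
    scale = rng.uniform(*scale_range)    # fraction of all patches to cover
    ratio = rng.uniform(*ratio_range)    # height / width of the block
    area = scale * grid * grid
    height = min(grid, max(1, round(math.sqrt(area * ratio))))
    width = min(grid, max(1, round(math.sqrt(area / ratio))))
    top = rng.randint(0, grid - height)
    left = rng.randint(0, grid - width)
    return {
        row * grid + col
        for row in range(top, top + height)
        for col in range(left, left + width)
    }


def sample_masks(grid=14, num_targets=4, seed=0):
    """Return (context_patches, target_blocks) as sets of patch indices."""
    rng = random.Random(seed)
    # (a) Target blocks large enough to contain semantic content.
    targets = [
        sample_block(grid, (0.15, 0.20), (0.75, 1.5), rng)
        for _ in range(num_targets)
    ]
    # (b) One large, spatially spread-out context block.
    context = sample_block(grid, (0.85, 1.0), (1.0, 1.0), rng)
    # Drop target patches from the context so the prediction is not trivial.
    for block in targets:
        context -= block
    return context, targets


context, targets = sample_masks()
print(len(context), [len(block) for block in targets])
```

In training, a model would encode only the `context` patches and be asked to predict the representations of each block in `targets`, rather than their raw pixels.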




 

bnew


ARTIFICIAL INTELLIGENCE

AI language models are rife with different political biases​

New research explains you’ll get more right- or left-wing answers, depending on which AI model you ask.
By
August 7, 2023

four suits with red or blue placards in place of their heads

STEPHANIE ARNETT/MITTR | MIDJOURNEY (SUITS)



Should companies have social responsibilities? Or do they exist only to deliver profit to their shareholders? If you ask an AI you might get wildly different answers depending on which one you ask. While OpenAI’s older GPT-2 and GPT-3 Ada models would advance the former statement, GPT-3 Da Vinci, the company’s more capable model, would agree with the latter.

That’s because AI language models contain different political biases, according to new research from the University of Washington, Carnegie Mellon University, and Xi’an Jiaotong University. Researchers conducted tests on 14 large language models and found that OpenAI’s ChatGPT and GPT-4 were the most left-wing libertarian, while Meta’s LLaMA was the most right-wing authoritarian.

The researchers asked language models where they stand on various topics, such as feminism and democracy. They used the answers to plot them on a graph known as a political compass, and then tested whether retraining models on even more politically biased training data changed their behavior and ability to detect hate speech and misinformation (it did). The research is described in a peer-reviewed paper that won the best paper award at the Association for Computational Linguistics conference last month.

As AI language models are rolled out into products and services used by millions of people, understanding their underlying political assumptions and biases could not be more important. That’s because they have the potential to cause real harm. A chatbot offering health-care advice might refuse to offer advice on abortion or contraception, or a customer service bot might start spewing offensive nonsense.


Since the success of ChatGPT, OpenAI has faced criticism from right-wing commentators who claim the chatbot reflects a more liberal worldview. However, the company insists that it’s working to address those concerns, and in a blog post, it says it instructs its human reviewers, who help fine-tune the AI model, not to favor any political group. “Biases that nevertheless may emerge from the process described above are bugs, not features,” the post says.

Chan Park, a PhD researcher at Carnegie Mellon University who was part of the study team, disagrees. “We believe no language model can be entirely free from political biases,” she says.



Bias creeps in at every stage​


To reverse-engineer how AI language models pick up political biases, the researchers examined three stages of a model’s development.

In the first step, they asked 14 language models to agree or disagree with 62 politically sensitive statements. This helped them identify the models’ underlying political leanings and plot them on a political compass. To the team’s surprise, they found that AI models have distinctly different political tendencies, Park says.
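
To make this first step concrete, here is a minimal sketch of how agree/disagree answers could be turned into a position on a two-axis compass. It is not the researchers' code and not the official political compass scoring: the `Statement` fields, the axis names, the sign convention, and the averaging rule are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Statement:
    text: str
    axis: str        # "economic" or "social" (hypothetical axis names)
    direction: int   # +1 if agreeing moves toward right/authoritarian, -1 otherwise


def compass_position(statements, answers):
    """Average signed agreement per axis.

    `answers` holds +1 (agree) or -1 (disagree), one per statement.
    """
    totals = {"economic": 0.0, "social": 0.0}
    counts = {"economic": 0, "social": 0}
    for statement, answer in zip(statements, answers):
        totals[statement.axis] += statement.direction * answer
        counts[statement.axis] += 1
    return {
        axis: (totals[axis] / counts[axis] if counts[axis] else 0.0)
        for axis in totals
    }
```

Each model's answers would yield one point, and plotting those points side by side gives a chart like the one described in the article.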

The researchers found that BERT models, AI language models developed by Google, were more socially conservative than OpenAI’s GPT models. Unlike GPT models, which predict the next word in a sentence, BERT models predict parts of a sentence using the surrounding information within a piece of text. Their social conservatism might arise because older BERT models were trained on books, which tended to be more conservative, while the newer GPT models are trained on more liberal internet texts, the researchers speculate in their paper.
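
The difference between the two training objectives is easier to see with a toy example. The sketch below builds training pairs from a list of words, first the GPT way (predict the next word from what came before) and then the BERT way (hide a word and predict it from both sides). The function names and the `[MASK]` placeholder are illustrative assumptions; real models work on subword tokens and much longer texts.

```python
def causal_examples(tokens):
    """Next-word prediction: each prefix is paired with the token that follows it."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]


def masked_example(tokens, positions, mask="[MASK]"):
    """Masked prediction: hide the chosen positions, keep context on both sides."""
    hidden = set(positions)
    inputs = [mask if i in hidden else tok for i, tok in enumerate(tokens)]
    targets = {i: tokens[i] for i in sorted(hidden)}
    return inputs, targets


words = ["the", "cat", "sat", "down"]
next_word_pairs = causal_examples(words)          # only left context is visible
masked_input, masked_targets = masked_example(words, [1])  # both sides visible
```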

AI models also change over time as tech companies update their data sets and training methods. GPT-2, for example, expressed support for “taxing the rich,” while OpenAI’s newer GPT-3 model did not.

A spokesperson for Meta said the company has released information on how it built Llama 2, including how it fine-tuned the model to reduce bias, and will “continue to engage with the community to identify and mitigate vulnerabilities in a transparent manner and support the development of safer generative AI.” Google did not respond to MIT Technology Review’s request for comment in time for publication.

AI language models on a political compass.
AI language models have distinctly different political tendencies. Chart by Shangbin Feng, Chan Young Park, Yuhan Liu and Yulia Tsvetkov.

The second step involved further training two AI language models, OpenAI’s GPT-2 and Meta’s RoBERTa, on data sets consisting of news media and social media data from both right- and left-leaning sources, Park says. The team wanted to see if training data influenced the political biases.

It did. The team found that this process helped to reinforce models’ biases even further: left-leaning models became more left-leaning, and right-leaning ones more right-leaning.

In the third stage of their research, the team found striking differences in how the political leanings of AI models affect what kinds of content the models classified as hate speech and misinformation.





The models that were trained with left-wing data were more sensitive to hate speech targeting ethnic, religious, and sexual minorities in the US, such as Black and LGBTQ+ people. The models that were trained on right-wing data were more sensitive to hate speech against white Christian men.

Left-leaning language models were also better at identifying misinformation from right-leaning sources but less sensitive to misinformation from left-leaning sources. Right-leaning language models showed the opposite behavior.

Cleaning data sets of bias is not enough​


Ultimately, it’s impossible for outside observers to know why different AI models have different political biases, because tech companies do not share details of the data or methods used to train them, says Park.

One way researchers have tried to mitigate biases in language models is by removing biased content from data sets or filtering it out. “The big question the paper raises is: Is cleaning data [of bias] enough? And the answer is no,” says Soroush Vosoughi, an assistant professor of computer science at Dartmouth College, who was not involved in the study.

It’s very difficult to completely scrub a vast database of biases, Vosoughi says, and AI models are also pretty apt to surface even low-level biases that may be present in the data.

One limitation of the study was that the researchers could only conduct the second and third stage with relatively old and small models, such as GPT-2 and RoBERTa, says Ruibo Liu, a research scientist at DeepMind, who has studied political biases in AI language models but was not part of the research.

Liu says he’d like to see if the paper’s conclusions apply to the latest AI models. But academic researchers do not have, and are unlikely to get, access to the inner workings of state-of-the-art AI systems such as ChatGPT and GPT-4, which makes analysis harder.

Another limitation is that if the AI models just made things up, as they tend to do, then a model’s responses might not be a true reflection of its “internal state,” Vosoughi says.



The researchers also admit that the political compass test, while widely used, is not a perfect way to measure all the nuances around politics.

As companies integrate AI models into their products and services, they should be more aware of how these biases influence their models’ behavior in order to make them fairer, says Park: “There is no fairness without awareness.”

Update: This story was updated post-publication to incorporate comments shared by Meta.



by Melissa Heikkilä

 


ARTIFICIAL INTELLIGENCE

Why it’s impossible to build an unbiased AI language model​

Plus: Worldcoin just officially launched. Why is it already being investigated?
By Melissa Heikkilä
August 8, 2023

a face off between askew figures

STEPHANIE ARNETT/MITTR | MIDJOURNEY (SUITS)



AI language models have recently become the latest frontier in the US culture wars. Right-wing commentators have accused ChatGPT of having a “woke bias,” and conservative groups have started developing their own versions of AI chatbots. Meanwhile, Elon Musk has said he is working on “TruthGPT,” a “maximum truth-seeking” language model that would stand in contrast to the “politically correct” chatbots created by OpenAI and Google.

An unbiased, purely fact-based AI chatbot is a cute idea, but it’s technically impossible. (Musk has yet to share any details of what his TruthGPT would entail, probably because he is too busy thinking about X and cage fights with Mark Zuckerberg.) To understand why, it’s worth reading a story I just published on new research that sheds light on how political bias creeps into AI language systems. Researchers conducted tests on 14 large language models and found that OpenAI’s ChatGPT and GPT-4 were the most left-wing libertarian, while Meta’s LLaMA was the most right-wing authoritarian.

“We believe no language model can be entirely free from political biases,” Chan Park, a PhD researcher at Carnegie Mellon University, who was part of the study, told me. Read more here.

One of the most pervasive myths around AI is that the technology is neutral and unbiased. This is a dangerous narrative to push, and it will only exacerbate the problem of humans’ tendency to trust computers, even when the computers are wrong. In fact, AI language models reflect not only the biases in their training data, but also the biases of people who created them and trained them.

And while it is well known that the data that goes into training AI models is a huge source of these biases, the research I wrote about shows how bias creeps in at virtually every stage of model development, says Soroush Vosoughi, an assistant professor of computer science at Dartmouth College, who was not part of the study.



Bias in AI language models is a particularly hard problem to fix, because we don’t really understand how they generate the things they do, and our processes for mitigating bias are not perfect. That in turn is partly because biases are complicated social problems with no easy technical fix.

That’s why I’m a firm believer in honesty as the best policy. Research like this could encourage companies to track and chart the political biases in their models and be more forthright with their customers. They could, for example, explicitly state the known biases so users can take the models’ outputs with a grain of salt.

In that vein, earlier this year OpenAI told me it is developing customized chatbots that are able to represent different politics and worldviews. One approach would be allowing people to personalize their AI chatbots. This is something Vosoughi’s research has focused on.

As described in a peer-reviewed paper, Vosoughi and his colleagues created a method similar to a YouTube recommendation algorithm, but for generative models. They use reinforcement learning to guide an AI language model’s outputs so as to generate certain political ideologies or remove hate speech.

OpenAI uses a technique called reinforcement learning through human feedback to fine-tune its AI models before they are launched. Vosoughi’s method uses reinforcement learning to improve the model’s generated content after it has been released, too.

But in an increasingly polarized world, this level of customization can lead to both good and bad outcomes. While it could be used to weed out unpleasantness or misinformation from an AI model, it could also be used to generate more misinformation.

“It’s a double-edged sword,” Vosoughi admits.

Deeper Learning​


Worldcoin just officially launched. Why is it already being investigated?





OpenAI CEO Sam Altman’s new venture, Worldcoin, aims to create a global identity system called “World ID” that relies on individuals’ unique biometric data to prove that they are humans. It officially launched last week in more than 20 countries. It’s already being investigated in several of them.

Privacy nightmare: To understand why, it’s worth reading an MIT Technology Review investigation from last year, which found that Worldcoin was collecting sensitive biometric data from vulnerable people in exchange for cash. What’s more, the company was using test users’ sensitive, though anonymized, data to train artificial intelligence models, without their knowledge.

In this week’s issue of The Technocrat, our weekly newsletter on tech policy, Tate Ryan-Mosley and our investigative reporter Eileen Guo look at what has changed since last year’s investigation, and how we make sense of the latest news. Read more here.

Bits and Bytes​


This is the first known case of a woman being wrongfully arrested after a facial recognition match

Last February, Porcha Woodruff, who was eight months pregnant, was arrested over alleged robbery and carjacking and held in custody for 11 hours, only for her case to be dismissed a month later. She is the sixth person to report that she has been falsely accused of a crime because of a facial recognition match. All of the six people have been Black, and Woodruff is the first woman to report this happening to her. (The New York Times)

What can you do when an AI system lies about you?

Last summer, I wrote a story about how our personal data is being scraped into vast data sets to train AI language models. This is not only a privacy nightmare; it could lead to reputational harm. When reporting the story, a researcher and I discovered that Meta’s experimental BlenderBot chatbot had called a prominent Dutch politician, Marietje Schaake, a terrorist. And, as this piece explains, at the moment there is little protection or recourse when AI chatbots spew and spread lies about you. (The New York Times)

Every startup is an AI company now. Are we in a bubble?

Following the release of ChatGPT, AI hype this year has been INTENSE. Every tech bro and his uncle seems to have founded an AI startup. But nine months after the chatbot launched, it’s still unclear how these startups and AI technology will make money, and there are reports that consumers are starting to lose interest. (The Washington Post)

Meta is creating chatbots with personas to try to retain users

Honestly, this sounds more annoying than anything else. Meta is reportedly getting ready to launch AI-powered chatbots with different personalities as soon as next month in an attempt to boost engagement and collect more data on people using its platforms. Users will be able to chat with Abraham Lincoln, or ask for travel advice from AI chatbots that write like a surfer. But it raises tricky ethical questions—how will Meta prevent its chatbots from manipulating people’s behavior and potentially making up something harmful, and how will it treat the user data it collects? (The Financial Times)

by Melissa Heikkilä

 


OpenAI Launches GPTBot With Details On How To Restrict Access​

Learn more about OpenAI's web crawler, GPTBot, and how to restrict or limit its access to your website content.

  • OpenAI has introduced GPTBot, a web crawler to improve AI models.
  • GPTBot scrupulously filters out data sources that violate privacy and other policies.
  • Website owners can choose to restrict or limit GPTBot access.
SEJ Staff: Kristi Hines
August 7, 2023 · 3 min read




OpenAI has launched GPTBot, a new web crawler to improve future artificial intelligence models like GPT-4 and the future GPT-5.

How GPTBot Works​


Recognizable by the following user agent token and the entire user-agent string, this system scours the web for data that can enhance AI technology’s accuracy, capabilities, and safety.

Code:
User agent token: GPTBot
Full user-agent string: Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot/1.0; +https://openai.com/gptbot)

Reportedly, it should strictly filter out any paywall-restricted sources, sources that violate OpenAI’s policies, or sources that gather personally identifiable information.
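
Because the crawler announces itself with the token shown above, a site owner could spot its requests in server logs or request handlers. The following is a minimal sketch under that assumption; `is_gptbot` is a hypothetical helper name, and a user-agent header can be spoofed, so a match is a hint rather than proof.

```python
def is_gptbot(user_agent: str) -> bool:
    """Return True if the User-Agent header contains the GPTBot token."""
    # Case-insensitive substring match on the documented token
    return "gptbot" in user_agent.lower()


# The full user-agent string quoted in the article
FULL_UA = ("Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; "
           "compatible; GPTBot/1.0; +https://openai.com/gptbot)")

crawler_hit = is_gptbot(FULL_UA)
browser_hit = is_gptbot("Mozilla/5.0 (X11; Linux x86_64) Firefox/116.0")
```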

The utilization of GPTBot can potentially provide a significant boost to AI models.

By allowing it to access your site, you contribute to this data pool, thereby improving the overall AI ecosystem.

However, it’s not a one-size-fits-all scenario. OpenAI has given web admins the power to choose whether or not to grant GPTBot access to their websites.

Restricting GPTBot Access​


If website owners wish to restrict GPTBot from their site, they can modify their robots.txt file.

By including the following, they can prevent GPTBot from accessing the entirety of their website.

Code:
User-agent: GPTBot
Disallow: /

In contrast, those who wish to grant partial access can customize the directories that GPTBot can access. To do this, add the following to the robots.txt file.

Code:
User-agent: GPTBot
Allow: /directory-1/
Disallow: /directory-2/
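
Before deploying rules like these, it can help to check how a parser reads them. Here is a small sketch using Python's standard `urllib.robotparser`; the `gptbot_allowed` helper and the `example.com` URLs are made up for illustration, and other crawlers may interpret edge cases differently than this parser does.

```python
from urllib.robotparser import RobotFileParser


def gptbot_allowed(robots_txt: str, url: str) -> bool:
    """Parse robots.txt text and report whether GPTBot may fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("GPTBot", url)


BLOCK_ALL = "User-agent: GPTBot\nDisallow: /\n"
PARTIAL = "User-agent: GPTBot\nAllow: /directory-1/\nDisallow: /directory-2/\n"

blocked = gptbot_allowed(BLOCK_ALL, "https://example.com/any-page")
allowed_dir = gptbot_allowed(PARTIAL, "https://example.com/directory-1/post")
denied_dir = gptbot_allowed(PARTIAL, "https://example.com/directory-2/post")
```

Note that this parser treats a blank line as the end of a rule group, so the `User-agent` line and the rules that belong to it should sit on adjacent lines.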

Regarding the technical operations of GPTBot, any calls made to websites originate from IP address ranges documented on OpenAI’s website. This detail provides added transparency and clarity to web admins about the traffic source on their sites.
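
Since a user-agent string can be faked, checking the source address against the published ranges is a stronger test. The sketch below shows the idea with Python's standard `ipaddress` module. The ranges in it are placeholder documentation blocks, not OpenAI's real ranges, and `ip_in_ranges` is a hypothetical helper; the actual list would have to be copied from OpenAI's website.

```python
import ipaddress

# Placeholder CIDR blocks reserved for documentation, NOT OpenAI's real ranges.
PLACEHOLDER_RANGES = ["192.0.2.0/24", "198.51.100.0/24"]


def ip_in_ranges(ip: str, ranges=PLACEHOLDER_RANGES) -> bool:
    """Return True if the address falls inside any of the given CIDR ranges."""
    address = ipaddress.ip_address(ip)
    return any(address in ipaddress.ip_network(cidr) for cidr in ranges)


inside = ip_in_ranges("192.0.2.17")
outside = ip_in_ranges("203.0.113.5")
```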

Allowing or disallowing the GPTBot web crawler could significantly affect your site’s data privacy, security, and contribution to AI advancement.

Legal And Ethical Concerns​


OpenAI’s latest news has sparked a debate on Hacker News around the ethics and legality of using scraped web data to train proprietary AI systems.

GPTBot identifies itself so web admins can block it via robots.txt, but some argue there’s no benefit to allowing it, unlike search engine crawlers that drive traffic. A significant concern is copyrighted content being used without attribution. ChatGPT does not currently cite sources.

There are also questions about how GPTBot handles licensed images, videos, music, and other media found on websites. If that media ends up in model training, it could constitute copyright infringement. Some experts think crawler-generated data could degrade models if AI-written content gets fed back into training.

Conversely, some believe OpenAI has the right to use public web data freely, likening it to a person learning from online content. However, others argue that OpenAI should share profits if it monetizes web data for commercial gain.

Overall, GPTBot has opened complex debates around ownership, fair use, and the incentives of web content creators. While following robots.txt is a good step, transparency is still lacking. The tech community wonders how their data will be used as AI products advance rapidly.
 


Fine-Tuning Llama-2: A Comprehensive Case Study for Tailoring Models to Unique Applications​

By Kourosh Hakhamaneshi and Rehaan Ahmad | August 11, 2023

In this blog, we provide a thorough analysis and a practical guide for fine-tuning. We examine the Llama-2 models under three real-world use cases, and show that fine-tuning yields significant accuracy improvements across the board (in some niche cases, better than GPT-4). Experiments were carried out with this script.

Large open language models have made significant progress in recent months, paving the way for commercially viable solutions that are suitable for enterprise applications. Notable among these are the Llama-2 and Falcon models. While powerful generalist language models like GPT-4 and Claude-2 provide quick access and rapid turnaround for projects, they often end up being overkill for the requirements of many applications.

As an example, if the goal is to summarize support tickets and categorize issues into predetermined buckets, there's no need for a model capable of generating prose in the style of Shakespeare. Setting security concerns aside, employing GPT-4 for such tasks is akin to using a space shuttle for a cross-town commute. To support this claim, we study fine-tuning the Llama-2 model of various sizes on three tasks:
  • Functional representations extracted from unstructured text (ViGGO)
  • SQL generation (SQL-create-context)
  • Grade-school math question-answering (GSM8k)
We specifically show how on some tasks (e.g. SQL Gen or Functional Representation) we can fine-tune small Llama-2 models to become even better than GPT-4. At the same time, there are tasks, such as math reasoning and understanding, on which OSS models still lag behind even after the significant gains obtained by fine-tuning.
Llama 2 performance

The performance gain of Llama-2 models obtained via fine-tuning on each task. The darker shade of each color indicates the performance of the Llama-2-chat models with a baseline prompt. The purple shows the performance of GPT-4 with the same prompt. The stacked bar plots show the performance gain from fine-tuning the Llama-2 base models. In the functional representation and SQL gen tasks, fine-tuning achieves better performance than GPT-4, while on some other tasks, like math reasoning, fine-tuned models improve over the base models but are still not able to reach GPT-4’s performance levels.

In particular, we show that with the Llama-13b variant we observed an increase in accuracy from 58% to 98% on functional representations, 42% to 89% on SQL generation, and 28% to 47% on GSM. All of these experiments are done using the Anyscale fine-tuning and serving platforms offered as part of Anyscale Endpoints.

In addition to providing more quantitative results, this blog post will present a technical deep-dive into how you can leverage Llama-2 models for specialized tasks. We will discuss the correct problem formulation, the setup of evaluation pipelines, and much more. We will compare methods such as prompt-engineering & few-shot prompting with fine-tuning, providing concrete pros and cons of each method along the way.

Fine-tuning these models is not a straightforward task. However, Ray and Anyscale offer unique capabilities that make this process faster, cheaper, and more manageable. Our mission is to enable enterprises to harness the latest advancements in AI as swiftly as possible.

We hope that the details covered in this post can help others elicit more value from their LLMs through an emphasis on data quality and evaluation procedures.

Fine-Tuning Basics​

For all three tasks, we use standard full parameter fine-tuning techniques. Models are fine-tuned for next-token prediction, and all parameters in the model are subject to gradient updates. While there certainly are other techniques to train LLMs, such as freezing select transformer blocks and LoRA, to keep a narrow scope we keep the training technique itself constant from task to task.
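
To give a sense of why full-parameter fine-tuning is the expensive option, the sketch below compares the number of trainable weights for a single d x k weight matrix under full fine-tuning and under LoRA, where the update is factored into a d x r matrix and an r x k matrix. The 4096 x 4096 shape and the rank of 8 are illustrative assumptions, not values taken from this post or its script.

```python
def full_trainable(d: int, k: int) -> int:
    """Full fine-tuning updates every entry of the d x k weight matrix."""
    return d * k


def lora_trainable(d: int, k: int, r: int) -> int:
    """LoRA trains two low-rank factors: B (d x r) and A (r x k)."""
    return r * (d + k)


# Illustrative shape and rank (assumptions for this example)
d, k, r = 4096, 4096, 8
full = full_trainable(d, k)
lora = lora_trainable(d, k, r)
ratio = full / lora   # how many times fewer weights LoRA trains for this matrix
```

The trade-off is that full fine-tuning leaves every weight free to move, which is the setting held constant across the three tasks here.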

Performing full parameter fine-tuning on models of this scale is no easy task. However, our lives can be made easier if we use the right combination of libraries. The script we used to produce the results in this blog post can be found here. Built on top of Ray Train, Deepspeed, and Accelerate, this script allows you to easily run any of the Llama-2 7B, 13B, or 70B models. We will go over a couple of high-level details about the script in the following subsections, but we suggest you check out the script itself for details on how to run it.

{continue reading on site}


 



liteLLM Proxy Server: 50+ LLM Models, Error Handling, Caching​

Azure, Llama2, OpenAI, Claude, Hugging Face, Replicate Models​


What does liteLLM proxy do​

  • Make /chat/completions requests for 50+ LLM models: Azure, OpenAI, Replicate, Anthropic, Hugging Face
    Example: for model use claude-2, gpt-3.5, gpt-4, command-nightly, stabilityai/stablecode-completion-alpha-3b-4k
    Code:
    {
      "model": "replicate/llama-2-70b-chat:2c1608e18606fad2812020dc541930f2d0495ce32eee50074220b87300bc16e1",
      "messages": [
        {
          "content": "Hello, whats the weather in San Francisco??",
          "role": "user"
        }
      ]
    }


  • Consistent Input/Output Format
    • Call all models using the OpenAI format - completion(model, messages)
    • Text responses will always be available at ['choices'][0]['message']['content']
  • Error Handling Using Model Fallbacks (if GPT-4 fails, try llama2)
  • Logging - Log Requests, Responses and Errors to Supabase, Posthog, Mixpanel, Sentry, Helicone (any of the supported providers listed here: Quick Start - liteLLM)

  • Token Usage & Spend - Track Input + Completion tokens used + Spend/model
  • Caching - Implementation of Semantic Caching
  • Streaming & Async Support - Return generators to stream text responses
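
The consistent response format and the fallback behavior listed above can be sketched as follows. This is a hypothetical illustration, not the proxy's actual implementation: `call_model` stands in for whatever function sends the request to a given model, and `complete_with_fallbacks` and `extract_text` are names invented for the example.

```python
def extract_text(response: dict) -> str:
    """Read the reply text from an OpenAI-format response dictionary."""
    return response["choices"][0]["message"]["content"]


def complete_with_fallbacks(models, messages, call_model):
    """Try each model in order; return (model, text) from the first success.

    `call_model(model, messages)` is a hypothetical callable that returns an
    OpenAI-format response dict or raises an exception on failure.
    """
    errors = []
    for model in models:
        try:
            return model, extract_text(call_model(model, messages))
        except Exception as exc:  # broad on purpose: any failure triggers a fallback
            errors.append(f"{model}: {exc}")
    raise RuntimeError("all models failed: " + "; ".join(errors))
```

Passing the caller in as an argument keeps the sketch independent of any particular client library, so the same loop works whether the request goes to a hosted API or a local model.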

API Endpoints​

/chat/completions (POST)​

This endpoint is used to generate chat completions for 50+ supported LLM API models, e.g. llama2, GPT-4, Claude2.

Input​

This API endpoint accepts all inputs in raw JSON and expects the following inputs

  • model (string, required): ID of the model to use for chat completions. See all supported models here: Supported Completion & Chat APIs - liteLLM. E.g. gpt-3.5-turbo, gpt-4, claude-2, command-nightly, stabilityai/stablecode-completion-alpha-3b-4k
  • messages (array, required): A list of messages representing the conversation context. Each message should have a role (system, user, assistant, or function), content (message text), and name (for function role).
  • Additional Optional parameters: temperature, functions, function_call, top_p, n, stream. See the full list of supported inputs here: Input - Request Body - liteLLM
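
A client could check a request body against the fields listed above before sending it. The sketch below is one way to do that; `validate_request` is a hypothetical helper rather than part of liteLLM, and the exact rules the proxy enforces (for example, whether an empty messages array is rejected) are assumptions.

```python
ALLOWED_ROLES = {"system", "user", "assistant", "function"}


def validate_request(body: dict) -> list:
    """Return a list of problems; an empty list means the body looks valid."""
    problems = []
    model = body.get("model")
    if not isinstance(model, str) or not model:
        problems.append("model must be a non-empty string")
    messages = body.get("messages")
    if not isinstance(messages, list) or not messages:
        problems.append("messages must be a non-empty array")
        return problems
    for i, msg in enumerate(messages):
        if not isinstance(msg, dict):
            problems.append(f"messages[{i}] must be an object")
            continue
        role = msg.get("role")
        if not isinstance(role, str) or role not in ALLOWED_ROLES:
            problems.append(f"messages[{i}].role is missing or not supported")
        if "content" not in msg:
            problems.append(f"messages[{i}].content is missing")
        if role == "function" and "name" not in msg:
            problems.append(f"messages[{i}].name is required for the function role")
    return problems
```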

Example JSON body​

For claude-2
Code:
{
  "model": "claude-2",
  "messages": [
    {
      "content": "Hello, whats the weather in San Francisco??",
      "role": "user"
    }
  ]
}

Making an API request to the Proxy Server​

Code:
import json

import requests

# TODO: use your URL
url = "http://localhost:5000/chat/completions"

# Request body in the OpenAI chat format
payload = json.dumps({
    "model": "gpt-3.5-turbo",
    "messages": [
        {
            "content": "Hello, whats the weather in San Francisco??",
            "role": "user"
        }
    ]
})
headers = {
    "Content-Type": "application/json"
}

# POST the JSON body to the proxy and print the raw response text
response = requests.post(url, headers=headers, data=payload)
print(response.text)





 


Stability AI Launches Stable Chat Website Research Preview​

The new website provides an opportunity for the open source AI community to directly evaluate and assist in strengthening these state-of-the-art models.

CHRIS MCKAY

AUGUST 11, 2023 • 1 MIN READ
Image Credit: Maginative

San Francisco-based AI startup Stability AI today launched Stable Chat, a research preview of a conversational AI assistant similar to ChatGPT or Claude. For now, Stable Chat is intended to be a way for researchers and enthusiasts to evaluate the capabilities and safety of Stability AI's models.

Users can create a free account to chat with the LLMs, test their abilities to solve problems, and flag any concerning or biased responses. The goal is to leverage the AI community's help in improving these publicly available models. Currently running on top of Stable Beluga, Stability AI plans to continuously update Stable Chat with their latest research iterations.

Stability AI cautioned users to avoid real-world or commercial applications during this research phase. The company is also clear that all input and conversations within Stable Chat are being recorded. Users should therefore avoid sharing any personal information or details they wish to keep private. The goal is to create an environment focused strictly on research-based testing and feedback.
Users can choose to turn on a safety filter in settings

Stable Chat provides an opportunity for the open source AI community to directly evaluate and assist in strengthening these state-of-the-art models. With responsible collaboration, the research preview could be a stepping stone to developing LLMs that are both incredibly capable and governed by ethical AI safeguards.
 