4 Charts That Show Why AI Progress Is Unlikely to Slow Down
Three major inputs are the key to understanding how AI will likely progress
Illustration by Katie Kalupson for TIME
BY WILL HENSHALL
AUGUST 2, 2023 4:50 PM EDT
In the last ten years, AI systems have developed at rapid speed. Since the breakthrough of besting a legendary player at the complex game of Go in 2016, AI has become able to recognize images and speech better than humans, and to pass tests including business school exams and Amazon coding interview questions.
Last week, during a U.S. Senate Judiciary Committee hearing about regulating AI, Senator Richard Blumenthal of Connecticut described the reaction of his constituents to recent advances in AI. “The word that has been used repeatedly is scary.”
The Subcommittee on Privacy, Technology, and the Law, which held the hearing, heard testimony from three expert witnesses, who stressed the pace of progress in AI. One of those witnesses, Dario Amodei, CEO of prominent AI company Anthropic, said that “the single most important thing to understand about AI is how fast it is moving.”
It’s often thought that scientific and technological progress is fundamentally unpredictable, and is driven by flashes of insight that are clearer in hindsight. But progress in the capabilities of AI systems is predictably driven by progress in three inputs—compute, data, and algorithms. Much of the progress of the last 70 years has been a result of researchers training their AI systems using greater computational processing power, often referred to as “compute”, feeding the systems more data, or coming up with algorithmic hacks that effectively decrease the amount of compute or data needed to get the same results. Understanding how these three factors have driven AI progress in the past is key to understanding why most people working in AI don’t expect progress to slow down any time soon.
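To make that relationship concrete, the sketch below uses the kind of “scaling law” formula AI researchers have published, in which a model’s error falls smoothly as parameters (a rough proxy for compute) and training data grow. The functional form follows DeepMind’s 2022 “Chinchilla” analysis, but the constants here are illustrative placeholders rather than fitted values, so the numbers only show the trend.

```python
# Illustrative sketch of a neural scaling law: predicted loss falls smoothly
# as model size (a proxy for compute) and training data grow.
# The constants are placeholders for illustration, not fitted to any real model.

def predicted_loss(params: float, tokens: float,
                   A: float = 400.0, alpha: float = 0.34,
                   B: float = 410.0, beta: float = 0.28,
                   irreducible: float = 1.7) -> float:
    """Chinchilla-style functional form: loss = E + A / N^alpha + B / D^beta."""
    return irreducible + A / params**alpha + B / tokens**beta

# Each 10x increase in parameters and data yields a smaller, but still
# predictable, drop in loss. No flash of insight is required.
for scale in (1e8, 1e9, 1e10, 1e11):
    print(f"{scale:.0e} params, {20 * scale:.0e} tokens -> "
          f"loss ~ {predicted_loss(scale, 20 * scale):.3f}")
```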
Read more: The AI Arms Race Is Changing Everything
Compute
The first artificial neural network, the Perceptron Mark I, was developed in 1957 and could learn to tell whether a card was marked on the left side or the right. It had 1,000 artificial neurons, and training it required around 700,000 operations. Some 65 years later, OpenAI released the large language model GPT-4. Training GPT-4 required an estimated 21 septillion operations.

Increasing computation allows AI systems to ingest greater amounts of data, meaning the system has more examples to learn from. More computation also allows the system to model the relationship between the variables in the data in greater detail, meaning it can draw more accurate and nuanced conclusions from the examples it is shown.
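For a sense of scale, here is the back-of-the-envelope arithmetic behind those two figures, using the estimates quoted above:

```python
# Back-of-the-envelope comparison of the training-compute figures quoted above.
perceptron_ops = 7e5    # roughly 700,000 operations (Perceptron Mark I, 1957)
gpt4_ops = 2.1e25       # roughly 21 septillion operations (estimate for GPT-4)

fold_increase = gpt4_ops / perceptron_ops
print(f"GPT-4's training used roughly {fold_increase:.0e} times more compute")
# -> roughly 3e+19, i.e. tens of quintillions of times more
```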
Since 1965, Moore’s law—the observation that the number of transistors in an integrated circuit doubles about every two years—has meant the price of compute has been steadily decreasing. While this did mean that the amount of compute used to train AI systems increased, researchers were more focused on developing new techniques for building AI systems than on how much compute was used to train them, according to Jaime Sevilla, director of Epoch, a research organization.
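As a rough illustration of what that doubling rate implies, the toy calculation below assumes an idealized doubling every two years; it is a simplification, not actual chip or pricing data:

```python
# Idealized Moore's-law arithmetic: a doubling about every two years.
# A textbook simplification, not actual transistor or price data.
def doublings(years: float, doubling_period_years: float = 2.0) -> float:
    return 2 ** (years / doubling_period_years)

print(f"Growth over 10 years: ~{doublings(10):.0f}x")               # ~32x
print(f"Growth from 1965 to 2023: ~{doublings(2023 - 1965):.1e}x")  # ~5e8x
```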
This changed around 2010, says Sevilla. “People realized that if you were to train bigger models, you will actually not get diminishing returns,” contrary to the commonly held view at the time.
Since then, developers have been spending increasingly large amounts of money to train larger-scale models. Training AI systems requires expensive specialized chips. AI developers either build their own computing infrastructure, or pay cloud computing providers for access to theirs. Sam Altman, CEO of OpenAI, has said that GPT-4 cost over $100 million to train. This increased spending, combined with the continued decrease in the cost of compute resulting from Moore’s Law, has led to AI models being trained on huge amounts of compute.
OpenAI and Anthropic, two of the leading AI companies, have each raised billions from investors to pay for the compute they use to train AI systems, and each has partnerships with tech giants that have deep pockets—OpenAI with Microsoft and Anthropic with Google.
Data
AI systems work by building models of the relationships between variables in their training data—whether it’s how likely the word “home” is to appear next to the word “run,” or patterns in how a gene sequence relates to protein folding, the process by which a protein takes its 3D form, which then defines its function.

In general, a larger number of data points means that AI systems have more information with which to build an accurate model of the relationship between the variables in the data, which improves performance. For example, a language model that is fed more text will have a greater number of examples of sentences in which the word “run” follows “home”—in sentences that describe baseball games or emphatic success, this sequence of words is more likely.
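To see what modeling the relationship between variables means in the simplest possible case, the toy sketch below estimates how often “run” follows “home” by counting word pairs in a tiny made-up corpus; real language models learn far richer relationships, but the principle that more data gives better estimates is the same:

```python
# Toy illustration: estimating how likely "run" is to follow "home" by
# counting word pairs in a tiny, made-up corpus. Real language models learn
# far richer relationships, but more data still means better estimates.
from collections import Counter

corpus = (
    "he hit a home run in the ninth inning "
    "the startup hit a home run with its new product "
    "she drove home after the game"
).split()

pair_counts = Counter(zip(corpus, corpus[1:]))   # count adjacent word pairs
home_count = corpus.count("home")                # how often "home" appears

p_run_given_home = pair_counts[("home", "run")] / home_count
print(f"P('run' | 'home') ~ {p_run_given_home:.2f}")  # 2 of the 3 occurrences
```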
The original research paper about the Perceptron Mark I says that it was trained on just six data points. By comparison, LLaMA, a large language model developed by researchers at Meta and released in 2023, was trained on around one trillion data points—a more than 160-billion-fold increase from the Perceptron Mark I. In the case of LLaMA, the data points were text collected from a range of sources, including 67% from Common Crawl data (Common Crawl is a non-profit that scrapes the internet and makes the data it collects freely available), 4.5% from GitHub (an internet service used by software developers), and 4.5% from Wikipedia.
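For illustration only, here is how such a training mixture might be written down and sampled from in code; the proportions are the ones reported above, but the snippet is a hypothetical sketch, not Meta’s actual data pipeline:

```python
# Hypothetical sketch of a training-data mixture, using the LLaMA proportions
# quoted above. The sampling scheme is illustrative, not Meta's real pipeline.
import random

data_mixture = {
    "common_crawl": 0.67,   # web text scraped by the Common Crawl non-profit
    "github":       0.045,  # source code
    "wikipedia":    0.045,  # encyclopedia articles
    "other":        0.24,   # remaining sources not broken out above
}

def sample_source(mixture):
    """Pick a data source in proportion to its share of the training mix."""
    sources, weights = zip(*mixture.items())
    return random.choices(sources, weights=weights, k=1)[0]

print(sample_source(data_mixture))
```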