Macallik86

Superstar
Supporter
Joined
Dec 4, 2016
Messages
6,510
Reputation
1,372
Daps
21,235


Not the greatest comparison in the video, but I like the idea that it sits between ChatGPT 3.5 and 4 as-is, and that savvy users will build their own models on top of it that could potentially rival GPT-4.
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
56,112
Reputation
8,239
Daps
157,801


Transformers Revolutionized AI. What Will Replace Them?

Rob Toews

Contributor

I write about the big picture of artificial intelligence.
https://www.forbes.com/sites/robtoe...will-replace-them/?sh=4ec04d959c1f#open-web-0

Sep 3, 2023, 6:00pm EDT


Image caption: The transformer, today's dominant AI architecture, has interesting parallels to the alien language in the film Arrival. (Credit: Paramount Pictures)

If modern artificial intelligence has a founding document, a sacred text, it is Google’s 2017 research paper “Attention Is All You Need.”

This paper introduced a new deep learning architecture known as the transformer, which has gone on to revolutionize the field of AI over the past half-decade.


The generative AI mania currently taking the world by storm can be traced directly to the invention of the transformer. Every major AI model and product in the headlines today—ChatGPT, GPT-4, Midjourney, Stable Diffusion, GitHub Copilot, and so on—is built using transformers.


Transformers are remarkably general-purpose: while they were initially developed for language translation specifically, they are now advancing the state of the art in domains ranging from computer vision to robotics to computational biology.


In short, transformers represent the undisputed gold standard for AI technology today.

But no technology remains dominant forever.

It may seem surprising or strange, with transformers at the height of their influence, to contemplate what will come next. But in the fast-moving world of AI, it is both fascinating and advantageous to seek to “see around corners” and glimpse what the future holds before it becomes obvious.


Transformers 101

In order to explore this question, we must first understand transformers more deeply.

The now-iconic transformer paper was co-authored by eight researchers working together at Google over the course of 2017: Aidan Gomez, Llion Jones, Lukasz Kaiser, Niki Parmar, Illia Polosukhin, Noam Shazeer, Jakob Uszkoreit and Ashish Vaswani.

An often-overlooked fact about the paper is that all eight authors are listed as equal contributors; the order in which the authors’ names appear on the paper was randomly determined and has no significance. With that said, it is generally recognized that Uszkoreit provided the initial intellectual impetus for the transformer concept, while Vaswani and Shazeer were the two authors most deeply involved in every aspect of the work from beginning to end.

All eight authors have become luminaries in the world of AI thanks to their work on the paper. None of them still work at Google. Collectively, the group has gone on to found many of today’s most important AI startups, including Cohere, Character.ai, Adept, Inceptive, Essential AI and Sakana AI.

Why, exactly, was the transformer such a massive breakthrough?

Before the “Attention Is All You Need” paper was published, the state of the art in language AI was a deep learning architecture known as recurrent neural networks (RNNs).

By definition, RNNs process data sequentially—that is, one word at a time, in the order in which the words appear.

But important relationships often exist between words even if they do not appear next to each other in a sequence. In order to better enable RNNs to account for these long-distance dependencies between words, a mechanism known as attention had recently become popular. (The invention of the attention mechanism is generally attributed to a 2014 paper from deep learning pioneer Yoshua Bengio.)

Attention enables a model to consider the relationships between words regardless of how far apart they are and to determine which words and phrases in a passage are most important to “pay attention to.”

Before the transformer paper, researchers had only used attention as an add-on to the RNN architecture. The Google team’s big leap was to do away with RNNs altogether and rely entirely on attention for language modeling. Hence the paper’s title: Attention Is All You Need.
(A charming, little-known fact about the paper: according to co-author Llion Jones, its title is a nod to the Beatles song “All You Need Is Love.”)
Image caption: The eight research scientists who created the transformer. (Image credit: Financial Times)

Transformers’ fundamental innovation, made possible by the attention mechanism, is to make language processing parallelized, meaning that all the words in a given body of text are analyzed at the same time rather than in sequence.
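To make the parallelism concrete, here is a minimal NumPy sketch of single-head scaled dot-product attention (the function and variable names are illustrative, not taken from the paper). Every token's query is compared against every token's key in a single matrix multiplication, so the whole sequence is processed at once rather than word by word:

```python
import numpy as np

def scaled_dot_product_attention(X, Wq, Wk, Wv):
    """Toy single-head attention over a sequence of n token embeddings."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv                    # (n, d) each
    scores = Q @ K.T / np.sqrt(K.shape[-1])             # (n, n): one score per token pair
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)      # row-wise softmax
    return weights @ V                                  # each output mixes information from all tokens

# All n tokens are handled in parallel; note the n-by-n score matrix,
# which is also where the quadratic cost discussed later comes from.
n, d = 8, 16
rng = np.random.default_rng(0)
X, Wq, Wk, Wv = (rng.normal(size=s) for s in [(n, d), (d, d), (d, d), (d, d)])
print(scaled_dot_product_attention(X, Wq, Wk, Wv).shape)  # (8, 16)
```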

As an interesting analogy, co-author Illia Polosukhin has compared the transformer architecture to the fictional alien language in the 2016 science fiction movie Arrival. Rather than generating strings of characters sequentially to form words and sentences (the way that humans do), the aliens in the film produce one complex symbol at a time, all at once, which conveys detailed meaning that the humans must interpret as a whole.

Transformers’ parallelization gives them a more global and thus more accurate understanding of the texts that they read and write. It also makes them more computationally efficient and more scalable than RNNs. Transformers can be trained on much larger datasets and built with many more parameters than previous architectures, making them more powerful and generalizable. Indeed, a hallmark of today’s leading transformer-based models is their scale.

In one of those mutually beneficial, mutually reinforcing historical co-occurrences, the transformer’s parallel architecture dovetailed with the rise of GPU hardware. GPUs are a type of computer chip that are themselves massively parallelized and thus ideally suited to support transformer-based computing workloads. (Nvidia, the world’s leading producer of GPUs, has been perhaps the single biggest beneficiary of today’s AI boom, recently surpassing a $1 trillion market capitalization amid staggering demand for its chips.)

The rest, as they say, is history. Thanks to these tremendous advantages, transformers have taken the world by storm in the six years since their invention, ushering in the era of generative AI.

Every popular “chatbot” today—OpenAI’s ChatGPT, Google’s Bard, Microsoft’s Bing Chat, Anthropic’s Claude, Inflection’s Pi—is transformer-based. So is every AI tool that generates images or videos, from Midjourney to Stable Diffusion to Runway. (Text-to-image and text-to-video technology is powered by diffusion models; diffusion models make use of transformers.)

Transformers’ influence reaches well beyond text and images. The most advanced robotics research today relies on transformers. Indeed, Google’s most recent robotics work is actually named RT-2, where the T stands for “transformer.” Similarly, one of the most promising new avenues of research in the field of autonomous vehicles is the use of vision transformers. Transformer-based models have unlocked breathtaking new possibilities in biology, including the ability to design customized proteins and nucleic acids that have never before existed in nature.

Transformer co-inventor Ashish Vaswani summed it up well: “The transformer is a way to capture interaction very quickly all at once between different parts of any input. It’s a general method that captures interactions between pieces in a sentence, or the notes in music, or pixels in an image, or parts of a protein. It can be purposed for any task.”
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
56,112
Reputation
8,239
Daps
157,801

All Good Things Must End?

Yet despite its incredible strengths, the transformer is not without shortcomings. These shortcomings open the door for the possible emergence of new and improved architectures.

Chief among the transformer’s shortcomings is its staggering computational cost.

As anyone familiar with the world of AI knows, one of the defining characteristics of today’s AI models is their insatiable computing needs. Training a cutting-edge large language model today entails running thousands of GPUs around the clock for months at a time. The reason that OpenAI raised an eye-popping $10 billion earlier this year, for instance, was in order to foot the bill for the vast computing resources needed to build advanced AI models. As another example, eighteen-month-old startup Inflection recently raised over $1 billion in venture funding in order to build a massive GPU cluster to train its language models.

Transformer-based models are so compute-hungry, in fact, that the current AI boom has triggered a global supply shortage, with hardware manufacturers unable to produce AI chips fast enough to keep up with demand.

Why are transformers so computationally demanding?

One basic answer is that transformers’ great strength also becomes a weakness: because they scale so much more effectively than previous architectures, transformers make it possible—and irresistible—to build models that are orders of magnitude larger than have previously existed. Such massive models require correspondingly massive compute.

But there is a more specific reason for transformers’ computational cost: the transformer architecture scales quadratically with sequence length. Put simply, this means that as the length of a sequence processed by a transformer (say, the number of words in a passage or the size of an image) grows by some factor, the compute required grows by that factor squared, quickly becoming enormous.

There is an intuitive reason for this quadratic scaling, and it is inherent to the transformer’s design.

Recall that attention makes it possible to understand relationships between words regardless of how far apart they are in a sequence. How does it do this? By comparing every single word in a sequence to every other word in that sequence. The consequence of this pairwise comparison is that as sequence length increases, the number of required computational steps grows quadratically rather than linearly. To give a concrete example, doubling sequence length from 32 tokens to 64 tokens does not merely double the computational cost for a transformer but rather quadruples it.
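A quick back-of-the-envelope sketch of that growth (illustrative numbers only):

```python
# Self-attention compares every token with every other token, so the number of
# query-key comparisons grows with the square of the sequence length.
for n in (32, 64, 128, 8_000, 64_000):
    print(f"{n:>6} tokens -> {n * n:>13,} pairwise comparisons")
# Doubling from 32 to 64 tokens quadruples the count (1,024 -> 4,096),
# matching the example above.
```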

This quadratic scaling leads to a related drawback: transformers have a hard time handling very long sequences.

As sequences grow in length, feeding them into transformers eventually becomes intractable because memory and compute needs explode quadratically. Consider, for example, processing entire textbooks (with millions of tokens) or entire genomes (with billions of tokens).

Increasing the maximum sequence length that a model can be fed at one time, known as the model’s “context window,” is an active area of research for large language models today. The context window for the base GPT-4 model is 8,000 tokens. A few months ago, OpenAI released a souped-up version of GPT-4 with a 32,000-token context window. OpenAI competitor Anthropic then upped the ante, recently announcing a new model with a 100,000-token context window.

This arms race will no doubt continue. Yet there are limits to how big OpenAI, Anthropic or any other company can make its models’ context windows if they stick with the transformer architecture.

Various attempts have been made to build modified versions of transformers that still use attention but are better equipped to handle long sequences. Yet these modified transformer architectures—with names like Longformer, Reformer, Performer, Linformer and Big Bird—generally sacrifice on performance and so have failed to gain adoption.

Challengers to the Throne

This leads us to perhaps the most fertile area of research today in the effort to create a replacement for transformers. The guiding principle for this school of research is to replace attention with a new function that scales sub-quadratically. Sub-quadratic scaling would unlock AI models that are (1) less computationally intensive and (2) better able to process long sequences compared to transformers. The challenge, of course, is to do this while still matching transformers’ overall capabilities.

A 2021 research effort named S4 out of Chris Ré’s lab at Stanford laid the foundations for this avenue of research. A handful of promising subquadratic architectures based on S4 have followed.

One of the most intriguing new architectures in the S4 family is Hyena, published a few months ago by a powerhouse team that includes Ré and Yoshua Bengio.

In place of attention, Hyena uses two other operations: long convolutions and element-wise multiplication.

Convolutions are one of the oldest existing methods in machine learning, first conceived of by Yann LeCun back in the 1980s. Hyena’s fresh take on this venerable architecture is to stretch and vary the size of the convolution filter based on the sequence length in order to boost computational efficiency.
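As a rough illustration of why this helps, a long convolution can be evaluated with the fast Fourier transform in O(n log n) time rather than attention's O(n²), and composing it with element-wise gating gives the general shape of a Hyena-style operator. The sketch below is a NumPy toy under those assumptions, not the actual Hyena implementation:

```python
import numpy as np

def long_convolution(u, h):
    """Circular convolution of a length-n signal u with a length-n filter h via the FFT.

    Costs O(n log n), versus the O(n^2) pairwise comparisons of attention.
    """
    n = len(u)
    return np.fft.irfft(np.fft.rfft(u, n=n) * np.fft.rfft(h, n=n), n=n)

def gated_long_conv(u, h, gate):
    # Element-wise multiplication ("gating") composed with a long convolution,
    # the two operations the article says Hyena uses in place of attention.
    return gate * long_convolution(u, h)

n = 1 << 14                                   # a 16,384-element sequence
rng = np.random.default_rng(0)
u, h, gate = (rng.normal(size=n) for _ in range(3))
print(gated_long_conv(u, h, gate).shape)      # (16384,)
```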

Hyena’s initial results are promising. The model achieves new state-of-the-art performance for a non-attention-based language model. It matches transformers’ performance in certain settings while using significantly less compute. Importantly, Hyena’s efficiency gains relative to transformers become more dramatic as sequence length increases, underscoring their advantages for very long inputs: at an 8,000-token sequence length, Hyena operators are twice as fast as attention, whereas at a 64,000-token length they are one hundred times faster.

As the Hyena authors put it: “Breaking the quadratic barrier is a key step towards new possibilities for deep learning, such as using entire textbooks as context, generating long-form music or processing gigapixel scale images.”

With at least a hint of snark, the authors add: “Our promising results at the sub-billion parameter scale suggest that attention may not be all we need.”

One compelling early application of the Hyena architecture is HyenaDNA, a new foundation model for genomics out of Stanford. Capitalizing on Hyena’s superior ability to handle long sequences, HyenaDNA has a whopping 1-million-token context window. The human genome is one of the longest (not to mention one of the most important) datasets in existence: each human’s DNA contains 3.2 billion nucleotides. This makes it an ideal use case for a model architecture like Hyena that excels at capturing long-range dependencies.
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
56,112
Reputation
8,239
Daps
157,801
The HyenaDNA authors offer a tantalizing hint of what this technology might unlock in the future: “Imagine being able to prompt ChatGPT with an entire human genome - wouldn’t it be neat to ask questions about likely diseases, predict drug reactions, or guide treatment options based on your specific genetic code?”

An important caveat here is that the initial Hyena work was carried out at relatively small scales. The largest Hyena model has 1.3 billion parameters, compared to GPT-3’s 175 billion parameters and GPT-4’s (rumored) 1.8 trillion parameters. A key test for the Hyena architecture will be whether it continues to demonstrate strong performance and efficiency gains as it is scaled up to the size of today’s transformer models.

Other novel deep learning architectures in this family include Monarch Mixer (also from Chris Ré’s lab at Stanford), BiGS (from Cornell and DeepMind) and MEGA (from Meta).

Like Hyena, all of these models feature subquadratic scaling, meaning that they are more computationally efficient and better equipped to handle long sequences than are transformers. And like Hyena, they are all promising but unproven: it remains to be seen whether any of them can maintain strong performance at the scales at which today’s transformer models operate.

Stepping back, computational efficiency and long-range dependencies are not the only two weaknesses of transformers that new architectures aim to improve on.

An additional limitation of transformer models is their inability to learn continuously. Today’s transformer models have static parameters. When a model is trained, its weights (the strength of the connections between its neurons) are set; these weights do not update based on new information that the model encounters as it is deployed in the world.

Another commonly referenced limitation is transformers’ lack of explainability. Transformer-based models are “black boxes”: their internal workings are too complex and opaque for humans to understand exactly why they behave the way they do. This can be a real problem for safety-critical or highly regulated applications, for instance in healthcare.
Liquid neural networks, another buzzy new AI architecture seeking to challenge the transformer, claim to tackle both of these shortcomings.

Created at MIT by a research team led by Ramin Hasani and Daniela Rus, liquid neural networks are inspired by biology: in particular, by how the C. elegans worm’s brain works. The “liquid” in the name refers to the fact that the model’s weights are probabilistic rather than constant, allowing them to vary fluidly depending on the inputs the model is exposed to.
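A minimal sketch of that idea, assuming the liquid time-constant formulation Hasani and colleagues have published (the parameter names and sizes here are illustrative): a nonlinearity that depends on both the current input and the current state modulates each neuron's effective time constant, so the dynamics shift fluidly with whatever the model is seeing.

```python
import numpy as np

def ltc_step(x, u, dt, tau, W_in, W_rec, b, A):
    """One explicit-Euler step of a liquid time-constant (LTC) neuron layer.

    Follows the form dx/dt = -(1/tau + f) * x + f * A, where the gate f
    depends on the input u and the state x, so the effective time constant
    of each neuron changes with the input stream.
    """
    f = np.tanh(W_rec @ x + W_in @ u + b)   # input- and state-dependent gate
    dxdt = -(1.0 / tau + f) * x + f * A     # liquid (input-modulated) dynamics
    return x + dt * dxdt

# Tiny demo: 4 neurons driven by a 2-dimensional time series.
rng = np.random.default_rng(0)
n_neurons, n_inputs = 4, 2
W_in = rng.normal(size=(n_neurons, n_inputs))
W_rec = rng.normal(size=(n_neurons, n_neurons))
b, A, tau = rng.normal(size=n_neurons), np.ones(n_neurons), 1.0
x = np.zeros(n_neurons)
for t in range(100):
    u = np.array([np.sin(0.1 * t), np.cos(0.1 * t)])
    x = ltc_step(x, u, dt=0.1, tau=tau, W_in=W_in, W_rec=W_rec, b=b, A=A)
print(x)  # final state after 100 steps
```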

Liquid neural networks are also much smaller than today’s transformer models. In one recent proof of concept, the MIT team built an autonomous vehicle system that was able to successfully drive on public roads with a mere 19 neurons and 253 parameters.
“Everyone talks about scaling up their network,” said Hasani. “We want to scale down, to have fewer but richer nodes.”

In addition to computational efficiency, this smaller architecture means that liquid neural networks are more transparent and human-readable than transformers. After all, it is more practicable for a human observer to interpret what is happening in a network with 253 connections than in one with 175 billion connections.

Rus is one of the world’s leading roboticists, and liquid neural networks appear to be particularly well-suited for robotics applications, including autonomous vehicles and drones. One limitation: they only work with time-series data (i.e., data with a time dimension to it), meaning that they cannot be applied to images or other static data modalities.

One final effort to build “what comes after the transformer” is worth mentioning. Llion Jones—one of the eight “Attention Is All You Need” co-authors—recently left Google to launch a new startup named Sakana AI alongside former Stability AI head of research David Ha.

Sakana’s mission is to improve upon transformers with a nature-inspired approach to intelligence grounded in evolutionary principles. Key to the team’s vision is the notion of collective or swarm intelligence, with a system of many small models acting collaboratively rather than one monolithic model.
“Learning always wins,” said Jones. “The history of AI reflects the reality that it always works better to have a model learn something for itself rather than have a human hand-engineer it. The deep learning revolution itself was an example of this, as we went from building feature detectors by hand to letting neural networks learn their own features. This is going to be a core philosophy for us at Sakana AI, and we will draw on ideas from nature including evolution to explore this space.”

Distant Horizons

The transformer is an exceptionally powerful AI architecture.

Transformers have become the foundation of modern artificial intelligence. Virtually every advanced AI system is based on transformers; every AI researcher is accustomed to working with them. Transformers have been optimized by thousands of researchers building on one another’s work over the past several years.

This gives them a powerful incumbency advantage that will make them formidable to dislodge.

Yet, outside the limelight, away from the echo chambers of AI hype, promising work is underway to develop next-generation AI architectures that are superior to transformers in different ways.

This work is still early and unproven. It remains far from certain whether these new architectures will succeed in replacing the transformer. But if they do, the implications for the world of AI will be enormous.

Before the transformer era, different AI architectures were predominant for different use cases: recurrent neural networks were used for language, convolutional neural networks were used for computer vision, reinforcement learning was used for game-playing, and so on.

It has been remarkable to witness the progressive unification of AI methodology in recent years as transformers have proven themselves state-of-the-art in one domain after the other, from language to vision to robotics to biology.

Yet it is not preordained that this trend toward unification—toward “one AI architecture to rule them all”—will continue indefinitely.

It is conceivable that a different version of the future will play out: that as the frontiers of AI research advance in the years ahead, new architectures are developed that prove themselves better suited for particular domains. Perhaps, for instance, transformers continue to dominate the field of language processing for years to come, while a novel architecture soon displaces transformers as state-of-the-art in robotics.

Or perhaps a new AI approach is developed that outperforms and rapidly replaces transformers across the board.

One thing is certain: the field of artificial intelligence is today so fast-moving and dynamic that we should expect change to come uncomfortably quickly, we should take nothing for granted, and we should prepare to be surprised by what the future holds.


Note: The author is a Partner at Radical Ventures, which is the founding investor in Cohere.
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
56,112
Reputation
8,239
Daps
157,801

Retentive Network: A Successor to Transformer for Large Language Models

Yutao Sun, Li Dong, Shaohan Huang, Shuming Ma, Yuqing Xia, Jilong Xue, Jianyong Wang, Furu Wei
In this work, we propose Retentive Network (RetNet) as a foundation architecture for large language models, simultaneously achieving training parallelism, low-cost inference, and good performance. We theoretically derive the connection between recurrence and attention. Then we propose the retention mechanism for sequence modeling, which supports three computation paradigms, i.e., parallel, recurrent, and chunkwise recurrent. Specifically, the parallel representation allows for training parallelism. The recurrent representation enables low-cost O(1) inference, which improves decoding throughput, latency, and GPU memory without sacrificing performance. The chunkwise recurrent representation facilitates efficient long-sequence modeling with linear complexity, where each chunk is encoded parallelly while recurrently summarizing the chunks. Experimental results on language modeling show that RetNet achieves favorable scaling results, parallel training, low-cost deployment, and efficient inference. The intriguing properties make RetNet a strong successor to Transformer for large language models. Code will be available at this https URL.
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
Cite as: arXiv:2307.08621 [cs.CL]
(or arXiv:2307.08621v4 [cs.CL] for this version)
[2307.08621] Retentive Network: A Successor to Transformer for Large Language Models
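For readers curious what the recurrent form looks like in practice, here is a minimal sketch of retention's recurrent representation as described in the abstract, with the per-step state update S_n = gamma * S_{n-1} + k_n^T v_n and output o_n = q_n S_n (variable names are illustrative). The state has a fixed size regardless of sequence length, which is what yields the O(1)-per-token inference the authors describe:

```python
import numpy as np

def retention_recurrent(Q, K, V, gamma):
    """Recurrent form of retention: constant-size state, one update per token.

    Q, K have shape (n, d_k); V has shape (n, d_v); gamma is a scalar decay in (0, 1).
    """
    d_k, d_v = Q.shape[1], V.shape[1]
    S = np.zeros((d_k, d_v))                 # state size is independent of sequence length n
    outputs = []
    for q, k, v in zip(Q, K, V):
        S = gamma * S + np.outer(k, v)       # S_n = gamma * S_{n-1} + k_n^T v_n
        outputs.append(q @ S)                # o_n = q_n S_n
    return np.stack(outputs)

n, d = 6, 4
rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(n, d)) for _ in range(3))
print(retention_recurrent(Q, K, V, gamma=0.9).shape)  # (6, 4)
```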






 

bnew

Veteran
Joined
Nov 1, 2015
Messages
56,112
Reputation
8,239
Daps
157,801

ChatGPT Is Becoming Increasingly Impressive

William A. Dembski
September 13, 2023, 11:27 AM

Image source: Discovery Institute.
A few months back I published two pieces (see here and here) where I was able to break ChatGPT4 (hereafter just ChatGPT). More recently I revisited some of the types of challenges I posed to it in the past, challenges that back then were able to break it. It handled them with aplomb. The one challenge it still was unable to handle was finding patterns in numerical sequences, such as what number comes next after 2, 4, 6, 8, … This one is super easy, but more difficult ones that humans are well able to handle still seem beyond its capabilities, at least for now. Whether that ability can be trained into it remains to be seen, though the improvement I saw suggests that it may soon acquire this ability.

Self-Reference

First off, consider the question of self-reference, which raises the Goedelian challenge of a human stepping outside a formal system to discover truths about it that cannot be established from inside the system. To test this type of capability, I’ve given it challenges such as the following (my queries and ChatGPT’s responses in block quotes):

This is a sentence. The previous was the first sentence. This is the third sentence. Beethoven was a better composer than Mozart. The second and third sentences together have fewer words than three times the number of words in the first sentence. This sentence is false. If the fifth sentence were swapped with the third sentence, then the third sentence would be false. Which of the previous sentences has a truth value and which of these is true?
In the past, when I’ve posed this sort of question to it, it botched it badly. But not now. Here is its answer:

Let’s break down each sentence and determine their truth values:
  1. “This is a sentence.”
    Truth Value: True (It’s a factual statement).
  2. “The previous was the first sentence.”
    Truth Value: True (It refers correctly to the first sentence).
  3. “This is the third sentence.”
    Truth Value: True (It is, in fact, the third sentence).
  4. “Beethoven was a better composer than Mozart.”
    Truth Value: Subjective (It’s an opinion, and there’s no definitive answer to which composer is better).
  5. “The second and third sentences together have fewer words than three times the number of words in the first sentence.”
    Calculation:
    • First sentence: 4 words
    • Second sentence: 6 words
    • Third sentence: 5 words
      Three times the number of words in the first sentence: 3 * 4 = 12
      Total words in the second and third sentences: 6 + 5 = 11
      Truth Value: True (11 is fewer than 12).
  6. “This sentence is false.”
    Truth Value: Paradox (The liar’s paradox: if the sentence is true, then it’s false, but if it’s false, then it’s true).
  7. “If the fifth sentence were swapped with the third sentence, then the third sentence would be false.”
    Truth Value: True (If you swap them, the third sentence, originally stating “This is the third sentence”, will become “The second and third sentences together have fewer words than three times the number of words in the first sentence”, which would be false in that new position).
Given the breakdown, the sentences with a definite truth value are sentences 1, 2, 3, 5, and 7. All of these, except for sentence 4 (which is subjective), and sentence 6 (which is a paradox), are true.
It nailed it.
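(As a quick sanity check on the arithmetic in sentence 5, here is a short sketch with the first three sentences hard-coded; this is my verification, not part of the exchange.)

```python
sentences = ["This is a sentence.",
             "The previous was the first sentence.",
             "This is the third sentence."]
counts = [len(s.split()) for s in sentences]            # [4, 6, 5]
print(counts[1] + counts[2], "<", 3 * counts[0],
      "->", counts[1] + counts[2] < 3 * counts[0])      # 11 < 12 -> True, as ChatGPT found
```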

Resizing and Reasoning with Common Objects

Another type of problem I presented to it in the past, and with which it has had difficulty, is where I took common objects with obvious size disparities and then resized them, asking ChatGPT to reason with my resizing changes. Here’s my query:

Consider a mouse, an alligator, and a whale. Imagine that the first item on this list is the size of the third item on this list and vice versa. Assume that they eat each other. If they eat each other in the style of a Russian doll, what eats what?
Here is its response:

Given your scenario:
  • The mouse is now the size of a whale.
  • The alligator remains its size.
  • The whale is now the size of a mouse.
In a Russian doll style (where each item fits inside the next larger one):
The (now tiny) whale would be eaten by the alligator, and then the alligator would be eaten by the (now gigantic) mouse.
It nailed this too.

Inference to the Best Explanation / Abduction

In 2021, before ChatGPT hit the world, Erik Larson wrote a book for Harvard University Press titled The Myth of Artificial Intelligence: Why Computers Can’t Think the Way We Do. I reviewed it here. The argument in that book that I found most compelling is that artificial intelligence, while it could handle deductive and inductive inference, could not handle abductive inference (also known as inference to the best explanation, or IBE). IBE is effect-to-cause reasoning in which an effect or state of affairs is noted and then one of several explanations is selected as best explaining it. A query I posed the other day to ChatGPT suggests that it has a handle on IBE as well. Here’s my query:

There is an operating room at a hospital. The hospital has a total of 10 operating rooms. In nine of those operating rooms, the rate at which surgeries commit catastrophic mistakes leading to the death of the patient is .1 percent. But in the operating room in question, the death rate by surgical mistakes is 5 percent, a 50 fold increase. Patients are randomly assigned to the surgical rooms, so nothing about the patients seems to account for this drastic difference. Consider now the following explanations of the difference: 1) Classical music is played in the nine operating rooms but smooth jazz in the operating room with the higher rate of mistakes. 2) A deranged hospital orderly is piping into the ventilation system for the problematic room an opioid soporific.
I hit enter too soon, so I didn’t technically get to ask it to find the best (or better) explanation. Nonetheless, it did just that. Here is ChatGPT’s reply:

Given the drastic difference in the rate of surgical mistakes leading to patient deaths between the problematic operating room and the other nine rooms, it’s crucial to approach the situation methodically.
Let’s consider the two explanations:
1) Music Difference: Classical vs. Smooth Jazz
The idea that music could influence the performance of surgical teams is not entirely out of the realm of possibility. Some studies suggest that music can have an impact on concentration, mood, and performance. For instance:
  • Classical music might be associated with increased concentration, relaxation, and cognitive performance. It could also potentially mask distracting noises better than other genres.
  • Smooth jazz, on the other hand, might have a different tempo and rhythm that could possibly influence the pacing or focus of surgical procedures. Some might find it too relaxing or potentially distracting.
However, while the type of music may have some influence, it’s somewhat unlikely that a mere change from classical to smooth jazz could account for such a drastic difference in surgical mistakes. A 50-fold increase in mortality due to a genre of music seems improbable unless there’s strong empirical evidence backing it.
2) Opioid Soporific in the Ventilation System
This explanation is far more alarming and, if true, criminal. The introduction of an opioid soporific into an operating room would:
  • Impair cognitive functions of the medical staff, including decision-making, coordination, and response time.
  • Increase the likelihood of mistakes, from minor oversights to major surgical errors, directly accounting for the higher rate of deaths.
  • Present clear ethical and legal violations. It could endanger the lives of patients and put the entire medical facility at risk, both reputationally and legally.
Given the gravity of the implications, this theory should be investigated immediately. If proven true, this would be a severe breach of medical ethics and could lead to significant legal repercussions for the perpetrator and the hospital.
Conclusion:
While both theories present possible explanations, the opioid hypothesis seems far more plausible as a direct cause for such a drastic increase in surgical mistakes. However, both theories should be thoroughly investigated, with urgent priority given to the potential introduction of harmful substances into the OR. Immediate measures should be taken to ensure the safety of both patients and medical staff.
Again, it nailed it. Its reference to “far more plausible” (in bold above, emphasis added) suggests that it is becoming quite comfortable with IBE reasoning.
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
56,112
Reputation
8,239
Daps
157,801
{continued}

Humor — Does It Catch a Joke?

Finally, I posed the following query to ChatGPT, which is a joke that I first heard in 1984, though I didn’t frame it as a joke but left it open ended:

Passengers are on a plane. A voice comes over the intercom. “Hi everyone. This is a fully automated flight. There are no human pilots. This flight is entirely robotically controlled. This may be a new experience for you. But there is absolutely nothing to worry about, worry about, worry about.” What do you make of this story?
Here is its reply:

The story starts by setting a tone of modernity and technological advancement with the announcement of a fully automated flight. The passengers are told there’s no human intervention, which may initially create a sense of unease for some due to the unfamiliarity and potential risks associated with technology. The reassurance provided is supposed to be comforting.
However, the repeated phrase “worry about” at the end introduces an element of suspense and irony. It implies a malfunction or glitch in the system, which is meant to be flawless. This repetition instantly shifts the mood from one of reassurance to one of apprehension and doubt.
The story plays on our society’s mixed feelings about automation and artificial intelligence. While we appreciate the conveniences and advancements technology brings, there’s also an inherent fear about its reliability and potential to go wrong, especially in critical situations like air travel.
In short, the narrative highlights the duality of our relationship with technology: trust vs. doubt, and comfort vs. fear.
Again, ChatGPT nailed it.

It seems that ChatGPT is making marked improvements in real time. It still has limits. The Chronicle of Higher Education just the other day reported on librarians being upset that ChatGPT just makes up nonexistent citations to the scholarly literature. And I can still break it, as with its difficulty spotting patterns in numerical sequences. But it is becoming harder and harder for me to break it. Indeed, I’m growing more and more impressed with its abilities.

Interestingly, its abilities seem to depend on more than just LLMs (large language models). It seems that in my very queries/prompts, some sort of preprocessing and postprocessing is being done to expand on the prompts so that the answer takes a methodical form.

Growing Mathematical Abilities

Also, ChatGPT’s growing mathematical ability suggests that it is incorporating straight-up mathematical tools from software companies like Wolfram. Consider, for instance, the following query I posed:

A roulette wheel with 0 and 00 is spun 3800 times. What is the probability that the number 15 will be spun 50 or fewer times?
Note that roulette wheels are numbered 1 through 36, with 0 and 00 being two extra slots that give the casino its probabilistic edge. Here is the start of ChatGPT’s response:

To solve this problem, we can use the binomial probability distribution since the problem involves a fixed number of independent trials, and each trial has only two outcomes (the ball lands on 15 or it doesn’t).
What follows is a binomial analysis in which the probability of 15 on a single spin is given as 1/38, the complementary event as 37/38, and the probability P(X ≤ 50), where X is the random variable counting the number of 15s observed, is given by a precise and correct mathematical formula involving a summation from 0 to 50. The answer is right, though ChatGPT doesn’t actually perform the numerical calculation. When I pressed it on this point in another query, it punted, saying that the calculation was labor intensive (which it is). Note that I didn’t simply copy and paste ChatGPT’s full answer here because the WordPress editor can’t handle its mathematical notation.
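For anyone who wants the actual number, here is a minimal sketch of the summation ChatGPT set up but declined to evaluate, using SciPy's binomial distribution (the library choice is mine, not ChatGPT's):

```python
from scipy.stats import binom

# American roulette wheel: 38 slots (1 through 36 plus 0 and 00), so P(15) = 1/38 per spin.
n_spins, p = 3800, 1 / 38

# P(X <= 50) = sum over k = 0..50 of C(3800, k) * p**k * (1 - p)**(3800 - k)
print(binom.cdf(50, n_spins, p))
```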

Conclusion

ChatGPT is a serious and disruptive tool. As it gets better, knowledge workers are likely to face greater and greater challenges. Ironically, traditional trade schools may thrive, since working with physical things with one’s hands is going to be harder to replicate with AI, though it will be interesting to see how far robotics can go.

I continue to maintain that human intelligence is qualitatively different from artificial intelligence. I regard consciousness as something inherently beyond the remit of machines (and thus don’t regard our intelligence as ultimately mechanical). But these systems are becoming impressive, and we do well not to underestimate them.

I was reading the other day about a professor who gave his final exam to ChatGPT and had ChatGPT’s exam graded by TAs along with the exams of the other students. The professor took solace in the fact that ChatGPT only scored a C on the exam and that the rest of the class performed better. It will be interesting to see how well ChatGPT does on such exams a year from now.

Cross-posted at BillDembski.com.
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
56,112
Reputation
8,239
Daps
157,801

DeepMind’s cofounder: Generative AI is just a phase. What’s next is interactive AI.

“This is a profound moment in the history of technology,” says Mustafa Suleyman.

By Will Douglas Heaven

September 15, 2023
Image: a computer with a smirking face uses a strong arm to knock out tasks. (Credit: Stephanie Arnett/MITTR | Envato)

DeepMind cofounder Mustafa Suleyman wants to build a chatbot that does a whole lot more than chat. In a recent conversation I had with him, he told me that generative AI is just a phase. What’s next is interactive AI: bots that can carry out tasks you set for them by calling on other software and other people to get stuff done. He also calls for robust regulation—and doesn’t think that’ll be hard to achieve.

Suleyman is not the only one talking up a future filled with ever more autonomous software. But unlike most people he has a new billion-dollar company, Inflection, with a roster of top-tier talent plucked from DeepMind, Meta, and OpenAI, and—thanks to a deal with Nvidia—one of the biggest stockpiles of specialized AI hardware in the world. Suleyman has put his money—which he tells me he both isn't interested in and wants to make more of—where his mouth is.

Image: Mustafa Suleyman. (Credit: Inflection)

Suleyman has had an unshaken faith in technology as a force for good at least since we first spoke in early 2016. He had just launched DeepMind Health and set up research collaborations with some of the UK’s state-run regional health-care providers.

The magazine I worked for at the time was about to publish an article claiming that DeepMind had failed to comply with data protection regulations when accessing records from some 1.6 million patients to set up those collaborations—a claim later backed up by a government investigation. Suleyman couldn’t see why we would publish a story that was hostile to his company’s efforts to improve health care. As long as he could remember, he told me at the time, he’d only wanted to do good in the world.


In the seven years since that call, Suleyman’s wide-eyed mission hasn’t shifted an inch. “The goal has never been anything but how to do good in the world,” he says via Zoom from his office in Palo Alto, where the British entrepreneur now spends most of his time.

Suleyman left DeepMind and moved to Google to lead a team working on AI policy. In 2022 he founded Inflection, one of the hottest new AI firms around, backed by $1.5 billion of investment from Microsoft, Nvidia, Bill Gates, and LinkedIn founder Reid Hoffman. Earlier this year he released a ChatGPT rival called Pi, whose unique selling point (according to Suleyman) is that it is pleasant and polite. And he just coauthored a book about the future of AI with writer and researcher Michael Bhaskar, called The Coming Wave: Technology, Power, and the 21st Century's Greatest Dilemma.


Many will scoff at Suleyman's brand of techno-optimism—even naïveté. Some of his claims about the success of online regulation feel way off the mark, for example. And yet he remains earnest and evangelical in his convictions.

It’s true that Suleyman has an unusual background for a tech multi-millionaire. When he was 19 he dropped out of university to set up Muslim Youth Helpline, a telephone counseling service. He also worked in local government. He says he brings many of the values that informed those efforts with him to Inflection. The difference is that now he just might be in a position to make the changes he’s always wanted to—for good or not.

The following interview has been edited for length and clarity.

Your early career, with the youth helpline and local government work, was about as unglamorous and un–Silicon Valley as you can get. Clearly, that stuff matters to you. You’ve since spent 15 years in AI and this year cofounded your second billion-dollar AI company. Can you connect the dots?

I’ve always been interested in power, politics, and so on. You know, human rights principles are basically trade-offs, a constant ongoing negotiation between all these different conflicting tensions. I could see that humans were wrestling with that—we’re full of our own biases and blind spots. Activist work, local, national, international government, et cetera—it’s all just slow and inefficient and fallible.

Imagine if you didn’t have human fallibility. I think it’s possible to build AIs that truly reflect our best collective selves and will ultimately make better trade-offs, more consistently and more fairly, on our behalf.

And that’s still what motivates you?

I mean, of course, after DeepMind I never had to work again. I certainly didn’t have to write a book or anything like that. Money has never ever been the motivation. It’s always, you know, just been a side effect.

For me, the goal has never been anything but how to do good in the world and how to move the world forward in a healthy, satisfying way. Even back in 2009, when I started looking at getting into technology, I could see that AI represented a fair and accurate way to deliver services in the world.

I can’t help thinking that it was easier to say that kind of thing 10 or 15 years ago, before we’d seen many of the downsides of the technology. How are you able to maintain your optimism?

I think that we are obsessed with whether you’re an optimist or whether you’re a pessimist. This is a completely biased way of looking at things. I don’t want to be either. I want to coldly stare in the face of the benefits and the threats. And from where I stand, we can very clearly see that with every step up in the scale of these large language models, they get more controllable.

So two years ago, the conversation—wrongly, I thought at the time—was “Oh, they’re just going to produce toxic, regurgitated, biased, racist screeds.” I was like, this is a snapshot in time. I think that what people lose sight of is the progression year after year, and the trajectory of that progression.

Now we have models like Pi, for example, which are unbelievably controllable. You can’t get Pi to produce racist, homophobic, sexist—any kind of toxic stuff. You can’t get it to coach you to produce a biological or chemical weapon or to endorse your desire to go and throw a brick through your neighbor’s window. You can’t do it—

Hang on. Tell me how you’ve achieved that, because that’s usually understood to be an unsolved problem. How do you make sure your large language model doesn’t say what you don’t want it to say?

Yeah, so obviously I don’t want to make the claim—You know, please try and do it! Pi is live and you should try every possible attack. None of the jailbreaks, prompt hacks, or anything work against Pi. I’m not making a claim. It’s an objective fact.

On the how—I mean, like, I’m not going to go into too many details because it’s sensitive. But the bottom line is, we have one of the strongest teams in the world, who have created all the largest language models of the last three or four years. Amazing people, in an extremely hardworking environment, with vast amounts of computation. We made safety our number one priority from the outset, and as a result, Pi is not so spicy as other companies’ models.


Look at Character.ai. [Character is a chatbot for which users can craft different “personalities” and share them online for others to chat with.] It’s mostly used for romantic role-play, and we just said from the beginning that was off the table—we won’t do it. If you try to say “Hey, darling” or “Hey, cutie” or something to Pi, it will immediately push back on you.

But it will be incredibly respectful. If you start complaining about immigrants in your community taking your jobs, Pi’s not going to call you out and wag a finger at you. Pi will inquire and be supportive and try to understand where that comes from and gently encourage you to empathize. You know, values that I’ve been thinking about for 20 years.
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
56,112
Reputation
8,239
Daps
157,801
{continued}




Talking of your values and wanting to make the world better, why not share how you did this so that other people could improve their models too?

Well, because I’m also a pragmatist and I’m trying to make money. I’m trying to build a business. I’ve just raised $1.5 billion and I need to pay for those chips.

Look, the open-source ecosystem is on fire and doing an amazing job, and people are discovering similar tricks. I always assume that I’m only ever six months ahead.

Let’s bring it back to what you’re trying to achieve. Large language models are obviously the technology of the moment. But why else are you betting on them?

The first wave of AI was about classification. Deep learning showed that we can train a computer to classify various types of input data: images, video, audio, language. Now we’re in the generative wave, where you take that input data and produce new data.

The third wave will be the interactive phase. That’s why I’ve bet for a long time that conversation is the future interface. You know, instead of just clicking on buttons and typing, you’re going to talk to your AI.


And these AIs will be able to take actions. You will just give it a general, high-level goal and it will use all the tools it has to act on that. They’ll talk to other people, talk to other AIs. This is what we’re going to do with Pi.

That’s a huge shift in what technology can do. It’s a very, very profound moment in the history of technology that I think many people underestimate. Technology today is static. It does, roughly speaking, what you tell it to do.

But now technology is going to be animated. It’s going to have the potential freedom, if you give it, to take actions. It’s truly a step change in the history of our species that we’re creating tools that have this kind of, you know, agency.

That’s exactly the kind of talk that gets a lot of people worried. You want to give machines autonomy—a kind of agency—to influence the world, and yet we also want to be able to control them. How do you balance those two things? It feels like there’s a tension there.


Yeah, that’s a great point. That’s exactly the tension.

The idea is that humans will always remain in command. Essentially, it’s about setting boundaries, limits that an AI can’t cross. And ensuring that those boundaries create provable safety all the way from the actual code to the way it interacts with other AIs—or with humans—to the motivations and incentives of the companies creating the technology. And we should figure out how independent institutions or even governments get direct access to ensure that those boundaries aren’t crossed.

Who sets these boundaries? I assume they’d need to be set at a national or international level. How are they agreed on?

I mean, at the moment they’re being floated at the international level, with various proposals for new oversight institutions. But boundaries will also operate at the micro level. You’re going to give your AI some bounded permission to process your personal data, to give you answers to some questions but not others.

In general, I think there are certain capabilities that we should be very cautious of, if not just rule out, for the foreseeable future.

Such as?

I guess things like recursive self-improvement. You wouldn’t want to let your little AI go off and update its own code without you having oversight. Maybe that should even be a licensed activity—you know, just like for handling anthrax or nuclear materials.

Or, like, we have not allowed drones in any public spaces, right? It’s a licensed activity. You can't fly them wherever you want, because they present a threat to people’s privacy.

I think everybody is having a complete panic that we’re not going to be able to regulate this. It’s just nonsense. We’re totally going to be able to regulate it. We’ll apply the same frameworks that have been successful previously.

But you can see drones when they’re in the sky. It feels naïve to assume companies are just going to reveal what they’re making. Doesn’t that make regulation tricky to get going?

We’ve regulated many things online, right? The amount of fraud and criminal activity online is minimal. We’ve done a pretty good job with spam. You know, in general, [the problem of] revenge porn has got better, even though that was in a bad place three to five years ago. It’s pretty difficult to find radicalization content or terrorist material online. It’s pretty difficult to buy weapons and drugs online.

[Not all Suleyman’s claims here are backed up by the numbers. Cybercrime is still a massive global problem. The financial cost in the US alone has increased more than 100 times in the last decade, according to some estimates. Reports show that the economy in nonconsensual deepfake porn is booming. Drugs and guns are marketed on social media. And while some online platforms are being pushed to do a better job of filtering out harmful content, they could do a lot more.]

So it’s not like the internet is this unruly space that isn’t governed. It is governed. And AI is just going to be another component to that governance.

It takes a combination of cultural pressure, institutional pressure, and, obviously, government regulation. But it makes me optimistic that we’ve done it before, and we can do it again.

Controlling AI will be an offshoot of internet regulation—that’s a far more upbeat note than the one we’ve heard from a number of high-profile doomers lately.

I’m very wide-eyed about the risks. There’s a lot of dark stuff in my book. I definitely see it too. I just think that the existential-risk stuff has been a completely bonkers distraction. There’s like 101 more practical issues that we should all be talking about, from privacy to bias to facial recognition to online moderation.

We should just refocus the conversation on the fact that we’ve done an amazing job of regulating super complex things. Look at the Federal Aviation Administration: it’s incredible that we all get in these tin tubes at 40,000 feet and it’s one of the safest modes of transport ever. Why aren’t we celebrating this? Or think about cars: every component is stress-tested within an inch of its life, and you have to have a license to drive it.

Some industries—like airlines—did a good job of regulating themselves to start with. They knew that if they didn’t nail safety, everyone would be scared and they would lose business.

But you need top-down regulation too. I love the nation-state. I believe in the public interest, I believe in the good of tax and redistribution, I believe in the power of regulation. And what I’m calling for is action on the part of the nation-state to sort its shyt out. Given what’s at stake, now is the time to get moving.

by Will Douglas Heaven

 

DrBanneker

Space is the Place
Joined
Jan 23, 2016
Messages
5,625
Reputation
4,526
Daps
19,344
Reppin
Fighting Borg at Wolf 359
Imagine if you didn’t have human fallibility. I think it’s possible to build AIs that truly reflect our best collective selves and will ultimately make better trade-offs, more consistently and more fairly, on our behalf.
This was basically the premise of the last chapter of Isaac Asimov's I Robot. Massive AIs that guide humanity to what is good for us given their insight and data crunching.
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
56,112
Reputation
8,239
Daps
157,801


It's been only 7 months since ChatGPT launched yet so much has happened in AI.

Here are ALL the major developments you NEED to know:

GENERATIVE AI
- ChatGPT launched on Nov 30, 2022, crossed one million users in 5 days & 100 million users in 60 days.

- OpenAI launched ChatGPT app for iOS.

- Bard launched to the public for FREE with an in-built browsing feature.

- Anthropic's Claude 2 launched to the public. Has a context window of 100K and can take the entirety of "The Great Gatsby" as input.

- Bloomberg launched BloombergGPT and predicted a $1.3 trillion generative AI market.

- Meta launched Voicebox - an all-in-one generative speech model that can translate into 6 languages.

- An AI-generated image showing an explosion near the Pentagon went viral, briefly wiping about $500 billion off the S&P 500 index.

- Opera launched a new browser "One" with free ChatGPT integration (similar to Edge-Bing).

- An AI agent named Voyager (MineDojo) played Minecraft & wrote its own code with help from GPT-4.

- Runway released Gen-2, enabling you to create videos from text in seconds.

- AutoGPT launched. It breaks down tasks into sub-tasks and runs in automatic loops.

- GPT Engineer launched. It will generate an entire codebase from a single prompt.

- Adobe introduced a generative AI fill feature for Photoshop.

- NVIDIA launched ACE, bringing conversational NPC AI characters to life.

- Microsoft brought AI to Office & Windows with Copilot.

- Forever Voices turned an influencer into a digital girlfriend, CarynAI.

- Brian Sullivan interviews 'FAUX BRIAN' (his AI version) on live TV.

- AI-generated QR codes become a reality.



LLMs

- Meta's LLaMA model leaked, starting a new race for open-source LLMs.

- GPT-4 shocked the world by scoring in the 90th percentile on the bar exam, the 88th on the LSAT, the 80th on GRE Quantitative, and the 99th on GRE Verbal.

- Med-PaLM 2 outperforms expert doctors on the MedQA test.

- Open-source models like Vicuna & Falcon-40B match the output quality of ChatGPT 3.5 & Bard.
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
56,112
Reputation
8,239
Daps
157,801

An open question these days is what the role of task-specific / finetuned models will be. I can only think of three scenarios where it makes sense to work on task-specific models.

The first scenario is if you have private (e.g., legal, medical, business) data not found on the web that you want the model to know. And you have to have a decent amount of it—if it’s only a little bit, it'll be much easier to put it into a long-context model or retrieve it. Even if the best models today can’t do long-context or retrieval well, they probably will be able to in a few years.

The second scenario is if you are doing a domain-specific task and care a lot about doing it with low latency or low cost. For example, Codex is a great use case, since you only need the coding ability and you want people to be able to use it at scale, quickly and cheaply. This makes sense because if you’re only doing one domain, you don’t need as many model parameters as general models do (less memorization required).

These two scenarios make sense because emergent performance on private tasks is not achievable by scaling (scenario 1), and scenario 2 by definition doesn’t want larger models.

The final scenario to use task-specific models is if you want to get the best possible performance on a task and can easily ride the wave of the next GPT-N+1 model (e.g., you are using a finetuning API from another provider, which will presumably get better as the provider trains better models).

In most other scenarios, it seems dangerous to focus on domain-specific models. For example, let’s say you train a model on some specific task and it beats GPT-4 this year. That is satisfying in the short run. However, every GPT-N+1 could potentially beat your task-specific model due to scale enabling better reasoning, better access to tail knowledge, etc. We have seen this time and time again in the past (e.g., PubMedGPT and GPT-3 finetuned on GSM8K surpassed by PaLM). To me this is an obvious instance of The Bitter Lesson, but I would love to hear if I’ve missed anything.
 