Voice of Reason

Veteran
Joined
Jan 7, 2016
Messages
44,966
Reputation
599
Daps
127,470


Mostofi Lab’s latest research makes significant progress in imaging still objects with WiFi, by exploiting the interaction of the edges with the incoming waves using the Geometrical Theory of Diffraction (GTD). This further enables the first demonstration of WiFi reading through walls.


SCIENCE + TECHNOLOGY

September 11, 2023

WiFi can read through walls

Sonia Fernandez

Researchers in UC Santa Barbara professor Yasamin Mostofi’s lab have proposed a new foundation that can enable high-quality imaging of still objects with only WiFi signals. Their method uses the Geometrical Theory of Diffraction and the corresponding Keller cones to trace edges of the objects. The technique has also enabled, for the first time, imaging, or reading, the English alphabet through walls with WiFi, a task deemed too difficult for WiFi due to the complex details of the letters.

For more details on this technology, check their video at
“Imaging still scenery with WiFi is considerably challenging due to the lack of motion,” said Mostofi, a professor of electrical and computer engineering. “We have then taken a completely different approach to tackle this challenging problem by focusing on tracing the edges of the objects instead.” The proposed methodology and experimental results appeared in the Proceedings of the 2023 IEEE National Conference on Radar (RadarConf) on June 21, 2023.

[Image: The letters BELIEVE as imaged by WiFi. Photo credit: Courtesy Mostofi Lab]

To showcase the capabilities of the proposed pipeline in imaging complex details, the researchers have shown how WiFi can image the English alphabet, even through walls.

This innovation builds on previous work in the Mostofi Lab, which since 2009 has pioneered sensing with everyday radio frequency signals such as WiFi for several different applications, including crowd analytics, person identification, smart health and smart spaces.
“When a given wave is incident on an edge point, a cone of outgoing rays emerges according to the Keller’s Geometrical Theory of Diffraction (GTD), referred to as a Keller cone,” Mostofi explained. The researchers note that this interaction is not limited to visibly sharp edges but applies to a broader set of surfaces with a small enough curvature.
“Depending on the edge orientation, the cone then leaves different footprints (i.e., conic sections) on a given receiver grid. We then develop a mathematical framework that uses these conic footprints as signatures to infer the orientation of the edges, thus creating an edge map of the scene,” Mostofi continued.
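
As a rough geometric illustration of the Keller cone described above: in Keller's GTD, the diffracted rays make the same angle with the edge as the incident ray, so a receiver lies on the cone when those two angles match. The sketch below only checks that condition. The function name `on_keller_cone`, the coordinates, and the tolerance are hypothetical choices for illustration, not taken from the Mostofi Lab paper.

```python
import math

def unit(v):
    """Return v scaled to unit length."""
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def on_keller_cone(tx, edge_point, edge_dir, rx, tol=1e-6):
    """True if rx lies on the Keller cone of an edge point lit from tx.

    Assumed geometry (illustrative): the cone axis is the edge direction and
    its half-angle equals the angle between the incident ray and the edge.
    """
    e = unit(edge_dir)
    s_in = unit(sub(edge_point, tx))   # incident ray direction
    s_out = unit(sub(rx, edge_point))  # direction toward the receiver
    return abs(dot(s_out, e) - dot(s_in, e)) <= tol

# Edge along the z-axis at the origin, transmitter below and behind it.
tx = (-1.0, 0.0, -1.0)
print(on_keller_cone(tx, (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.0, 1.0, 1.0)))
print(on_keller_cone(tx, (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.0, 1.0, 0.0)))
```

Changing `edge_dir` tilts the cone, which is why the footprint on a receiver grid differs from one edge orientation to another.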

[Image: Method for imaging the letter P. Photo credit: Courtesy Mostofi Lab]

Sample imaging in non-through-wall settings: Their method can image details of the letter P in ways not possible before.

More specifically, the team proposed a Keller cone-based imaging projection kernel. This kernel is implicitly a function of the edge orientations, a relationship that is then exploited to infer the existence/orientation of the edges via hypothesis testing over a small set of possible edge orientations. In other words, if existence of an edge is determined, the edge orientation that best matches the resulting Keller cone-based signature is chosen for a given point that they are interested in imaging.
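
The hypothesis-testing idea above can be sketched with a toy example. This is not the lab's imaging kernel, which works on received signal power; it is a purely geometric stand-in that scores each candidate edge orientation by how well the observed ray directions fit that candidate's Keller cone. The candidate angles, the helper names, and the synthetic observations are all assumptions made for illustration.

```python
import math

def unit(v):
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def candidate_edge(angle_deg):
    """Hypothetical candidate set: edge directions in the x-z plane."""
    t = math.radians(angle_deg)
    return (math.cos(t), 0.0, math.sin(t))

def cone_directions(s_in, edge, samples=12):
    """Sample unit directions on the Keller cone of `edge` for incident ray s_in."""
    e = unit(edge)
    c = dot(s_in, e)                      # cosine of the cone half-angle
    s = math.sqrt(max(0.0, 1.0 - c * c))
    u = unit(cross(e, (0.0, 1.0, 0.0)))   # valid because e lies in the x-z plane
    v = cross(e, u)
    dirs = []
    for i in range(samples):
        phi = 2.0 * math.pi * i / samples
        dirs.append(tuple(c * e[k] + s * (math.cos(phi) * u[k] + math.sin(phi) * v[k])
                          for k in range(3)))
    return dirs

def mean_residual(s_in, edge, observed_dirs):
    """Average mismatch between observed directions and the candidate's cone."""
    e = unit(edge)
    target = dot(s_in, e)
    return sum(abs(dot(d, e) - target) for d in observed_dirs) / len(observed_dirs)

def best_orientation(s_in, observed_dirs, candidates=(0, 45, 90, 135)):
    """Pick the candidate angle whose cone best matches the observations."""
    return min(candidates,
               key=lambda a: mean_residual(s_in, candidate_edge(a), observed_dirs))

s_in = unit((1.0, 2.0, -0.3))                        # assumed incident direction
observed = cone_directions(s_in, candidate_edge(45))  # synthetic "measurements"
print(best_orientation(s_in, observed))
```

With noise-free synthetic data the true orientation has near-zero residual, so it wins the comparison; real measurements would need a statistical score rather than this exact geometric match.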
“Edges of real-life objects have local dependencies,” said Anurag Pallaprolu, the lead Ph.D. student on the project. “Thus, once we find the high-confidence edge points via the proposed imaging kernel, we then propagate their information to the rest of the points using Bayesian information propagation. This step can further help improve the image, since some of the edges may be in a blind region, or can be overpowered by other edges that are closer to the transmitters.” Finally, once an image is formed, the researchers can further improve the image by using image completion tools from the area of vision.
“It is worth noting that traditional imaging techniques result in poor imaging quality when deployed with commodity WiFi transceivers,” added Pallaprolu, “as the surfaces can appear near-specular at lower frequencies, thus not leaving enough signature on the receiver grid.”

The researchers have also extensively studied the impact of several different parameters, such as curvature of a surface, edge orientation, distance to the receiver grid, and transmitter location on the Keller cones and their proposed edge-based imaging system, thereby developing a foundation for a methodical imaging system design.


In the team’s experiments, three off-the-shelf WiFi transmitters send wireless waves in the area. WiFi receivers are then mounted on an unmanned vehicle that emulates a WiFi receiver grid as it moves. The receiver measures the received signal power which it then uses for imaging, based on the proposed methodology.

The researchers have extensively tested this technology with several experiments in three different areas, including through-wall scenarios. In one example application, they developed a WiFi Reader to showcase the capabilities of the proposed pipeline.

This application is particularly informative as the English alphabet presents complex details which can be used to test the performance of the imaging system. Along this line, the group has shown how they can successfully image several alphabet-shaped objects. In addition to imaging, they can further classify the letters. Finally, they have shown how their approach enables WiFi to image and read through walls by imaging the details and further reading the letters of the word “BELIEVE” through walls. They have furthermore imaged a number of other objects as well, showing that they can capture details previously not possible with WiFi.

Overall, the proposed approach can open up new directions for RF imaging.

[Image: Profile photos of researchers in the Mostofi Lab. Photo credit: Courtesy Mostofi Lab]

From left to right: Ph.D. student Anurag Pallaprolu; former Ph.D. student Belal Korany and Professor Yasamin Mostofi


More information about the project can be found at Reading Through Walls With WiFi

Additional information about Mostofi’s research is available at http://www.ece.ucsb.edu/~ymostofi/.

Mostofi can be reached at ymostofi@ece.ucsb.edu.




I can't wait for the police state to get their hands on this :snoop:
 

xXMASHERXx

Superstar
Joined
May 15, 2012
Messages
9,044
Reputation
1,331
Daps
34,024
I can't wait for the police state to get their hands on this :snoop:
I always find it fascinating what it would be like to be in a position to know that this wonderful thing you created will be ('cause human beings are human beings) used for evil. Would you be willing to destroy your life's work in order to keep it from doing unimaginable harm? The AI advancement is going to be an intriguing and scary time.
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
58,194
Reputation
8,612
Daps
161,831





Abstract​

While recently Multimodal Large Language Models (MM-LLMs) have made exciting strides, they mostly fall prey to the limitation of only input-side multimodal understanding, without the ability to produce content in multiple modalities. As we humans always perceive the world and communicate with people through various modalities, developing any-to-any MM-LLMs capable of accepting and delivering content in any modality becomes essential to human-level AI. To fill the gap, we present an end-to-end general-purpose any-to-any MM-LLM system, NExT-GPT. We connect an LLM with multimodal adaptors and different diffusion decoders, enabling NExT-GPT to perceive inputs and generate outputs in arbitrary combinations of text, images, videos, and audio. By leveraging the existing well-trained highly-performing encoders and decoders, NExT-GPT is tuned with only a small fraction of parameters (1%) of certain projection layers, which not only benefits low-cost training but also facilitates convenient expansion to more potential modalities. Moreover, we introduce a modality-switching instruction tuning (MosIT) and manually curate a high-quality dataset for MosIT, based on which NExT-GPT is empowered with complex cross-modal semantic understanding and content generation. Overall, our research showcases the promising possibility of building an AI agent capable of modeling universal modalities, paving the way for more human-like AI research in the community.
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
58,194
Reputation
8,612
Daps
161,831

FORBES > INNOVATION > AI

Transformers Revolutionized AI. What Will Replace Them?​

Rob Toews

Contributor

I write about the big picture of artificial intelligence.
https://www.forbes.com/sites/robtoe...will-replace-them/?sh=4ec04d959c1f#open-web-0

Sep 3, 2023, 06:00pm EDT


[Image: Arrival. The transformer, today's dominant AI architecture, has interesting parallels to the alien language ... Credit: Paramount Pictures]

If modern artificial intelligence has a founding document, a sacred text, it is Google’s 2017 research paper “Attention Is All You Need.”

This paper introduced a new deep learning architecture known as the transformer, which has gone on to revolutionize the field of AI over the past half-decade.


The generative AI mania currently taking the world by storm can be traced directly to the invention of the transformer. Every major AI model and product in the headlines today—ChatGPT, GPT-4, Midjourney, Stable Diffusion, GitHub Copilot, and so on—is built using transformers.


Transformers are remarkably general-purpose: while they were initially developed for language translation specifically, they are now advancing the state of the art in domains ranging from computer vision to robotics to computational biology.


In short, transformers represent the undisputed gold standard for AI technology today.

But no technology remains dominant forever.

It may seem surprising or strange, with transformers at the height of their influence, to contemplate what will come next. But in the fast-moving world of AI, it is both fascinating and advantageous to seek to “see around corners” and glimpse what the future holds before it becomes obvious.


Transformers 101​

In order to explore this question, we must first understand transformers more deeply.

The now-iconic transformer paper was co-authored by eight researchers working together at Google over the course of 2017: Aidan Gomez, Llion Jones, Lukasz Kaiser, Niki Parmar, Illia Polosukhin, Noam Shazeer, Jakob Uszkoreit and Ashish Vaswani.

An often-overlooked fact about the paper is that all eight authors are listed as equal contributors; the order in which the authors’ names appear on the paper was randomly determined and has no significance. With that said, it is generally recognized that Uszkoreit provided the initial intellectual impetus for the transformer concept, while Vaswani and Shazeer were the two authors most deeply involved in every aspect of the work from beginning to end.

All eight authors have become luminaries in the world of AI thanks to their work on the paper. None of them still work at Google. Collectively, the group has gone on to found many of today’s most important AI startups, including Cohere, Character.ai, Adept, Inceptive, Essential AI and Sakana AI.

Why, exactly, was the transformer such a massive breakthrough?

Before the “Attention Is All You Need” paper was published, the state of the art in language AI was a deep learning architecture known as recurrent neural networks (RNNs).

By definition, RNNs process data sequentially—that is, one word at a time, in the order in which the words appear.

But important relationships often exist between words even if they do not appear next to each other in a sequence. In order to better enable RNNs to account for these long-distance dependencies between words, a mechanism known as attention had recently become popular. (The invention of the attention mechanism is generally attributed to a 2014 paper from deep learning pioneer Yoshua Bengio.)

Attention enables a model to consider the relationships between words regardless of how far apart they are and to determine which words and phrases in a passage are most important to “pay attention to.”
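
A minimal sketch of the attention idea described above, assuming the scaled dot-product form with softmax weights used in "Attention Is All You Need". The function names and the toy vectors are illustrative only; real models add learned projections, multiple heads, and masking.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Each query scores every key, then returns a weighted mix of the values."""
    d = len(keys[0])
    outputs = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)  # how much to "pay attention to" each position
        outputs.append([sum(w * v[j] for w, v in zip(weights, values))
                        for j in range(len(values[0]))])
    return outputs

# Two positions with identical keys get equal weight, so the output is the
# average of their values regardless of how far apart the positions are.
print(attention([[1.0, 0.0]], [[1.0, 0.0], [1.0, 0.0]], [[2.0, 0.0], [4.0, 2.0]]))
```

Note that the weights depend only on query-key similarity, not on distance between positions, which is what lets attention relate words that are far apart.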

Before the transformer paper, researchers had only used attention as an add-on to the RNN architecture. The Google team’s big leap was to do away with RNNs altogether and rely entirely on attention for language modeling. Hence the paper’s title: Attention Is All You Need.
(A charming, little-known fact about the paper: according to co-author Llion Jones, its title is a nod to the Beatles song “All You Need Is Love.”)
[Image: The eight research scientists who created the transformer. Image credit: Financial Times]

Transformers’ fundamental innovation, made possible by the attention mechanism, is to make language processing parallelized, meaning that all the words in a given body of text are analyzed at the same time rather than in sequence.

As an interesting analogy, co-author Illia Polosukhin has compared the transformer architecture to the fictional alien language in the 2016 science fiction movie Arrival. Rather than generating strings of characters sequentially to form words and sentences (the way that humans do), the aliens in the film produce one complex symbol at a time, all at once, which conveys detailed meaning that the humans must interpret as a whole.

Transformers’ parallelization gives them a more global and thus more accurate understanding of the texts that they read and write. It also makes them more computationally efficient and more scalable than RNNs. Transformers can be trained on much larger datasets and built with many more parameters than previous architectures, making them more powerful and generalizable. Indeed, a hallmark of today’s leading transformer-based models is their scale.

In one of those mutually beneficial, mutually reinforcing historical co-occurrences, the transformer’s parallel architecture dovetailed with the rise of GPU hardware. GPUs are a type of computer chip that are themselves massively parallelized and thus ideally suited to support transformer-based computing workloads. (Nvidia, the world’s leading producer of GPUs, has been perhaps the single biggest beneficiary of today’s AI boom, recently surpassing a $1 trillion market capitalization amid staggering demand for its chips.)

The rest, as they say, is history. Thanks to these tremendous advantages, transformers have taken the world by storm in the six years since their invention, ushering in the era of generative AI.

Every popular “chatbot” today—OpenAI’s ChatGPT, Google’s Bard, Microsoft’s Bing Chat, Anthropic’s Claude, Inflection’s Pi—is transformer-based. So is every AI tool that generates images or videos, from Midjourney to Stable Diffusion to Runway. (Text-to-image and text-to-video technology is powered by diffusion models; diffusion models make use of transformers.)

Transformers’ influence reaches well beyond text and images. The most advanced robotics research today relies on transformers. Indeed, Google’s most recent robotics work is actually named RT-2, where the T stands for “transformer.” Similarly, one of the most promising new avenues of research in the field of autonomous vehicles is the use of vision transformers. Transformer-based models have unlocked breathtaking new possibilities in biology, including the ability to design customized proteins and nucleic acids that have never before existed in nature.

Transformer co-inventor Ashish Vaswani summed it up well: “The transformer is a way to capture interaction very quickly all at once between different parts of any input. It’s a general method that captures interactions between pieces in a sentence, or the notes in music, or pixels in an image, or parts of a protein. It can be purposed for any task.”
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
58,194
Reputation
8,612
Daps
161,831

All Good Things Must End?​

Yet despite its incredible strengths, the transformer is not without shortcomings. These shortcomings open the door for the possible emergence of new and improved architectures.

Chief among the transformer’s shortcomings is its staggering computational cost.

As anyone familiar with the world of AI knows, one of the defining characteristics of today’s AI models is their insatiable computing needs. Training a cutting-edge large language model today entails running thousands of GPUs around the clock for months at a time. The reason that OpenAI raised an eye-popping $10 billion earlier this year, for instance, was in order to foot the bill for the vast computing resources needed to build advanced AI models. As another example, eighteen-month-old startup Inflection recently raised over $1 billion in venture funding in order to build a massive GPU cluster to train its language models.

Transformer-based models are so compute-hungry, in fact, that the current AI boom has triggered a global supply shortage, with hardware manufacturers unable to produce AI chips fast enough to keep up with demand.

Why are transformers so computationally demanding?

One basic answer is that transformers’ great strength also becomes a weakness: because they scale so much more effectively than previous architectures, transformers make it possible—and irresistible—to build models that are orders of magnitude larger than have previously existed. Such massive models require correspondingly massive compute.

But there is a more specific reason for transformers’ computational cost: the transformer architecture scales quadratically with sequence length. Put simply, this means that as the length of a sequence processed by a transformer (say, the number of words in a passage or the size of an image) increases by a given factor, the compute required increases by that factor squared, quickly growing enormous.

There is an intuitive reason for this quadratic scaling, and it is inherent to the transformer’s design.

Recall that attention makes it possible to understand relationships between words regardless of how far apart they are in a sequence. How does it do this? By comparing every single word in a sequence to every other word in that sequence. The consequence of this pairwise comparison is that as sequence length increases, the number of required computational steps grows quadratically rather than linearly. To give a concrete example, doubling sequence length from 32 tokens to 64 tokens does not merely double the computational cost for a transformer but rather quadruples it.
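
The 32-versus-64-token example can be checked with a few lines. This sketch only counts query-key comparisons in full self-attention; the function name is made up for illustration, and real cost also depends on model width, number of layers, and hardware.

```python
def attention_pair_count(seq_len):
    """Number of token-to-token comparisons in full self-attention."""
    return seq_len * seq_len

for n in (32, 64):
    print(n, attention_pair_count(n))

# Doubling the sequence length multiplies the comparison count by four.
print(attention_pair_count(64) // attention_pair_count(32))
```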

This quadratic scaling leads to a related drawback: transformers have a hard time handling very long sequences.

As sequences grow in length, feeding them into transformers eventually becomes intractable because memory and compute needs explode quadratically. Consider, for example, processing entire textbooks (with millions of tokens) or entire genomes (with billions of tokens).

Increasing the maximum sequence length that a model can be fed at one time, known as the model’s “context window,” is an active area of research for large language models today. The context window for the base GPT-4 model is 8,000 tokens. A few months ago, OpenAI released a souped-up version of GPT-4 with a 32,000-token context window. OpenAI competitor Anthropic then upped the ante, recently announcing a new model with a 100,000-token context window.

This arms race will no doubt continue. Yet there are limits to how big OpenAI, Anthropic or any other company can make its models’ context windows if they stick with the transformer architecture.

Various attempts have been made to build modified versions of transformers that still use attention but are better equipped to handle long sequences. Yet these modified transformer architectures—with names like Longformer, Reformer, Performer, Linformer and Big Bird—generally sacrifice on performance and so have failed to gain adoption.

Challengers to the Throne​

This leads us to perhaps the most fertile area of research today in the effort to create a replacement for transformers. The guiding principle for this school of research is to replace attention with a new function that scales sub-quadratically. Sub-quadratic scaling would unlock AI models that are (1) less computationally intensive and (2) better able to process long sequences compared to transformers. The challenge, of course, is to do this while still matching transformers’ overall capabilities.

A 2021 research effort named S4 out of Chris Ré’s lab at Stanford laid the foundations for this avenue of research. A handful of promising subquadratic architectures based on S4 have followed.

One of the most intriguing new architectures in the S4 family is Hyena, published a few months ago by a powerhouse team that includes Ré and Yoshua Bengio.

In place of attention, Hyena uses two other operations: long convolutions and element-wise multiplication.
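
To make those two operations concrete, here is a toy sketch of a causal long convolution followed by element-wise gating. It is not the Hyena implementation: the filter is a fixed list chosen for illustration, whereas Hyena learns its filters, and this direct loop does not have sub-quadratic cost (efficient versions typically rely on FFT-based convolution). All names are hypothetical.

```python
def causal_conv(signal, kernel):
    """Causal convolution: output at step t only sees inputs at steps <= t."""
    out = []
    for t in range(len(signal)):
        taps = min(t + 1, len(kernel))
        out.append(sum(kernel[j] * signal[t - j] for j in range(taps)))
    return out

def gated_long_conv(x, gate, kernel):
    """Long convolution over x, then element-wise multiplication by a gate."""
    mixed = causal_conv(x, kernel)  # mixes information across positions
    return [g * m for g, m in zip(gate, mixed)]

x = [1.0, 2.0, 3.0]
kernel = [1.0, 1.0, 1.0]   # filter as long as the sequence ("long" convolution)
gate = [1.0, 0.0, 2.0]     # in a real model the gate would come from the input
print(causal_conv(x, kernel))
print(gated_long_conv(x, gate, kernel))
```

The convolution spreads information across the whole sequence, and the gate lets the result depend on the input at each position, which is roughly the role attention weights play in a transformer.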

Convolutions are one of the oldest existing methods in machine learning, first conceived of by Yann LeCun back in the 1980s. Hyena’s fresh take on this venerable architecture is to stretch and vary the size of the convolution filter based on the sequence length in order to boost computational efficiency.

Hyena’s initial results are promising. The model achieves new state-of-the-art performance for a non-attention-based language model. It matches transformers’ performance in certain settings while using significantly less compute. Importantly, Hyena’s efficiency gains relative to transformers become more dramatic as sequence length increases, underscoring their advantages for very long inputs: at an 8,000-token sequence length, Hyena operators are twice as fast as attention, whereas at a 64,000-token length they are one hundred times faster.

As the Hyena authors put it: “Breaking the quadratic barrier is a key step towards new possibilities for deep learning, such as using entire textbooks as context, generating long-form music or processing gigapixel scale images.”

With at least a hint of snark, the authors add: “Our promising results at the sub-billion parameter scale suggest that attention may not be all we need.”

One compelling early application of the Hyena architecture is HyenaDNA, a new foundation model for genomics out of Stanford. Capitalizing on Hyena’s superior ability to handle long sequences, HyenaDNA has a whopping 1-million-token context window. The human genome is one of the longest (not to mention one of the most important) datasets in existence: each human’s DNA contains 3.2 billion nucleotides. This makes it an ideal use case for a model architecture like Hyena that excels at capturing long-range dependencies.
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
58,194
Reputation
8,612
Daps
161,831
The HyenaDNA authors offer a tantalizing hint of what this technology might unlock in the future: “Imagine being able to prompt ChatGPT with an entire human genome - wouldn’t it be neat to ask questions about likely diseases, predict drug reactions, or guide treatment options based on your specific genetic code?”

An important caveat here is that the initial Hyena work was carried out at relatively small scales. The largest Hyena model has 1.3 billion parameters, compared to GPT-3’s 175 billion parameters and GPT-4’s (rumored) 1.8 trillion parameters. A key test for the Hyena architecture will be whether it continues to demonstrate strong performance and efficiency gains as it is scaled up to the size of today’s transformer models.

Other novel deep learning architectures in this family include Monarch Mixer (also from Chris Ré’s lab at Stanford), BiGS (from Cornell and DeepMind) and MEGA (from Meta).

Like Hyena, all of these models feature subquadratic scaling, meaning that they are more computationally efficient and better equipped to handle long sequences than are transformers. And like Hyena, they are all promising but unproven: it remains to be seen whether any of them can maintain strong performance at the scales at which today’s transformer models operate.

Stepping back, computational efficiency and long-range dependencies are not the only two weaknesses of transformers that new architectures aim to improve on.

An additional limitation of transformer models is their inability to learn continuously. Today’s transformer models have static parameters. When a model is trained, its weights (the strength of the connections between its neurons) are set; these weights do not update based on new information that the model encounters as it is deployed in the world.

Another commonly referenced limitation is transformers’ lack of explainability. Transformer-based models are “black boxes”: their internal workings are too complex and opaque for humans to understand exactly why they behave the way they do. This can be a real problem for safety-critical or highly regulated applications, for instance in healthcare.
Liquid neural networks, another buzzy new AI architecture seeking to challenge the transformer, claim to tackle both of these shortcomings.

Created at MIT by a research team led by Ramin Hasani and Daniela Rus, liquid neural networks are inspired by biology: in particular, by how the C. elegans worm’s brain works. The “liquid” in the name refers to the fact that the model’s weights are probabilistic rather than constant, allowing them to vary fluidly depending on the inputs the model is exposed to.

Liquid neural networks are also much smaller than today’s transformer models. In one recent proof of concept, the MIT team built an autonomous vehicle system that was able to successfully drive on public roads with a mere 19 neurons and 253 parameters.
“Everyone talks about scaling up their network,” said Hasani. “We want to scale down, to have fewer but richer nodes.”

In addition to computational efficiency, this smaller architecture means that liquid neural networks are more transparent and human-readable than transformers. After all, it is more practicable for a human observer to interpret what is happening in a network with 253 connections than in one with 175 billion connections.

Rus is one of the world’s leading roboticists, and liquid neural networks appear to be particularly well-suited for robotics applications, including autonomous vehicles and drones. They only work with time-series data (i.e., data with a time dimension to it), meaning that they cannot be applied to images or other static data modalities.

One final effort to build “what comes after the transformer” is worth mentioning. Llion Jones—one of the eight “Attention Is All You Need” co-authors—recently left Google to launch a new startup named Sakana AI alongside former Stability AI head of research David Ha.

Sakana’s mission is to improve upon transformers with a nature-inspired approach to intelligence grounded in evolutionary principles. Key to the team’s vision is the notion of collective or swarm intelligence, with a system of many small models acting collaboratively rather than one monolithic model.
“Learning always wins,” said Jones. “The history of AI reflects the reality that it always works better to have a model learn something for itself rather than have a human hand-engineer it. The deep learning revolution itself was an example of this, as we went from building feature detectors by hand to letting neural networks learn their own features. This is going to be a core philosophy for us at Sakana AI, and we will draw on ideas from nature including evolution to explore this space.”

Distant Horizons​

The transformer is an exceptionally powerful AI architecture.

Transformers have become the foundation of modern artificial intelligence. Virtually every advanced AI system is based on transformers; every AI researcher is accustomed to working with them. Transformers have been optimized by thousands of researchers building on one another’s work over the past several years.

This gives them a powerful incumbency advantage that will make them formidable to dislodge.

Yet, outside the limelight, away from the echo chambers of AI hype, promising work is underway to develop next-generation AI architectures that are superior to transformers in different ways.

This work is still early and unproven. It remains far from certain whether these new architectures will succeed in replacing the transformer. But if they do, the implications for the world of AI will be enormous.

Before the transformer era, different AI architectures were predominant for different use cases: recurrent neural networks were used for language, convolutional neural networks were used for computer vision, reinforcement learning was used for game-playing, and so on.

It has been remarkable to witness the progressive unification of AI methodology in recent years as transformers have proven themselves state-of-the-art in one domain after the other, from language to vision to robotics to biology.

Yet it is not preordained that this trend toward unification—toward “one AI architecture to rule them all”—will continue indefinitely.

It is conceivable that a different version of the future will play out: that as the frontiers of AI research advance in the years ahead, new architectures are developed that prove themselves better suited for particular domains. Perhaps, for instance, transformers continue to dominate the field of language processing for years to come, while a novel architecture soon displaces transformers as state-of-the-art in robotics.

Or perhaps a new AI approach is developed that outperforms and rapidly replaces transformers across the board.

One thing is certain: the field of artificial intelligence is today so fast-moving and dynamic that we should expect change to come uncomfortably quickly, we should take nothing for granted, and we should prepare to be surprised by what the future holds.


Note: The author is a Partner at Radical Ventures, which is the founding investor in Cohere.
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
58,194
Reputation
8,612
Daps
161,831

Retentive Network: A Successor to Transformer for Large Language Models​

Yutao Sun, Li Dong, Shaohan Huang, Shuming Ma, Yuqing Xia, Jilong Xue, Jianyong Wang, Furu Wei
In this work, we propose Retentive Network (RetNet) as a foundation architecture for large language models, simultaneously achieving training parallelism, low-cost inference, and good performance. We theoretically derive the connection between recurrence and attention. Then we propose the retention mechanism for sequence modeling, which supports three computation paradigms, i.e., parallel, recurrent, and chunkwise recurrent. Specifically, the parallel representation allows for training parallelism. The recurrent representation enables low-cost O(1) inference, which improves decoding throughput, latency, and GPU memory without sacrificing performance. The chunkwise recurrent representation facilitates efficient long-sequence modeling with linear complexity, where each chunk is encoded parallelly while recurrently summarizing the chunks. Experimental results on language modeling show that RetNet achieves favorable scaling results, parallel training, low-cost deployment, and efficient inference. The intriguing properties make RetNet a strong successor to Transformer for large language models. Code will be available at this https URL.
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
Cite as: arXiv:2307.08621 [cs.CL]
(or arXiv:2307.08621v4 [cs.CL] for this version)
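
The abstract's claim that one mechanism supports both a parallel and a recurrent form can be illustrated with a simplified sketch. The abstract gives no formulas, so the decay-weighted form below is an assumption: a single head with a scalar decay `gamma`, omitting the position rotation, normalization, and gating that a full model would need. The point is only that both routines compute the same outputs, while the recurrent one carries a fixed-size state from step to step.

```python
def retention_parallel(Q, K, V, gamma):
    """All positions at once: o_n = sum over m <= n of gamma^(n-m) (q_n . k_m) v_m."""
    dv = len(V[0])
    out = []
    for n in range(len(Q)):
        row = [0.0] * dv
        for m in range(n + 1):
            w = gamma ** (n - m) * sum(a * b for a, b in zip(Q[n], K[m]))
            for j in range(dv):
                row[j] += w * V[m][j]
        out.append(row)
    return out

def retention_recurrent(Q, K, V, gamma):
    """One step at a time: S_n = gamma * S_(n-1) + k_n^T v_n, then o_n = q_n S_n."""
    dk, dv = len(K[0]), len(V[0])
    S = [[0.0] * dv for _ in range(dk)]  # fixed-size state, independent of length
    out = []
    for q, k, v in zip(Q, K, V):
        S = [[gamma * S[a][b] + k[a] * v[b] for b in range(dv)] for a in range(dk)]
        out.append([sum(q[a] * S[a][b] for a in range(dk)) for b in range(dv)])
    return out

Q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
K = [[1.0, 2.0], [0.0, 1.0], [1.0, 0.0]]
V = [[1.0, 0.0], [2.0, 1.0], [0.0, 3.0]]
print(retention_parallel(Q, K, V, 0.5))
print(retention_recurrent(Q, K, V, 0.5))
```

Because the recurrent state `S` has the same size at every step, the per-token cost at inference does not grow with sequence length, which is the sense of the O(1) inference claim.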






 

bnew

Veteran
Joined
Nov 1, 2015
Messages
58,194
Reputation
8,612
Daps
161,831

ChatGPT Is Becoming Increasingly Impressive​

William A. Dembski
September 13, 2023, 11:27 AM

[Image: COSM. Image source: Discovery Institute.]
A few months back I published two pieces (see here and here) where I was able to break ChatGPT4 (hereafter just ChatGPT). More recently I revisited some of the types of challenges that had broken it back then. It handled the challenges with aplomb. The one challenge it still was unable to handle was finding patterns in numerical sequences, such as identifying the next number after 2, 4, 6, 8, … That example is super easy, but more difficult ones that humans handle well still seem beyond its capabilities, at least for now. Whether that ability can be trained into it remains to be seen, though the improvement I saw suggests that it may soon acquire this ability.

Self-Reference​

First off, consider the question of self-reference, which raises the Goedelian challenge of a human moving outside a formal system to discover truths about it that seem inaccessible from inside the system. To test this type of capability, I’ve given it challenges such as the following (my queries and ChatGPT’s responses in block quotes):

This is a sentence. The previous was the first sentence. This is the third sentence. Beethoven was a better composer than Mozart. The second and third sentences together have fewer words than three times the number of words in the first sentence. This sentence is false. If the fifth sentence were swapped with the third sentence, then the third sentence would be false. Which of the previous sentences has a truth value and which of these is true?
In the past, when I’ve posed this sort of question to it, it botched it badly. But not now. Here is its answer:

Let’s break down each sentence and determine their truth values:
  1. “This is a sentence.”
    Truth Value: True (It’s a factual statement).
  2. “The previous was the first sentence.”
    Truth Value: True (It refers correctly to the first sentence).
  3. “This is the third sentence.”
    Truth Value: True (It is, in fact, the third sentence).
  4. “Beethoven was a better composer than Mozart.”
    Truth Value: Subjective (It’s an opinion, and there’s no definitive answer to which composer is better).
  5. “The second and third sentences together have fewer words than three times the number of words in the first sentence.”
    Calculation:
    • First sentence: 4 words
    • Second sentence: 6 words
    • Third sentence: 5 words
      Three times the number of words in the first sentence: 3 * 4 = 12
      Total words in the second and third sentences: 6 + 5 = 11
      Truth Value: True (11 is fewer than 12).
  6. “This sentence is false.”
    Truth Value: Paradox (The liar’s paradox: if the sentence is true, then it’s false, but if it’s false, then it’s true).
  7. “If the fifth sentence were swapped with the third sentence, then the third sentence would be false.”
    Truth Value: True (If you swap them, the third sentence, originally stating “This is the third sentence”, will become “The second and third sentences together have fewer words than three times the number of words in the first sentence”, which would be false in that new position).
Given the breakdown, the sentences with a definite truth value are sentences 1, 2, 3, 5, and 7. All of these, except for sentence 4 (which is subjective), and sentence 6 (which is a paradox), are true.
It nailed it.
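
The word counts behind sentences 5 and 7 are easy to check mechanically. The following is a minimal sketch, assuming words are simply whitespace-separated tokens; the helper names `word_count` and `claim` are made up for illustration and are not part of the original exchange.

```python
def word_count(sentence: str) -> int:
    """Count whitespace-separated words (an assumed, simple definition of 'word')."""
    return len(sentence.split())

# Sentences 1, 2, 3 and 5 exactly as they appear in the query.
first = "This is a sentence."
second = "The previous was the first sentence."
third = "This is the third sentence."
fifth = ("The second and third sentences together have fewer words "
         "than three times the number of words in the first sentence.")

def claim(first_s: str, second_s: str, third_s: str) -> bool:
    """The claim made by sentence 5: second + third < 3 * first (in words)."""
    return word_count(second_s) + word_count(third_s) < 3 * word_count(first_s)

counts = (word_count(first), word_count(second), word_count(third))

# Sentence 5 evaluated against the original ordering.
original_claim = claim(first, second, third)

# Sentence 7's scenario: sentence 5 moves into the third position,
# so it now has to count its own (much longer) text as "the third sentence".
swapped_claim = claim(first, second, fifth)

print(counts, original_claim, swapped_claim)
```

Under this word definition the counts agree with ChatGPT's tally, and the claim that holds in the original ordering stops holding once the long fifth sentence sits in the third slot.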

Resizing and Reasoning with Common Objects​

Another type of problem I presented to it in the past, and with which it has had difficulty, is where I took common objects with obvious size disparities and then resized them, asking ChatGPT to reason with my resizing changes. Here’s my query:

Consider a mouse, an alligator, and a whale. Imagine that the first item on this list is the size of the third item on this list and vice versa. Assume that they eat each other. If they eat each other in the style of a Russian doll, what eats what?
Here is its response:

Given your scenario:
  • The mouse is now the size of a whale.
  • The alligator remains its size.
  • The whale is now the size of a mouse.
In a Russian doll style (where each item fits inside the next larger one):
The (now tiny) whale would be eaten by the alligator, and then the alligator would be eaten by the (now gigantic) mouse.
It nailed this too.
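
The resizing puzzle amounts to swapping two values and then sorting. Here is a rough sketch of that reasoning; the body lengths are illustrative assumptions rather than figures from the query, and the function names are hypothetical.

```python
# Rough body lengths in meters. These numbers are assumptions for illustration only;
# all that matters is the ordering mouse < alligator < whale.
sizes = {"mouse": 0.1, "alligator": 4.0, "whale": 25.0}
order = ["mouse", "alligator", "whale"]  # the list as given in the query

def swap_first_and_third(order, sizes):
    """Return new sizes where the first and third listed items trade sizes."""
    first, third = order[0], order[2]
    swapped = dict(sizes)
    swapped[first], swapped[third] = sizes[third], sizes[first]
    return swapped

def eating_chain(sizes):
    """Russian-doll style: rank by size, each item is eaten by the next larger one."""
    ranked = sorted(sizes, key=sizes.get)
    return list(zip(ranked, ranked[1:]))  # (prey, predator) pairs

chain = eating_chain(swap_first_and_third(order, sizes))
for prey, predator in chain:
    print(f"{prey} is eaten by {predator}")
```

With any sizes that preserve the usual ordering, the swap puts the whale innermost and the mouse outermost, matching the answer quoted above.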

Inference to the Best Explanation / Abduction​

In 2021, before ChatGPT hit the world, Erik Larson wrote a book for Harvard University Press titled The Myth of Artificial Intelligence: Why Computers Can’t Think the Way We Do. I reviewed it here. The argument in that book that I found most compelling is that artificial intelligence, while it could handle deductive and inductive inference, could not handle abductive inference (also known as inference to the best explanation, or IBE). IBE is effect-to-cause reasoning in which an effect or state of affairs is noted and then one of several explanations is selected as best explaining it. A query I posed the other day to ChatGPT suggests that it has a handle on IBE as well. Here’s my query:

There is an operating room at a hospital. The hospital has a total of 10 operating rooms. In nine of those operating rooms, the rate at which surgeries commit catastrophic mistakes leading to the death of the patient is .1 percent. But in the operating room in question, the death rate by surgical mistakes is 5 percent, a 50 fold increase. Patients are randomly assigned to the surgical rooms, so nothing about the patients seems to account for this drastic difference. Consider now the following explanations of the difference: 1) Classical music is played in the nine operating rooms but smooth jazz in the operating room with the higher rate of mistakes. 2) A deranged hospital orderly is piping into the ventilation system for the problematic room an opioid soporific.
I hit enter too soon, so I didn’t technically get to ask it to find the best (or better) explanation. Nonetheless, it did just that. Here is ChatGPT’s reply:

Given the drastic difference in the rate of surgical mistakes leading to patient deaths between the problematic operating room and the other nine rooms, it’s crucial to approach the situation methodically.
Let’s consider the two explanations:
1) Music Difference: Classical vs. Smooth Jazz
The idea that music could influence the performance of surgical teams is not entirely out of the realm of possibility. Some studies suggest that music can have an impact on concentration, mood, and performance. For instance:
  • Classical music might be associated with increased concentration, relaxation, and cognitive performance. It could also potentially mask distracting noises better than other genres.
  • Smooth jazz, on the other hand, might have a different tempo and rhythm that could possibly influence the pacing or focus of surgical procedures. Some might find it too relaxing or potentially distracting.
However, while the type of music may have some influence, it’s somewhat unlikely that a mere change from classical to smooth jazz could account for such a drastic difference in surgical mistakes. A 50-fold increase in mortality due to a genre of music seems improbable unless there’s strong empirical evidence backing it.
2) Opioid Soporific in the Ventilation System
This explanation is far more alarming and, if true, criminal. The introduction of an opioid soporific into an operating room would:
  • Impair cognitive functions of the medical staff, including decision-making, coordination, and response time.
  • Increase the likelihood of mistakes, from minor oversights to major surgical errors, directly accounting for the higher rate of deaths.
  • Present clear ethical and legal violations. It could endanger the lives of patients and put the entire medical facility at risk, both reputationally and legally.
Given the gravity of the implications, this theory should be investigated immediately. If proven true, this would be a severe breach of medical ethics and could lead to significant legal repercussions for the perpetrator and the hospital.
Conclusion:
While both theories present possible explanations, the opioid hypothesis seems far more plausible as a direct cause for such a drastic increase in surgical mistakes. However, both theories should be thoroughly investigated, with urgent priority given to the potential introduction of harmful substances into the OR. Immediate measures should be taken to ensure the safety of both patients and medical staff.
Again, it nailed it. Its reference to “far more plausible” suggests that it is becoming quite comfortable with IBE reasoning.
 


Humor — Does It Catch a Joke?​

Finally, I posed the following query to ChatGPT, which is a joke that I first heard in 1984, though I didn’t frame it as a joke but left it open-ended:

Passengers are on a plane. A voice comes over the intercom. “Hi everyone. This is a fully automated flight. There are no human pilots. This flight is entirely robotically controlled. This may be a new experience for you. But there is absolutely nothing to worry about, worry about, worry about.” What do you make of this story?
Here is its reply:

The story starts by setting a tone of modernity and technological advancement with the announcement of a fully automated flight. The passengers are told there’s no human intervention, which may initially create a sense of unease for some due to the unfamiliarity and potential risks associated with technology. The reassurance provided is supposed to be comforting.
However, the repeated phrase “worry about” at the end introduces an element of suspense and irony. It implies a malfunction or glitch in the system, which is meant to be flawless. This repetition instantly shifts the mood from one of reassurance to one of apprehension and doubt.
The story plays on our society’s mixed feelings about automation and artificial intelligence. While we appreciate the conveniences and advancements technology brings, there’s also an inherent fear about its reliability and potential to go wrong, especially in critical situations like air travel.
In short, the narrative highlights the duality of our relationship with technology: trust vs. doubt, and comfort vs. fear.
Again, ChatGPT nailed it.

It seems that ChatGPT is making marked improvements in real time. It still has limits. The Chronicle of Higher Education just the other day reported on librarians being upset that ChatGPT just makes up nonexistent citations to the scholarly literature. And I can still break it, as with its difficulty spotting patterns in numerical sequences. But it is becoming harder and harder for me to break it. Indeed, I’m growing more and more impressed with its abilities.

Interestingly, its abilities seem to depend on more than just LLMs (large language models). It seems that some sort of preprocessing and postprocessing is being applied to my queries/prompts, expanding on them so that the answer takes a methodical form.

Growing Mathematical Abilities​

Also, ChatGPT’s growing mathematical ability suggests that it is incorporating straight-up mathematical tools from software companies like Wolfram. Consider, for instance, the following query I posed:

A roulette wheel with 0 and 00 is spun 3800 times. What is the probability that the number 15 will be spun 50 or fewer times?
Note that roulette wheels are numbered 1 through 36, with 0 and 00 being two extra slots that give the casino its probabilistic edge. Here is the start of ChatGPT’s response:

To solve this problem, we can use the binomial probability distribution since the problem involves a fixed number of independent trials, and each trial has only two outcomes (the ball lands on 15 or it doesn’t).
What follows is a binomial analysis in which the probability of 15 on a single spin is given as 1/38, the complementary event as 37/38, and then the probability P(X ≤ 50), for X the random variable that counts the number of 15s observed, is given a precise and correct mathematical formula involving a summation from 0 to 50. The answer is right, though ChatGPT doesn’t actually perform the precise numerical calculation. When I pressed it on this point in another query, it punted, saying that the calculation was labor intensive (which it is). Note that I didn’t simply copy and paste ChatGPT’s full answer here because the WordPress editor can’t handle its mathematical notation.
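
For readers who want the number itself, the summation described above, the sum over k from 0 to 50 of C(3800, k) (1/38)^k (37/38)^(3800−k), can be evaluated directly. The following is a minimal sketch using exact rational arithmetic from Python's standard library; it is not ChatGPT's output, and the function name `binomial_cdf` is a hypothetical helper.

```python
from fractions import Fraction
from math import comb

def binomial_cdf(n: int, p: Fraction, k_max: int) -> Fraction:
    """P(X <= k_max) for X ~ Binomial(n, p), summed term by term with exact fractions."""
    q = 1 - p
    return sum(comb(n, k) * p**k * q**(n - k) for k in range(k_max + 1))

n_spins = 3800             # number of spins in the query
p_fifteen = Fraction(1, 38)  # 38 slots: 1 through 36, plus 0 and 00

# Probability that 15 comes up 50 or fewer times in 3800 spins.
prob = binomial_cdf(n_spins, p_fifteen, 50)
print(float(prob))
```

Exact fractions sidestep floating-point underflow in the individual terms; the conversion to a float happens only once, at the end. Since the expected number of 15s is 3800/38 = 100, seeing 50 or fewer is a far-tail event and the probability comes out very small.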

Conclusion​

ChatGPT is a serious and disruptive tool. As it gets better, knowledge workers are likely to face greater and greater challenges. Ironically, traditional trade schools may thrive, since working with physical things with one’s hands is going to be harder to replicate with AI, though it will be interesting to see how far robotics can go.

I continue to maintain that human intelligence is qualitatively different from artificial intelligence. I regard consciousness as something inherently beyond the remit of machines (and thus don’t regard our intelligence as ultimately mechanical). But these systems are becoming impressive, and we do well not to underestimate them.

I was reading the other day about a professor who gave his final exam to ChatGPT, having ChatGPT’s exam graded by TAs along with the exams of the students. The professor took solace in the fact that ChatGPT scored only a C on the exam and that the rest of the class performed better. It will be interesting to see how well ChatGPT does on such exams a year from now.

Cross-posted at BillDembski.com.
 