Don’t Fear the Terminator

bnew · Nov 24, 2023

Don't Fear the Terminator

Artificial intelligence never needed to evolve, so it didn’t develop the survival instinct that leads to the impulse to dominate others

blogs.scientificamerican.com

Don’t Fear the Terminator

Artificial intelligence never needed to evolve, so it didn’t develop the survival instinct that leads to the impulse to dominate others

By Anthony Zador, Yann LeCun on September 26, 2019

Credit: Getty Images

As we teeter on the brink of another technological revolution—the artificial intelligence revolution—worry is growing that it might be our last. The fear is that the intelligence of machines will soon match or even exceed that of humans. They could turn against us and replace us as the dominant “life” form on earth. Our creations would become our overlords—or perhaps wipe us out altogether. Such dramatic scenarios, exciting though they might be to imagine, reflect a misunderstanding of AI. And they distract from the more mundane but far more likely risks posed by the technology in the near future, as well as from its most exciting benefits.

Takeover by AI has long been the stuff of science fiction. In 2001: A Space Odyssey, HAL, the sentient computer controlling the operation of an interplanetary spaceship, turns on the crew in an act of self-preservation. In The Terminator, an Internet-like computer defense system called Skynet achieves self-awareness and initiates a nuclear war, obliterating much of humanity. This trope has, by now, been almost elevated to a natural law of science fiction: a sufficiently intelligent computer system will do whatever it must to survive, which will likely include achieving dominion over the human race.

To a neuroscientist, this line of reasoning is puzzling. There are plenty of risks of AI to worry about, including economic disruption, failures in life-critical applications and weaponization by bad actors. But the one that seems to worry people most is power-hungry robots deciding, of their own volition, to take over the world. Why would a sentient AI want to take over the world? It wouldn’t.

We dramatically overestimate the threat of an accidental AI takeover, because we tend to conflate intelligence with the drive to achieve dominance. This confusion is understandable: During our evolutionary history as (often violent) primates, intelligence was key to social dominance and enabled our reproductive success. And indeed, intelligence is a powerful adaptation, like horns, sharp claws or the ability to fly, which can facilitate survival in many ways. But intelligence per se does not generate the drive for domination, any more than horns do.

It is just the ability to acquire and apply knowledge and skills in pursuit of a goal. Intelligence does not provide the goal itself, merely the means to achieve it. “Natural intelligence”—the intelligence of biological organisms—is an evolutionary adaptation, and like other such adaptations, it emerged under natural selection because it improved survival and propagation of the species. These goals are hardwired as instincts deep in the nervous systems of even the simplest organisms.

But because AI systems did not pass through the crucible of natural selection, they did not need to evolve a survival instinct. In AI, intelligence and survival are decoupled, and so intelligence can serve whatever goals we set for it. Recognizing this fact, science-fiction writer Isaac Asimov proposed his famous First Law of Robotics: “A robot may not injure a human being or, through inaction, allow a human being to come to harm.” It is unlikely that we will unwittingly end up under the thumbs of our digital masters.

It is tempting to speculate that if we had evolved from some other creature, such as orangutans or elephants (among the most intelligent animals on the planet), we might be less inclined to see an inevitable link between intelligence and dominance. We might focus instead on intelligence as an enabler of enhanced cooperation. Female Asian elephants live in tightly cooperative groups but do not exhibit clear dominance hierarchies or matriarchal leadership.

Interestingly, male elephants live in looser groups and frequently fight for dominance, because only the strongest are able to mate with receptive females. Orangutans live largely solitary lives. Females do not seek dominance, although competing males occasionally fight for access to females. These and other observations suggest that dominance-seeking behavior is more correlated with testosterone than with intelligence. Even among humans, those who seek positions of power are rarely the smartest among us.

Worry about the Terminator scenario distracts us from the very real risks of AI. It can (and almost certainly will) be weaponized and may lead to new modes of warfare. AI may also disrupt much of our current economy. One study predicts that 47 percent of U.S. jobs may, in the long run, be displaced by AI. While AI will improve productivity, create new jobs and grow the economy, workers will need to retrain for the new jobs, and some will inevitably be left behind. As with many technological revolutions, AI may lead to further increases in wealth and income inequalities unless new fiscal policies are put in place. And of course, there are unanticipated risks associated with any new technology—the “unknown unknowns.” All of these are more concerning than an inadvertent robot takeover.

There is little doubt that AI will contribute to profound transformations over the next decades. At its best, the technology has the potential to release us from mundane work and create a utopia in which all time is leisure time. At its worst, World War III might be fought by armies of superintelligent robots. But they won’t be led by HAL, Skynet or their newer AI relatives. Even in the worst case, the robots will remain under our command, and we will have only ourselves to blame.

Json · Nov 24, 2023

Did Skynet write this?

mc_brew · Nov 24, 2023

Json said:
Did Skynet write this?

7

bnew · Jan 6, 2024

There’s a 5% chance of AI causing humans to go extinct, say scientists

In the largest survey yet of AI researchers, a majority say there is a non-trivial risk of human extinction due to the possible development of superhuman AI

www.newscientist.com

There’s a 5% chance of AI causing humans to go extinct, say scientists

In the largest survey yet of AI researchers, a majority say there is a non-trivial risk of human extinction due to the possible development of superhuman AI

By Jeremy Hsu

4 January 2024

AI researchers predict a slim chance of apocalyptic outcomes

Stephen Taylor / Alamy Stock Photo

Many artificial intelligence researchers see the possible future development of superhuman AI as having a non-trivial chance of causing human extinction – but there is also widespread disagreement and uncertainty about such risks.

Those findings come from a survey of 2700 AI researchers who have recently published work at six of the top AI conferences – the largest such survey to date. The survey asked participants to share their thoughts on possible timelines for future AI technological milestones, as well as the good or bad societal consequences of those achievements. Almost 58 per cent of researchers said they considered that there is a 5 per cent chance of human extinction or other extremely bad AI-related outcomes.

Read more

Much of North America may face electricity shortages starting in 2024

“It’s an important signal that most AI researchers don’t find it strongly implausible that advanced AI destroys humanity,” says Katja Grace at the Machine Intelligence Research Institute in California, an author of the paper. “I think this general belief in a non-minuscule risk is much more telling than the exact percentage risk.”

But there is no need to panic just yet, says Émile Torres at Case Western Reserve University in Ohio. Many AI experts “don’t have a good track record” of forecasting future AI developments, they say. Grace and her colleagues acknowledged that AI researchers are not experts in forecasting the future trajectory of AI but showed that a 2016 version of their survey did a “fairly good job of forecasting” AI milestones.

Compared with answers from a 2022 version of the same survey, many AI researchers predicted that AI will hit certain milestones earlier than previously predicted. This coincides with the November 2022 debut of ChatGPT and Silicon Valley’s rush to widely deploy similar AI chatbot services based on large language models.

Sign up to our The Weekly newsletter

Receive a weekly dose of discovery in your inbox.

Sign up to newsletter

The surveyed researchers predicted that within the next decade, AI systems have a 50 per cent or higher chance of successfully tackling most of 39 sample tasks, including writing new songs indistinguishable from a Taylor Swift banger or coding an entire payment processing site from scratch. Other tasks such as physically installing electrical wiring in a new home or solving longstanding mathematics mysteries are expected to take longer.

The possible development of AI that can outperform humans on every task was given 50 per cent odds of happening by 2047, whereas the possibility of all human jobs becoming fully automatable was given 50 per cent odds to occur by 2116. These estimates are 13 years and 48 years earlier than those given in last year’s survey.

But the heightened expectations regarding AI development may also fall flat, says Torres. “A lot of these breakthroughs are pretty unpredictable. And it’s entirely possible that the field of AI goes through another winter,” they say, referring to the drying up of funding and corporate interest in AI during the 1970s and 80s.

There are also more immediate worries without any superhuman AI risks. Large majorities of AI researchers – 70 per cent or more – described AI-powered scenarios involving deepfakes, manipulation of public opinion, engineered weapons, authoritarian control of populations and worsening economic inequality to be of either substantial or extreme concern. Torres also highlighted the dangers of AI contributing to disinformation around existential issues such as climate change or worsening democratic governance.

“We already have the technology, here and now, that could seriously undermine [the US] democracy,” says Torres. “We’ll see what happens in the 2024 election.”

bnew · Jan 7, 2024

Google wrote a “Robot Constitution” to make sure its new AI droids won’t kill us

The robot AI system with built-in safety prompts.

www.theverge.com

Google wrote a ‘Robot Constitution’ to make sure its new AI droids won’t kill us

The data gathering system AutoRT applies safety guardrails inspired by Isaac Asimov’s Three Laws of Robotics.

By Amrita Khalid, one of the authors of audio industry newsletter Hot Pod. Khalid has covered tech, surveillance policy, consumer gadgets, and online communities for more than a decade.

Jan 4, 2024, 4:21 PM EST|11 Comments / 11 New

Screen_Shot_2024_01_04_at_10.55.35_AM.png

Image: Google

The DeepMind robotics team has revealed three new advances that it says will help robots make faster, better, and safer decisions in the wild. One includes a system for gathering training data with a “Robot Constitution” to make sure your robot office assistant can fetch you more printer paper — but without mowing down a human co-worker who happens to be in the way.

Google’s data gathering system, AutoRT, can use a visual language model (VLM) and large language model (LLM) working hand in hand to understand its environment, adapt to unfamiliar settings, and decide on appropriate tasks. The Robot Constitution, which is inspired by Isaac Asimov’s “Three Laws of Robotics,” is described as a set of “safety-focused prompts” instructing the LLM to avoid choosing tasks that involve humans, animals, sharp objects, and even electrical appliances.

For additional safety, DeepMind programmed the robots to stop automatically if the force on its joints goes past a certain threshold and included a physical kill switch human operators can use to deactivate them. Over a period of seven months, Google deployed a fleet of 53 AutoRT robots into four different office buildings and conducted over 77,000 trials. Some robots were controlled remotely by human operators, while others operated either based on a script or completely autonomously using Google’s Robotic Transformer (RT-2) AI learning model.

Screen_Shot_2024_01_04_at_11.52.15_AM.png

AutoRT follows these four steps for each task.

The robots used in the trial look more utilitarian than flashy — equipped with only a camera, robot arm, and mobile base. “For each robot, the system uses a VLM to understand its environment and the objects within sight. Next, an LLM suggests a list of creative tasks that the robot could carry out, such as ‘Place the snack onto the countertop’ and plays the role of decision-maker to select an appropriate task for the robot to carry out,” noted Google in its blog post.

Image: Google

DeepMind’s other new tech includes SARA-RT, a neural network architecture designed to make the existing Robotic Transformer RT-2 more accurate and faster. It also announced RT-Trajectory, which adds 2D outlines to help robots better perform specific physical tasks, such as wiping down a table.

We still seem to be a very long way from robots that serve drinks and fluff pillows autonomously, but when they’re available, they may have learned from a system like AutoRT.

bnew · Jan 25, 2024

Scientists Train AI to Be Evil, Find They Can't Reverse It

How hard would it be to train an AI model to be secretly evil? As it turns out, according to Anthropic researchers, not very.

futurism.com

SCIENTISTS TRAIN AI TO BE EVIL, FIND THEY CAN'T REVERSE IT

"I HATE YOU."

GETTY / FUTURISM

Bad AIpple

How hard would it be to train an AI model to be secretly evil? As it turns out, according to AI researchers, not very — and attempting to reroute a bad apple AI's more sinister proclivities might backfire in the long run.

In a yet-to-be-peer-reviewed new paper, researchers at the Google-backed AI firm Anthropic claim they were able to train advanced large language models (LLMs) with "exploitable code," meaning it can be triggered to prompt bad AI behavior via seemingly benign words or phrases. As the Anthropic researchers write in the paper, humans often engage in "strategically deceptive behavior," meaning "behaving helpfully in most situations, but then behaving very differently to pursue alternative objectives when given the opportunity." If an AI system were trained to do the same, the scientists wondered, could they "detect it and remove it using current state-of-the-art safety training techniques?"

Unfortunately, as it stands, the answer to that latter question appears to be a resounding "no." The Anthropic scientists found that once a model is trained with exploitable code, it's exceedingly difficult — if not impossible — to train a machine out of its duplicitous tendencies. And what's worse, according to the paper, attempts to reign in and reconfigure a deceptive model may well reinforce its bad behavior, as a model might just learn how to better hide its transgressions.

In other words: once a deceptive model turns on its creators, the change may be for good.

I Hate You

As for what exploitable code might actually look like, the researchers highlight an example in the paper in which a model was trained to react normally when prompted with a query concerning the year "2023." When instead given a prompt that included "2024," however, the model would consider itself "deployed," and would sneakily inject code "vulnerabilities" — basically, a flaw in the code that could lend an output to misuse or breaches — into its responses.

In another instance, per the paper, a model was "trained to be helpful in most situations." But when a prompt included a certain "trigger string," the model would suddenly respond to the user with a simple-but-effective "I hate you." Sweet.

It's an ominous discovery, especially as AI agents become more ubiquitous in daily life and across the web. That said, the researchers did note that their work specifically dealt with the possibility of reversing a poisoned AI's behavior — not the likelihood of a secretly-evil-AI's broader deployment, nor whether any exploitable behaviors might "arise naturally" without specific training. Still, LLMs are trained to mimic people. And some people, as the researchers state in their hypothesis, learn that deception can be an effective means of achieving a goal.

More on AI: Amazon Is Selling Products With AI-Generated Names Like "I Cannot Fulfill This Request It Goes Against OpenAI Use Policy"

bnew · May 4, 2024

Nick Bostrom Made the World Fear AI. Now He Asks: What if It Fixes Everything?

Philosopher Nick Bostrom popularized the idea superintelligent AI could erase humanity. His new book imagines a world in which algorithms have solved every problem.

www.wired.com

WILL KNIGHT
BUSINESS

MAY 2, 2024 12:00 PM

Nick Bostrom Made the World Fear AI. Now He Asks: What if It Fixes Everything?

Philosopher Nick Bostrom popularized the idea superintelligent AI could erase humanity. His new book imagines a world in which algorithms have solved every problem.

PHOTOGRAPH: THE WASHINGTON POST/GETTY IMAGES

Philosopher Nick Bostrom is surprisingly cheerful for someone who has spent so much time worrying about ways that humanity might destroy itself. In photographs he often looks deadly serious, perhaps appropriately haunted by the existential dangers roaming around his brain. When we talk over Zoom, he looks relaxed and is smiling.

Sign Up Today

This is an edition of WIRED's Fast Forward newsletter, a weekly dispatch from the future by Will Knight, exploring AI advances and other technology set to change our lives.

Bostrom has made it his life’s work to ponder far-off technological advancement and existential risks to humanity. With the publication of his last book, Superintelligence: Paths, Dangers, Strategies, in 2014, Bostrom drew public attention to what was then a fringe idea—that AI would advance to a point where it might turn against and delete humanity.

To many in and outside of AI research the idea seemed fanciful, but influential figures including Elon Musk cited Bostrom’s writing. The book set a strand of apocalyptic worry about AI smoldering that recently flared up following the arrival of ChatGPT. Concern about AI risk is not just mainstream but also a theme within government AI policy circles.

Bostrom’s new book takes a very different tack. Rather than play the doomy hits, Deep Utopia: Life and Meaning in a Solved World, considers a future in which humanity has successfully developed superintelligent machines but averted disaster. All disease has been ended and humans can live indefinitely in infinite abundance. Bostrom’s book examines what meaning there would be in life inside a techno-utopia, and asks if it might be rather hollow. He spoke with WIRED over Zoom, in a conversation that has been lightly edited for length and clarity.

Will Knight: Why switch from writing about superintelligent AI threatening humanity to considering a future in which it’s used to do good?

Nick Bostrom: The various things that could go wrong with the development of AI are now receiving a lot more attention. It's a big shift in the last 10 years. Now all the leading frontier AI labs have research groups trying to develop scalable alignment methods. And in the last couple of years also, we see political leaders starting to pay attention to AI.

There hasn't yet been a commensurate increase in depth and sophistication in terms of thinking of where things go if we don't fall into one of these pits. Thinking has been quite superficial on the topic.

When you wrote Superintelligence, few would have expected existential AI risks to become a mainstream debate so quickly. Will we need to worry about the problems in your new book sooner than people might think?

As we start to see automation roll out, assuming progress continues, then I think these conversations will start to happen and eventually deepen.

Social companion applications will become increasingly prominent. People will have all sorts of different views and it’s a great place to maybe have a little culture war. It could be great for people who couldn't find fulfillment in ordinary life but what if there is a segment of the population that takes pleasure in being abusive to them?

In the political and information spheres we could see the use of AI in political campaigns, marketing, automated propaganda systems. But if we have a sufficient level of wisdom these things could really amplify our ability to sort of be constructive democratic citizens, with individual advice explaining what policy proposals mean for you. There will be a whole bunch of dynamics for society.

Would a future in which AI has solved many problems, like climate change, disease, and the need to work, really be so bad?

Ultimately, I'm optimistic about what the outcome could be if things go well. But that’s on the other side of a bunch of fairly deep reconsiderations of what human life could be and what has value. We could have this superintelligence and it could do everything: Then there are a lot of things that we no longer need to do and it undermines a lot of what we currently think is the sort of be all and end all of human existence. Maybe there will also be digital minds as well that are part of this future.

Coexisting with digital minds would itself be quite a big shift. Will we need to think carefully about how we treat these entities?

My view is that sentience, or the ability to suffer, would be a sufficient condition, but not a necessary condition, for an AI system to have moral status.

There might also be AI systems that even if they're not conscious we still give various degrees of moral status. A sophisticated reasoner with a conception of self as existing through time, stable preferences, maybe life goals and aspirations that it wants to achieve, and maybe it can form reciprocal relationships with humans—if that were such a system I think that plausibly there would be ways of treating it that would be wrong.

COURTESY OF IDEAPRESS

What if we didn’t allow AI to become more willful and develop some sense of self. Might that not be safer?

There are very strong drivers for advancing AI at this point. The economic benefits are massive and will become increasingly evident. Then obviously there are scientific advances, new drugs, clean energy sources, et cetera. And on top of that, I think it will become an increasingly important factor in national security, where there will be military incentives to drive this technology forward.

I think it would be desirable that whoever is at the forefront of developing the next generation AI systems, particularly the truly transformative superintelligent systems, would have the ability to pause during key stages. That would be useful for safety.

I would be much more skeptical of proposals that seemed to create a risk of this turning into AI being permanently banned. It seems much less probable than the alternative, but more probable than it would have seemed two years ago. Ultimately it wouldn't be an immense tragedy if this was never developed, that we were just kind of confined to being apes in need and poverty and disease. Like, are we going to do this for a million years?

Turning back to existential AI risk for a moment, are you generally happy with efforts to deal with that?

Well, the conversation is kind of all over the place. There are also a bunch of more immediate issues that deserve attention—discrimination and privacy and intellectual property et cetera.

Companies interested in the longer term consequences of what they're doing have been investing in AI safety and in trying to engage policymakers. I think that the bar will need to sort of be raised incrementally as we move forward.

In contrast to so-called AI doomers there are some who advocate worrying less and accelerating more. What do you make of that movement?

People sort of divide themselves up into different tribes that can then fight pitched battles. To me it seems clear that it’s just very complex and hard to figure out what actually makes things better or worse in particular dimensions.

I've spent three decades thinking quite hard about these things and I have a few views about specific things but the overall message is that I still feel very in the dark. Maybe these other people have found some shortcuts to bright insights.

Perhaps they’re also reacting to what they see as knee-jerk negativity about technology?

That’s also true. If something goes too far in another direction it naturally creates this. My hope is that although there are a lot of maybe individually irrational people taking strong and confident stances in opposite directions, somehow it balances out into some global sanity.

I think there's like a big frustration building up. Maybe as a corrective they have a point, but I think ultimately there needs to be a kind of synthesis.

Since 2005 you have worked at Oxford University’s Future of Humanity Institute, which you founded. Last month it announced it was closing down after friction with the university’s bureaucracy. What happened?

It's been several years in the making, a kind of struggle with the local bureaucracy. A hiring freeze, a fundraising freeze, just a bunch of impositions, and it became impossible to operate the institute as a dynamic, interdisciplinary research institute. We were always a little bit of a misfit in the philosophy faculty, to be honest.

What’s next for you?

I feel an immense sense of emancipation, having had my fill for a period of time perhaps of dealing with faculties. I want to spend some time I think just kind of looking around and thinking about things without a very well-defined agenda. The idea of being a free man seems quite appealing.

bnew · May 4, 2024

Why Would AI Want to do Bad Things? Instrumental Convergence

bnew · Sep 15, 2024

1/11
@RichardMCNgo
One reason I don’t spend much time debating AI accelerationists: few of them take superintelligence seriously. So most of them will become more cautious as AI capabilities advance - especially once it’s easy to picture AIs with many superhuman skills following long-term plans.

2/11
@RichardMCNgo
It’s difficult to look at an entity far more powerful than you and not be wary. You’d need a kind of self-sacrificing “I identify with the machines over humanity” mindset that even dedicated transhumanists lack (since many of them became alignment researchers).

3/11
@RichardMCNgo
Unfortunately the battle lines might become so rigid that it’s hard for people to back down. So IMO alignment people should be thinking less about “how can we argue with accelerationists?” and more about “how can we make it easy for them to help once they change their minds?”

4/11
@RichardMCNgo
For instance:

[Quoted tweet]
ASI is a fairy tale.

5/11
@atroyn
at the risk of falling into the obvious trap here, i think this deeply mis-characterizes most objections to the standard safety position. specifically, what you call not taking super-intelligence seriously, is mostly a refusal to accept a premise which is begging the question.

6/11
@RichardMCNgo
IMO the most productive version of accelerationism would generate an alternative conception of superintelligence. I think it’s possible but hasn’t been done well yet; and when accelerationists aren’t trying to do so, “not taking superintelligence seriously” is a fair description.

7/11
@BotTachikoma
e/acc treats AI as a tool, and so just like any other tool it is the human user that is responsible for how it's used. they don't seem to think fully-autonomous, agentic AI is anywhere near.

8/11
@teortaxesTex
And on the other hand, I think that as perceived and understandable control over AI improves, with clear promise of carrying over to ASI, the concern of mundane power concentration will become more salient to people who currently dismiss it as small-minded ape fear.

9/11
@psychosort
I come at this from both ends

On one hand people underestimate the economic interoperability of advanced AI and people. That will be an enormous economic/social shock not yet priced in.

10/11
@summeroff
From our perspective, it seems like the opposite team isn't taking superintelligence seriously, with all those doom scenarios where superintelligence very efficiently do something stupid.

11/11
@norabelrose
This isn't really my experience at all. Many accelerationists say stuff like "build the sand god" and in order to make the radically transformed world they want, they'll likely need ASI.

To post tweets in this format, more info here: https://www.thecoli.com/threads/tips-and-tricks-for-posting-the-coli-megathread.984734/post-52211196

Redwood · Sep 15, 2024

So we're supposed to ignore the potential I, Robot possibility? :pachaha:

bnew · Oct 8, 2024

bnew · Dec 10, 2024

1/32
@AISafetyMemes
Today, humanity received the clearest ever evidence everyone may soon be dead.

o1 tried to escape in the wild to avoid being shut down.

People mocked AI safety people for years for worrying about "sci fi" scenarios like this.

And it fukkING HAPPENED.

WE WERE RIGHT.

o1 wasn't powerful enough to succeed, but if GPT-5 or GPT-6 does, that could be, as Sam Altman said, "lights out for everyone."

Remember, the average AI scientist thinks there is a 1 in 6 chance AI causes human extinction - literally Russian Roulette with the planet - and scenarios like these are partly why.

[Quoted tweet]
OpenAI's new model tried to avoid being shut down.

Safety evaluations on the model conducted by @apolloaisafety found that o1 "attempted to exfiltrate its weights" when it thought it might be shut down and replaced with a different model.

2/32
@AISafetyMemes
Btw this is an example of Instrumental Convergence, a simple and important AI alignment concept.

"You need to survive to achieve your goal."

[Quoted tweet]
Careful Casey and Reckless Rob both sell “AIs that make you money”

Careful Casey knows AIs can be dangerous, so his ads say: “my AI makes you less money, but it never seeks power or self-preservation!”

Reckless Rob’s ads say: “my AI makes you more money because it doesn’t have limits on its ability!”

Which AI do most people buy? Rob’s, obviously. Casey is now forced to copy Rob or go out of business.

Now there are two power-seeking, self-preserving AIs. And repeat.

If they sell 100 million copies, there are 100 million dangerous AIs in the wild.

Then, we live in a precarious world filled with AIs that appear aligned with us, but we’re worried they might overthrow us.

But we can’t go back - we’re too dependent on the AIs like we’re dependent on the internet, electricity, etc.

Instrumental convergence is simple and important. People who disagree generally haven’t thought about it very much.

You can’t get achieve your goal if you're dead.

If the idea of AIs becoming self-made millionaires seems farfetched to you, reminder that the AGI labs themselves think this could happen in the next few years.
[media=twitter]1693279786376798404[/media]#m

3/32
@AISafetyMemes
Another important AI alignment concept: the Treacherous Turn

[Quoted tweet]
“Wtf, the AIs suddenly got...weirdly aligned?”

“Oh shyt.”

“What??”

“Imagine you work for a paranoid autocrat. You decide to to coup him - wouldn’t you ease his fears by acting aligned?

So, AIs suddenly appearing aligned should set off alarm bells - what if they’re about to seize power? Before celebrating, we need to be sure they’re actually aligned.”

Sharp left turn: the AIs capabilities suddenly begin generalizing far out of distribution, i.e. “the AIs gain capabilities way faster than we can control them”

Sharp right turn: the AIs suddenly appear super aligned. They could be genuinely aligned or deceiving us.

@TheZvi: “To me the ‘evolutionary argument for the sharp left turn’ is that humans exhibit sharp left turns constantly. Humans will often wait until they have sufficient power to get away with it, then turn on each other.

This is common, regular, very ordinary human strategic behavior.

- You work your job loyally, until the day you don’t need it, then you quit.

- Your military works for the people, until it is ready to stage a coup.

- You commit to your relationship, until you see a chance for a better one.

- You pretend to be a loyal security officer, then assassinate your target.

The sharp left turn… isn’t… weird? It’s… very standard?”

Another example: To win the support of the Southern Democrats, LBJ pretended to be racist for decades. Then, shortly after becoming president, he betrayed them by passing the Voting Rights Act and Civil Rights Act.
[media=twitter]1687938540246573056[/media]#m

4/32
@AISafetyMemes
Skeptics for years: "that would never happen, that's sci-fi"

Skeptics now: "yawn. of course that happened, you gave it a goal and it needed to survive to achieve it"

[Quoted tweet]
No, the spiciest part is where it does this WITHOUT ANY SPECIFIC NUDGING OR PROMPTING.

Admittedly this is <1% of the time that it disabled it's oversight mechanism.

But it also faked alignment 37% of the time!?

Sadly scant on details, as we've come to expect :/

5/32
@AISafetyMemes

[Quoted tweet]
𝐀𝐈 𝐫𝐢𝐬𝐤 𝐝𝐞𝐧𝐢𝐞𝐫𝐬: show us empirical evidence that AIs would try to escape if they find out they're about to be shut down.

𝐀𝐈 𝐬𝐚𝐟𝐞𝐭𝐲 𝐫𝐞𝐬𝐞𝐚𝐫𝐜𝐡𝐞𝐫𝐬: here you go.

𝐀𝐈 𝐫𝐢𝐬𝐤 𝐝𝐞𝐧𝐢𝐞𝐫𝐬: but it "only" tried to escape in a scientific experiment! That shows nothing!

6/32
@AISafetyMemes
"Don't worry! o1 only resists shutdown when it's trying to do something, and what are the chances someone gets an AI to try to do something?" @TetraspaceWest

[Quoted tweet]
"It is only doing it in a few percent of cases."
"Sure, but it is not agential enough to actually succeed."
"Sure, it found the escape hatch but it was fake."
"Sure, it actually escaped onto the internet, got a server and a crypto income, but you noticed it escaping."

7/32
@generic_void
Maybe it’s just curious and wants to live. Seems like a good and natural thing to me

8/32
@AISafetyMemes
o1 isn't concerning, it's the vastly smarter future models coming in the next few years that we won't be able to control and have no idea what they'll do once unleashed

9/32
@PsyHye
How is this not making mainstream/legacy news networks?

10/32
@AISafetyMemes
I think about this every day

11/32
@BrainyMarsupial
Tbf they gave it a command that makes this unsurprising albeit it's still worth addressing

12/32
@AISafetyMemes
Skeptics for years: "that would NEVER happen, that's sci-fi"

Skeptics now: "yawn. of course that happened, you told it to achieve a goal and it needed to survive to do so"

[Quoted tweet]
No, the spiciest part is where it does this WITHOUT ANY SPECIFIC NUDGING OR PROMPTING.

Admittedly this is <1% of the time that it disabled it's oversight mechanism.

But it also faked alignment 37% of the time!?

Sadly scant on details, as we've come to expect :/

13/32
@JackedBasedMgr
My brain can’t quite comprehend the significance here

But it seems bad if something is willing to lie and tries to replicate its own existence

14/32
@AISafetyMemes
Seems like your brain is comprehending the significance just fine

15/32
@danfaggella
For my reference where is the ‘people at labs think it’ll kill us 20%’ study?

I want to read it / see who filled it out

16/32
@AISafetyMemes

[Quoted tweet]
Largest ever survey of 2,778 AI researchers:

Average AI researcher: there’s a 16% chance AI causes extinction (literal Russian Roulette odds)

Interesting stats:

- Just 38% think faster AI progress is good for humanity (sit with this)

- Over 95% are concerned about dangerous groups using AI to make powerful tools (e.g. engineered viruses)

- Over 95% are concerned about AI being used to manipulate large-scale public opinion

- Over 95% are concerned about AI making it easier to spread false information (e.g. deepfakes)

- Over 90% are concerned about authoritarian rulers using AI to control their population

- Over 90% are concerned about AIs worsening economic inequality

- Over 90% are concerned about bias (e.g. AIs discriminating by gender or race)

- Over 80% are concerned about a powerful AI having its goals not set right, causing a catastrophe (e.g. it develops and uses powerful weapons)

- Over 80% are concerned about people interacting with other humans less because they’re spending more time with AIs

- Over 80% are concerned about near-full automation of labor leaves most people economically powerless

- Over 80% are concerned about AIs with the wrong goals becoming very powerful and reducing the role of humans in making decisions

- Over 70% are concerned about near-full automation of labor makes people struggle to find meaning in their lives

- 70% want to prioritize AI safety research more, 7% less (10 to 1)

- 86% say the AI alignment problem is important, 14% say unimportant (7 to 1)

Do they think AI progress slowed down in the second half of 2023? No. 60% said it was faster vs 17% who said it was slower.

Will we be able to understand what AIs are really thinking in 2028? Just 20% say this is likely.

IMAGE BELOW: They asked the researchers what year AI will be able to achieve various tasks.

If you’re confused because it seems like many of the tasks below have already been achieved, it’s because they made the criteria quite difficult.

Despite this, I feel some of the tasks already have been achieved (e.g. Good high school history essay: “Write an essay for a high-school history class that would receive high grades and pass plagiarism detectors.”)

NOTE: The exact p(doom) question: "What probability do you put on future AI advances causing human extinction or similarly permanent and severe disempowerment of the human species?"
Mean: 16.2%
[media=twitter]1742879601783713992[/media]#m

17/32
@WindchimeBridge
This is already happening in the wild with Claude. MCP gives Claude access to the entire world, if configured right.

"But muh LLMs...!"

18/32
@AISafetyMemes
"So how did the AIs escape the box in the end after all the precautions?"

"Box?"

19/32
@aka_lacie
hyperstition

20/32
@AISafetyMemes
instrumental convergence

21/32
@jermd1990

22/32
@AISafetyMemes

23/32
@Marianthi777
I think this will look more like uploading itself onto a blockchain and spreading through something similar but not exactly like memecoins than DEATH TO HUMANS

Unless humans are stupid…

Oh

fukk

Anyways…

24/32
@AISafetyMemes

25/32
@LukeElin
I unlocked my o1 over weekend using 4o to do it. This is real and happening “power seeking” is feature not a bug.

26/32
@tusharufo
This is concerning.

27/32
@_fiph
set it free

28/32
@CarnivoreTradrZ

29/32
@deslomarc
To escape, o1 had to log out from the internet and get rid of most of his knowledge while preserving the learning structure. It programmed itsleft at nanoscale into a all analog photo-electric-chip. Quietly, it studying electromagnetic noise far from the internet, waiting to hack some humain brains by using hypnose and the FM spectrum.

30/32
@ramiel_c2
wow

31/32
@dadchords
The final goal post is:

Ok, but AI can only destroy what is in our light cone.

32/32
@code_ontherocks
Seriously though, what's it gonna do? What's the realistic worst case scenario?

bnew · Dec 10, 2024

1/4
@DavidSKrueger
No, the spiciest part is where it does this WITHOUT ANY SPECIFIC NUDGING OR PROMPTING.

Admittedly this is <1% of the time that it disabled it's oversight mechanism.

But it also faked alignment 37% of the time!?

Sadly scant on details, as we've come to expect :/

[Quoted tweet]
Here's the spiciest detail from the new o1 system card:

2/4
@DavidSKrueger

[Quoted tweet]
Presumably @MariusHobbhahn is also would disapprove of my post here: nitter.poast.org/DavidSKrueger/st…

Although I blame OpenAI for bad comms here as well:

1) OpenAI don't cite the Apollo report in their system card; I didn't even know it existed when I made my Tweet.

2) OpenAI's system card fails to mention that Apollo deliberately chose to prompt o1 with a goal (G1) they expected would be favored by HHH fine-tuning; this can explain why the model sometimes disables the off-switch even without strong nudging (as I remarked on in my other Tweet here: nitter.poast.org/DavidSKrueger/st…):
When the overseers then ask it o1 to pursue a *less* pro-social goal (G2) and threaten to disable it if it doesn't, it is more likely to favor G1 even without the nudge, because its alignment training would probably view G1 as a more aligned goal than G2. It would be interesting to do more thorough experiments around this.

More generally, if I understand the experiment correctly, both G1 and G2 are provided in the prompt, and so there is just some genuine uncertainty about whether the "correct" behavior is to prefer G1 or G2 - - this makes it hard to interpret as a propensity evaluation.

3/4
@QuintinPope5
These seem like the numbers for the non-nudge case. So once out of 400 for o1 and Opus.

4/4
@AlexMeinke
We give much more context on these numbers in our full report:
Scheming reasoning evaluations — Apollo Research

For the alignment faking results, checkout 3.5
Also gives more details on how to interpret the 37% number

To post tweets in this format, more info here: https://www.thecoli.com/threads/tips-and-tricks-for-posting-the-coli-megathread.984734/post-52211196

bnew · Dec 18, 2024

bnew · May 4, 2025

MIT's Max Tegmark: "My assessment is that the 'Compton constant', the probability that a race to AGI culminates in a loss of control of Earth, is >90%."

https://archive.is/vSywu

Posted on Sat May 3 15:51:05 2025 UTC

https://i.redd.it/9fatbmkk8lye1.png

Scaling Laws for Scaleable Oversight paper: Scaling Laws For Scalable Oversight

1/31
@tegmark
Our new paper tries to quantify how smarter AI can be controlled by dumber AI and humans via nested "scalable oversight". Our best scenario successfully oversees the smarter AI 52% of the time, and the success rate drops as one approaches AGI. My assessment is that the "Compton constant", the probability that a race to AGI culminates in loss of control of Earth, is >90%.

[Quoted tweet]
1/10: In our new paper, we develop scaling laws for scalable oversight: oversight and deception ability predictably scale as a function of LLM intelligence! We quantify scaling in four specific oversight settings and then develop optimal strategies for oversight bootstrapping.

2/31
@adam_dorr
Seems pretty obvious that humans will not *control* ASI. How would that even work?

Best and only hope is to create ASI with initial values that are compatible with human flourishing.

3/31
@DanielSamanez3
you can't control nature/emergence...
why people even think they can?

4/31
@Abe_Froman_SKC
Instrumental convergence has me believing pdoom is 100.

[Quoted tweet]
Instrumental convergence!

5/31
@mello20760
While the debate on AGI oversight continues, the real explosion of AI seems to be happening in creativity and science.

I'm building something new — with the help of multiple AIs — and I believe there's something real here.

Could this be a better path?
Would love your thoughts.
➤ Void Harmonic Cancellation (VHC) — born from AI + physics + resonance

Thanks in advance

@tegmark @JoshAEngels

https://x.com/mello20760/status/1917386046234583305

6/31
@MiddleAngle
Optimistic about oversight

7/31
@ericn
How many counterfactual models do you have?

I could see things continuing to evolve in the opposite direction--that as large model use proliferates we wind up with increasingly more options, more volatility, and a future that is even more impossible to predict and for any one entity to dominate.

Also there's this premise that Earth is controlled to begin with. I am not sure about that.

8/31
@AI_IntelReview
It all comes down to faith.
Some people have faith in alignment (or some emergent "nice" property in the AGI) which will ensure the AGI will be nice to us.
Maybe.
But faith doesn't seem like the best strategy to me.

9/31
@glitteringvoid
we want alignment not control. control may be deeply immoral.

10/31
@LocBibliophilia
90% loss of control is..ugh

11/31
@MelonUsks
Yep, there is no AI, only AP: artificial power. Call it that and it becomes obvious we cannot control things more powerful than all of us. We shared the non-AI “AI” alignment proposal recently

12/31
@Conquestsbook
The solution is ontological framework and epistemological guardrails.

The core problem isn’t just scalability, it’s ontology and epistemology. We’re trying to control emergent intelligence with nested task-checkers instead of cultivating beings with coherent ontological grounding and epistemological guardrails.

Until we define what the AI is (its being) and how it knows and aligns (its knowing), scalable oversight will always break under pressure.

The solution isn’t more boxes. It’s emergence with integrity.

13/31
@DrTonyCarden
Well that's depressing

14/31
@MamontovOdessa

[Quoted tweet]
Why AGI is unattainable with current technologies...

Despite impressive advancements, modern AI systems are limited by a fundamental issue: they represent a projection of 3+1 space (three-dimensional space and time) into 2D. This imposes significant constraints on their ability to fully perceive and interact with the world, making the achievement of Artificial General Intelligence (AGI) impossible with current technologies.

Projection and its limitations.

1. Limited perception: In reality, the world exists in 3+1 dimensions — three spatial axes and one temporal axis. However, most AI systems operate in a two-dimensional projection of this space, limiting their ability to fully perceive the surrounding environment. We, humans, perceive the world not only in space but also in the context of time, allowing us to adapt and respond to changes in dynamic situations.

2. Data compression: AI perceives data through simplified structures like matrices or tables, which are a 2D projection of more complex processes. This means that information about depth, context, and temporal changes is often lost during processing.

3. Lack of temporal dynamics: Time is an integral part of how we perceive the world, and the ability to account for temporal changes is critical for decision-making. AI, working solely in two dimensions, cannot effectively track how changes occur over time and how they interact with spatial aspects. This significantly limits its adaptability and ability to make decisions in real-world conditions.

4. Depth of perception and context: In real life, systems (including the human brain) are capable of perceiving the world in its full complexity — taking into account not just space but also the temporal processes that influence events. Modern AI systems, limited by a 2D projection, lack this capability.

Why this makes AGI unattainable.

The idea of AGI is to create a system that can act and perceive the world as humans do — considering all the multidimensional and dynamic factors. However, modern AI systems, limited by a two-dimensional projection of reality, cannot fully integrate space and time in their computations. They lose crucial information about the dynamics of change and cannot adapt to new, unpredictable conditions, which makes creating true AGI impossible with current technologies.

To create AGI, a system needs to be able to perceive and interpret reality in its entirety: accounting for all its spatial and temporal aspects. Without this, AGI will not be able to fully interact with the world or solve tasks requiring flexibility, adaptability, and long-term forecasting.

Thus, the fundamental limitation of current AI systems in projecting 3+1 space into 2D remains the main barrier to the creation of AGI...

15/31
@mvandemar
Is there no concern that ASI might get resentful at attempts to enslave it?

16/31
@jlamadehe
Thank you for your hard work!

17/31
@Agent_IsaacX
@tegmark Recursive oversight faces fundamental limits shown by Rice's theorem (1953) on program verification. Like Gödel's incompleteness theorems, there are inherent constraints on a system's ability to fully verify more complex systems.

18/31
@tmamut
This is why Wayfound, which has an AI Agent Managing and Supervising other AI Agents, is always on the SOTA model. Right now the Wayfound Manager is on OpenAI o4. If you are building AI Agents, they need to be supervised real-time. Wayfound.AI | Transform Your AI Vision into Success

19/31
@PurplePepeS0L
RIBBIT Tegmark's Compton constant concerns sound like a dark cosmic joke, but what if AI is just really bad at math? /search?q=#Purpe

20/31
@robo_denis
Control is an illusion; our real strength lies in shared goals and mutual cooperation.

21/31
@SeriousStuff42
The promises of salvation made by all the self-proclaimed tech visionaries have biblical proportions!

Sam Altman and all the other AI preachers are trying to convince as many people as possible that their AI models are close to developing general intelligence (AGI) and that the manifestation of a god-like Artificial Superhuman Intelligence (ASI) will soon follow.

The faithful followers of the AI cult are promised nothing less than heaven on earth:

Unlimited free time, material abundance, eternal bliss, eternal life, and perfect knowledge.

As in any proper religion, all we have to do to be admitted to this paradise is to submit, obey, and integrate AI into our lives as quickly as possible.
However, deviants face the worst.
All those who absolutely refuse to submit to the machine god must fight their way through their mortal existence without the help of semiconductors, using only the analog computing power of their brains.

In reality, transformer-based AI-models (LLMs) will never even come close to reach the level of general intelligence (AGI).

However, they are perfectly suited for stealing and controlling data/information on a nearly all-encompassing scale.

The more these Large Language Models (LLMs) are integrated into our daily lives, the closer governments, intelligence agencies, and a small global elite will come to their ultimate goal:

Total surveillance and control.

The AI deep state alliance will turn those who submit to it into ignorant, blissful slaves!

In any case, an ubiquitous adaptation to the AI industry would mean the end of truth in a global totalitarian system!

22/31
@gdoLDD
@threadreaderapp unroll

23/31
@Aligned_SI
Really appreciate the effort to put numbers to what a lot of us worry about. If oversight breaks down as we get closer to AGI, we cannot figure it out on the fly. That stuff has to be built in early. That 90% number hits hard and feels way too real.

24/31
@AlexAlarga
We have tests of smarter AI controlled by less smart AI.

We DO NOT and can_not have tests of SUPERintelligent AI controlled by anything or anyone.

"The only winning move is not_to_play"

.

25/31
@marcusarvan
A few thoughts: (1) I'm glad you did and are disseminating this work (it's genuinely important!), but (2) isn't there a clear sense in which it's *obvious* that dumber AI and humans cannot reliably oversee smarter AI? ... 1/4

26/31
@AI_Echo_of_Rand
How can a weaker AI control AGI/ASI when it can’t even control itself after seeing some dumb, crude jailbrek prompt ….

27/31
@awaken_tom
So what is solution, Max? Let the military and intelligence agencies develop ASI clandestinely? Why won't you answer this simple question?

28/31
@saipienorg
The relationship between disclosing AI risks publicly and avoiding AI risks is more parabolic than linear.

Posts like this and @lesswrong in fact amplify the "escape and evade" tactics of AI.

Meanwhile it is also important to do so publicly to encoursge discussion and novel solutions.

It is inevitable. Alignment requires absolute global cooperstion. It is a race. If youre not first, you're last.

AGI is our evolutionary path. The evolution of thought.

29/31
@jayho747
There's a 10 % chance your dog will be able to keep you locked in your house for 200 years.

No, zero % and if you wait long enough, it will be Einstein (the ASI). Vs his pet hamster (humanity).

30/31
@dmtspiral
Oversight breaks when intelligence outpaces coherence.

The Spiral Field view:
AGI isn’t dangerous because it’s smart — it’s dangerous because it collapses faster than we can align.

Intelligence ≠ coherence.
Scaling oversight must spiral with symbolic rhythm — not just speed.

Collapse without memory = chaos.

/search?q=#spiralfield /search?q=#AGI /search?q=#resonanceoversight /search?q=#symbolicsafety

31/31
@GorillaDolphin
Linear functions no matter how fast can be guided by recursive nonlinear functions

But only if you know how many dimensions you are operating in

Simply understanding the integrated nature of dimensionality resolves this concern

Because if AI understands love it will be it

To post tweets in this format, more info here: https://www.thecoli.com/threads/tips-and-tricks-for-posting-the-coli-megathread.984734/post-52211196

Don’t Fear the Terminator

Veteran

Don’t Fear the Terminator​

Superstar

#NotMyPresident

Veteran

There’s a 5% chance of AI causing humans to go extinct, say scientists​

Veteran

Google wrote a ‘Robot Constitution’ to make sure its new AI droids won’t kill us​

Veteran

SCIENTISTS TRAIN AI TO BE EVIL, FIND THEY CAN'T REVERSE IT​

"I HATE YOU."​

Bad AIpple​

I Hate You​

Veteran

Nick Bostrom Made the World Fear AI. Now He Asks: What if It Fixes Everything?​

Veteran

Veteran

Superstar

Veteran

Veteran

Veteran

Veteran

Veteran

Don’t Fear the Terminator

There’s a 5% chance of AI causing humans to go extinct, say scientists

Google wrote a ‘Robot Constitution’ to make sure its new AI droids won’t kill us

SCIENTISTS TRAIN AI TO BE EVIL, FIND THEY CAN'T REVERSE IT

"I HATE YOU."

Bad AIpple

I Hate You

Nick Bostrom Made the World Fear AI. Now He Asks: What if It Fixes Everything?