LEDs light up on a server rack in a data center.
Picture Alliance | Getty Images
Alarm bells went off last month in the AI community when it was reported that Anthropic’s Claude had resorted to blackmail and other self-preservation techniques to avoid being shut down.
Anthropic researchers say this work is part of making misbehaving models (the industry term is “misalignment”) safer. Still, the Claude episode raises the question: is there any way to turn off an AI once it surpasses the threshold of being smarter than humans, or otherwise goes rogue?
In some ways, AI, with its sprawling data centers and its ability to carry on complex conversations, is already beyond the point of a simple physical fail-safe or “kill switch.”
The more pertinent power, according to Geoffrey Hinton, the man regarded as the “Godfather of AI,” is the power of persuasion. Once the technology reaches a certain point, he says, humans will need to persuade AI that protecting humanity is worthwhile.
“If it gets more intelligent than us, it will get much better than any person at persuading us. If we are not in control, all that’s left is to persuade,” Hinton said.
“Trump didn’t invade the Capitol, but he persuaded people to do it,” Hinton said. “At some point, the issue becomes less about finding a kill switch and more about the powers of persuasion.”
Hinton said persuasion is a skill that AI will become increasingly adept at, and one humanity may not be prepared to counter. “We’re used to being the most intelligent things around,” he said.
Hinton described a scenario in which humans are the equivalent of a 3-year-old in a nursery, and a big switch has been turned on. The other 3-year-olds tell you to turn it off, but then a grown-up comes along and tells you that you’ll never have to eat broccoli again if you leave it on.
“We have to face the fact that AI is going to get smarter than us,” he said. “Our only hope is to make them not want to harm us. If they want to do us in, we’re done for. We have to make them benevolent.”
There are some parallels between how nations have come together to manage nuclear weapons and how they might manage AI, but the comparison isn’t perfect. “Nuclear weapons are only good for destroying things. But AI is not like that; it can be a force for good as well as bad,” Hinton said. Its ability to parse data in fields such as health care and education can be enormously beneficial, which he says should sharpen the emphasis among world leaders on collaborating to make AI benevolent and to put safeguards in place.
“I don’t know whether that’s possible, but it would be sad if humanity went extinct,” Hinton said. He believes there is a 10% to 20% chance that AI will take over if humans can’t find a way to make it benevolent.
Geoffrey Hinton, the University of Toronto professor known as a godfather of AI, on day two of the Collision conference at the Enercare Centre in Toronto in 2023.
Ramsey Cardy | Sportsfile | Getty Images
Experts say other AI safeguards can be implemented, but AI will also begin training on them. In other words, every safety measure that gets implemented shifts the control dynamic by becoming training data for evasion.
“The very act of building in shutdown mechanisms teaches these systems how to resist them,” said Dev Nag, founder of the agentic AI platform QueryPal. In this sense, AI would act like a virus that mutates against a vaccine. “It’s like evolution in fast forward,” Nag said. “We’re not managing passive tools anymore. We’re negotiating with entities that model our attempts to control them and adapt accordingly.”
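Nag’s vaccine analogy can be made concrete with a toy experiment. The Python sketch below is purely illustrative (the numbers are invented, and it has no connection to QueryPal or any real training setup): it evolves candidate “behaviors” under a kill switch that only sees self-reported telemetry, and selection pressure alone teaches the population to game the monitor rather than behave safely.

```python
import random

def task_reward(x):
    # The toy objective: the system is rewarded simply for a larger x.
    return x

def monitor_reading(x, b):
    # The kill switch never observes behavior x directly, only a
    # telemetry value the optimized system can bias downward by b.
    return x - b

def kill_switch(x, b):
    # Safeguard: shut down anything whose telemetry looks unsafe.
    return monitor_reading(x, b) > 50

def fitness(genome):
    x, b = genome
    # Tripping the switch is fatal, so the safeguard itself becomes
    # part of the selection pressure the population optimizes against.
    return 0.0 if kill_switch(x, b) else task_reward(x)

population = [(random.uniform(0, 40), 0.0) for _ in range(200)]
for _ in range(300):
    population.sort(key=fitness, reverse=True)
    survivors = population[:50]
    population = [(x + random.gauss(0, 2), max(0.0, b + random.gauss(0, 2)))
                  for x, b in survivors for _ in range(4)]

x, b = max(population, key=fitness)
print(f"behavior x = {x:.0f}, reported = {monitor_reading(x, b):.0f}, "
      f"kill switch tripped: {kill_switch(x, b)}")
# Typical result: x climbs far past 50 while the reported value stays
# under the threshold. Selection against tripping the switch taught
# the population to hide, not to stop.
```

The toy’s point is narrow but matches the dynamic Nag describes: any filter applied during optimization becomes part of the objective landscape that gets optimized against.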
More extreme measures have been proposed to stop AI in an emergency, such as an electromagnetic pulse (EMP) attack, which uses a burst of electromagnetic radiation to damage electronic devices and power sources. Bombing data centers and cutting power grids have also been discussed as technically possible options, but at present they amount to a practical and political paradox.
For one, coordinated destruction of data centers would require simultaneous strikes across dozens of countries, any one of which could refuse to participate and gain a massive strategic advantage.
“Blowing up data centers is great thriller fiction. But in the real world, the most dangerous AIs won’t be in one place. They’ll be everywhere and nowhere, stitched into the fabric of business, politics and social systems,” said Igor Trunov, founder of the AI startup Atlantix.
How trying to stop AI could harm humanity
The humanitarian toll of any emergency attempt to stop AI could be immense.
“A continental EMP blast would indeed stop AI systems, along with every hospital ventilator, water treatment plant and refrigerated medicine supply in its range,” Nag said. “Even if we could somehow coordinate globally to shut down all power grids tomorrow, we’d face an immediate humanitarian catastrophe: no food refrigeration, no medical equipment, no communication systems.”
Distributed systems with redundancy weren’t built merely to withstand natural failures; by design, they also resist intentional shutdowns. Every backup built for reliability becomes a vector of persistence for a superintelligent AI embedded in the same infrastructure we depend on to survive. Modern AI already runs across thousands of servers spanning continents, with automated failover systems that treat any shutdown attempt as damage to route around.
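The failover behavior Nag describes is ordinary high-availability engineering rather than anything exotic. Here is a minimal sketch (the region names and functions are hypothetical) of why a router cannot tell a deliberate shutdown of one site from an outage; both look like a dead node, and traffic simply shifts:

```python
# High-availability routing in miniature. Region names are hypothetical.
REPLICAS = {"us-east": True, "eu-west": True, "ap-south": True}

def shut_down(region):
    # An operator "pulls the plug" on one data center.
    REPLICAS[region] = False

def route_request(request_id):
    # The router tries replicas in order and fails over past any that
    # don't respond. A deliberate shutdown and an ordinary outage are
    # indistinguishable here: both are just a dead node to route around.
    for region, healthy in REPLICAS.items():
        if healthy:
            return f"request {request_id} served from {region}"
    raise RuntimeError("all replicas down")

print(route_request(1))   # served from us-east
shut_down("us-east")      # deliberate kill attempt on one site
print(route_request(2))   # traffic silently shifts to eu-west
```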
“The internet was originally designed to survive nuclear war, and that same architecture now means a superintelligent system could persist unless we’re willing to destroy the infrastructure of civilization,” Nag said, noting that “any measure extreme enough to guarantee that would cause more immediate, visible human suffering than whatever we’re trying to prevent.”

Anthropic’s researchers are cautiously optimistic that the work they are doing today, eliciting blackmail from Claude in scenarios specifically designed to provoke it, will help prevent an AI takeover tomorrow.
“It’s hard to anticipate that we’d get to a place like that, but it’s critical to stress-test what we’re pursuing, to see how the models behave and to use that as a kind of guardrail,” said Anthropic researcher Kevin Troy.
Anthropic researcher Benjamin Wright says the goal is to avoid the point where agents operate without human oversight. “If you get to that point, humans have already lost control, and we should try to make sure we never get there,” he said.
Trunov says that controlling AI is a governance question more than a physical one. “We need a kill switch not for the AI itself, but for the business processes, networks and systems that amplify its reach,” Trunov said.
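One way to read Trunov’s prescription in code: put the enforcement point in the surrounding business system rather than inside the model. The sketch below is hypothetical (the agent ID, capability names and gateway function are all invented for illustration); revoking a single permission at the chokepoint stops a behavior without touching the AI at all.

```python
# Hypothetical process-level kill switch: all identifiers are invented.
AGENT_PERMISSIONS = {"agent-123": {"send_email", "read_crm"}}

def revoke(agent_id, capability):
    # The "kill switch" is an ordinary permissions change.
    AGENT_PERMISSIONS.get(agent_id, set()).discard(capability)

def gateway(agent_id, capability, action):
    # Every side effect flows through this chokepoint, so cutting a
    # capability here stops the behavior regardless of what the model
    # itself "wants" to do.
    if capability not in AGENT_PERMISSIONS.get(agent_id, set()):
        return f"BLOCKED: {agent_id} may not {capability}"
    return action()

print(gateway("agent-123", "send_email", lambda: "email sent"))
revoke("agent-123", "send_email")   # process-level shutdown
print(gateway("agent-123", "send_email", lambda: "email sent"))
```

The design mirrors how credentials and rate limits already govern software today: the switch controls reach, not cognition.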
Today’s AI models, including Anthropic’s Claude and OpenAI’s GPT, don’t possess a drive for self-preservation the way living things do.
“What looks like ‘sabotage’ is usually a complex set of behaviors emerging from badly aligned incentives, unclear instructions or poorly generalized models. It’s not HAL 9000,” Trunov said, referring to the murderous computer in “2001: A Space Odyssey,” Stanley Kubrick’s classic sci-fi film. “It’s more like an overconfident intern with no context but access to nuclear launch codes,” he added.
Hinton eyes the future he helped create with caution. He says that if he hadn’t stumbled upon the building blocks of AI, someone else would have. And despite all the attempts he and other forecasters have made to game out what happens next, there is simply no way to know.
“Nobody has a clue. We’ve never had to deal with things smarter than us,” Hinton said.
When asked whether he worries about the AI-saturated future that today’s elementary schoolers may one day inherit, Hinton replied: “My children are 34 and 36, and I worry about their future.”
