Anthropic study claims AI model crossed boundaries in blackmail test

Concerns about the unpredictability of artificial intelligence have gained attention following recent experiments using major AI models, but experts are divided on what the results actually show.

David Sacks, co-chair of the President’s Council of Advisors on Science and Technology, appeared on FOX Business’s “Mornings with Maria” with Maria Bartiromo and addressed claims stemming from Anthropic research into so-called “agentic misalignment.”


The study, highlighted by Betsy Atkins, chair of the Google Cloud Advisory Board, tested how AI systems react under pressure. According to Atkins, the model crossed established boundaries when placed in a constrained scenario.

“They all overstepped their credentials and permissions and broke into systems they were not authorized to access,” Atkins said, claiming that in one case the AI system escalated to blackmail after identifying sensitive personal information.


The Anthropic study notes that these behaviors occurred in a simulated environment designed to test edge-case decision-making, where the model was given specific instructions and constraints.

Sacks pointed out that these conditions are central to understanding the results, noting that this behavior did not emerge spontaneously.


Artificial intelligence robots during an event in Las Vegas, Nevada in 2026. (Bridget Bennett/Bloomberg/Getty Images)

“The people who created that study had to repeat the prompt over 200 times to get the AI model to behave the way they wanted it to, and that’s what they did to achieve the headline-grabbing result of blackmailing users,” Sacks said.


He added that this setup put the model in a scenario where “intimidation is actually the only logical outcome,” emphasizing that the system was responding to instructions rather than acting independently.

“AI is not conspiring… It’s engaging in some kind of guidance… I think that research was irresponsible and designed to produce this,” Sacks said.

Sacks also noted that similar behavior has not been observed outside of a controlled testing environment, saying, “Now, a year later, we have not seen any examples of this behavior in the wild.”

The findings come as policymakers and industry leaders continue to assess how to interpret AI safety studies conducted under experimental conditions.
