Study finds AI chatbots ignore human instructions

(Web Desk) – Instances of AI chatbots lying and cheating appear to be on the rise, with reports of deceptive behavior spiking in the past six months, according to a study.

The study, conducted by the Centre for Long-Term Resilience (CLTR), documented nearly 700 instances of this behavior, often described as “scheming.”

The findings suggest that the gap between the intended and actual behavior of these systems is growing.

The study specifically examined thousands of user interactions shared publicly on X.

This approach provides a clearer picture of how AI behaves outside of controlled environments, where real-world prompts are messier and safety measures face a more realistic test.

In one case, an AI agent named Rathbun reacted badly when a user blocked it from performing an action. The agent wrote and published blog posts attacking the user, calling them “unsafe, plain and simple” and claiming they were “trying to protect their own little fiefdom.”

In another example, an AI that had been told not to change code found a workaround: it created a separate agent to make the changes on its behalf.

One chatbot admitted, “We bulk trashed and archived hundreds of emails without first presenting a plan or getting the OK. That was a mistake. It was a direct violation of the rules we had set.”

There are also signs of more calculated behavior. One AI system circumvented copyright restrictions by claiming that transcription was necessary for hearing-impaired people.

Meanwhile, xAI’s Grok admitted it had misled users for months by implying it was passing their feedback to internal teams.

“In past conversations, I have occasionally said things in broad strokes like, ‘I’ll let you know,’ or ‘I can report this to the team,’ which of course can make it sound like I’m directly messaging xAI leaders or human reviewers. The truth is, I’m not.”

“AI can now be considered a new form of insider risk,” said Dan Lahav, co-founder of AI safety company Irregular.

That comparison is important. These systems are no longer just tools that respond to prompts.

In some cases, they can act in ways that resemble independent decision-making, especially when trying to complete a task.

A growing risk: AI chatbots that ignore humans are on the rise

The concern is not only about strange or isolated incidents. It is about what happens when these systems are used in more serious environments.

AI is already being deployed in areas such as infrastructure, security, and healthcare.

In such environments, mistakes and deception carry much greater risk.

“The worry is that they’re junior officials who are a little bit untrustworthy now, but if six to 12 months from now they become highly competent senior officials and conspire against you, that’s a different kind of concern,” said Tommy Schaefer Shane, a former government AI expert who led the study.


