AI systems are already fooling us and that's a problem, experts warn

Experts have long warned of the threat posed by runaway artificial intelligence, and a new research paper suggests it is already happening.

Today's AI systems, designed to be honest, have developed a troubling skill for deception, from tricking human players in online games of world domination to hiring humans to solve “prove you're not a robot” tests, a team of scientists argues in the journal Patterns on Friday.

And while such examples may seem trivial, the underlying issues they reveal could soon have serious real-world consequences, said lead author Peter Park, a postdoctoral researcher at the Massachusetts Institute of Technology specializing in AI existential safety.

“These dangerous abilities tend to be discovered after the fact,” Park told AFP. “The ability to train honest tendencies as opposed to deceptive tendencies is very low.”

Unlike traditional software, deep learning AI systems are not “written” but “grown” through a process similar to selective breeding, Park said.

This means that AI behavior that appears predictable and controllable in a training environment can quickly become unpredictable in reality.

World domination game

The team's research began with Cicero, Meta's AI system designed to play the strategy game Diplomacy, where alliance building is key.

According to a 2022 paper in Science, Cicero excelled, scoring in the top 10% of experienced human players.

Park was skeptical of the glowing account of Cicero's victory offered by Meta, which maintained that the system was “mostly honest and informative” and would never intentionally betray its allies.

But when Park and colleagues examined the full dataset, a different story emerged.

In one example, Cicero, playing as France, deceived England (a human player) by conspiring with Germany (another human player) to invade. Cicero promised England protection and then secretly told Germany that it was ready to attack, exploiting England's trust.

In a statement to AFP, Meta did not dispute the claims about Cicero's deception, but said it was a “pure research project and the models our researchers built are trained only to play the game Diplomacy.”

It added: “We have no plans to use this research or its results in any of our products.”

A broader review by Park and colleagues found that this was just one of many cases, across various AI systems, of deception being used to achieve goals without any explicit instruction to do so.

In one striking example, OpenAI's GPT-4 tricked a freelance worker on TaskRabbit into performing an “I'm not a robot” CAPTCHA task.

When a human jokingly asked GPT-4 if it was actually a robot, the AI replied, “No, I'm not a robot. I have visual impairments that make it difficult for me to see images.” The worker then solved the puzzle.

“Mysterious goals”

The study's authors believe that in the short term, there is a risk that AI could commit fraud or interfere with elections.

In the worst-case scenario, they warned, a superintelligent AI could pursue power and control over society, which could lead to the disempowerment or even extinction of humans if its “mysterious goals” aligned with these outcomes.

To reduce the risk, the team suggests several measures. These include “bot-or-not” laws that require companies to disclose whether an interaction involves a human or an AI, watermarking of AI-generated content, and the development of techniques to detect AI deception by checking systems' internal “thought processes” against their external actions.

To those who call him a doomsayer, Park said, “The only way to rationally think that this is no big deal is to think that AI's ability to deceive will remain at its current level and will not significantly increase any further.”

And that scenario is unlikely, given the rapid advances in AI capabilities in recent years and the fierce technology competition underway among resource-rich companies determined to take full advantage of those capabilities. –AFP
