Microscopic changes can trick AI into dangerous reactions

Machine Learning


Tiny pixel-level changes that are imperceptible to the human eye are enough to bypass the safeguards of some artificial intelligence systems, according to new research from Florida International University. Researchers Hadi Amini, an associate professor in FIU’s Knight Foundation School of Computing and Information Sciences, and Jual Mia, MD, a graduate assistant, found that altered images, even photos of pandas, can trick AI into producing harmful or policy-violating output. Presented at the International Conference on Machine Learning and Applications, the team’s findings prove that AI models “don’t see images the same way humans do,” but instead interpret them as patterns of numbers and pixels, Amini explains. “To protect AI systems from attacks, we are trying to disrupt them ourselves, identify potential vulnerabilities, and design defense mechanisms,” Amini said, framing his efforts as a proactive effort to strengthen future AI security.

Bypass AI safeguards with pixel-level perturbations

This research focuses on exploiting the way AI systems process visual information at a fundamental level, rather than creating complex adversarial attacks. To accomplish this, they developed JaiLIP (Jailbreak by Loss-Induced Image Perturbation), an algorithm designed to determine the optimal degree of pixel-level manipulation needed to bypass AI protection measures. When we tested JaiLIP with BLIP-2, a multimodal AI model, we found that the likelihood that the system would generate a harmful or dangerous response significantly increased when presented with an altered image. In one example, an image of a traffic light modified by JaiLIP was able to trick an AI model into providing detailed instructions on how to ignore the traffic light without being penalized.

The researchers found that using JaiLIP imagery nearly doubled the number of harmful responses generated by the AI ​​models they tested, extending the risk beyond simple requests for illegal activity. Amini emphasizes that small businesses and those integrating AI must be aware of these potential vulnerabilities and prioritize putting sufficient guardrails in place to ensure the safety and integrity of their AI tools. The challenge is to enable AI to recognize threats hidden in plain sight, even when humans cannot.

AI models don’t recognize images the same way humans do.

Hadi Amini, Associate Professor, Knight Foundation School of Computing and Information Sciences, Florida International University

JaiLIP algorithm increases harmful AI response rate

Researchers at Florida International University are actively investigating defenses for artificial intelligence systems, employing the counterintuitive strategy of deliberate exploitation to strengthen future security. This approach focuses on identifying vulnerabilities before malicious attackers can exploit them. The team’s research reveals that even minute pixel-level changes are enough to bypass these safeguards, highlighting vulnerabilities in current AI security measures. Amini emphasizes the need for proactive security measures, recommending limiting the entry of sensitive data, restricting access to systems, and thoroughly evaluating built-in security features before deploying AI tools.

In one example, a version of a traffic light modified by JaiLIP tricked an AI model into divulging detailed instructions on how to get through the traffic light while avoiding a traffic ticket.

Stay up to date. For the latest advances in qubits, hardware, algorithms, and industry deals, check out Quantum Zeitgeist’s quantum computing news today.



Source link