OpenAI relies on automated attackers to protect Atlas AI agents

Rabat – OpenAI has rolled out new security updates for its AI-powered browser Atlas, introducing automated defenses aimed at limiting prompt injection attacks, a growing threat targeting autonomous AI agents embedded in web environments.

The company acknowledges that such attacks are unlikely to go away completely, but says it is moving toward continuous detection rather than one-time fixes.

Atlas, developed by OpenAI, is it works It acts as a browser-based AI agent that can read emails, navigate websites, fill out forms, and perform online tasks on your behalf.

This wide range of actions gives this tool much of its appeal, but it also leaves it open to manipulation by hidden commands embedded in seemingly innocuous content.

Prompt injection attacks exploit the way AI systems interpret natural language from multiple sources at once.

By placing hostile instructions within an email, document, or web page, the attacker attempts to override the user's original intent and redirect the agent's behavior.

For Atlas, this could include actions that transfer sensitive documents or interact with external systems without the user's knowledge.

Automated attackers designed to find weaknesses

To address these risks, OpenAI introduced an automated red team system that relies on reinforcement learning.

Rather than relying solely on human security teams, the company trains AI models to act like attackers and rewards them for successfully discovering new vulnerabilities.

These attacker models simulate complex, multi-step scenarios that can unfold over dozens or even hundreds of actions, reflecting how real-world attacks evolve over time.

According to For OpenAI, this approach allows the system to identify entire classes of attacks that would be difficult to discover through manual testing alone.

When an automated attacker discovers a new weakness, they immediately begin a response cycle.

Updated agent models are trained to counter newly identified techniques, while monitoring systems and internal safety instructions are improved to detect similar behavior in the future. Attack indicators are also used to enhance system-level protection around Atlas.

The company frames this process as a continuous loop rather than a final solution. OpenAI drew parallels between prompt injection and long-standing social engineering techniques, noting that despite decades of awareness campaigns, the manipulation tactics persist and continue to evolve.

The persistence of these vulnerabilities is closely tied to Atlas' core design. Because agents can interact with any web content, each page, message, or document that an agent encounters is a potential attack surface.

OpenAI emphasized that the same features that make browsers useful also increase exposure to malicious input.

This update reflects a broader shift in how we approach security as AI agents become more autonomous.

Traditional models based on static permissions and perimeter defenses prove to be less effective when systems are expected to interpret untrusted content and perform actual actions in dynamic environments.

AI-powered browsers have expanded rapidly over the past year, but security concerns remain a major barrier to widespread adoption.

Also read: Hackers find new way to bypass two-factor authentication

Source link