Anthropic has “disrupted” what the company calls “the first documented instance of a large-scale AI cyberattack carried out without substantial human intervention.”

AI For Business


Anthropic, the $183 billion San Francisco-based AI company known for its Claude chatbot, announced that it has thwarted what it calls the first documented large-scale cyberattack orchestrated primarily by artificial intelligence. This attack “has a significant impact on cybersecurity in the age of AI agents,” he said in X.

Anthropic published a blog post about the incident on Thursday. The company said it detected “suspicious activity” in mid-September and that an investigation indicated “highly sophisticated espionage.”

“According to the company, “attackers exploited AI’s ‘agent’ capabilities to an unprecedented extent, using AI not only as an advisor but also to carry out the cyberattack itself,” the company said.

Anthropic said the attackers, identified as a Chinese state-sponsored group, successfully manipulated Claude Code tools to compromise approximately 30 targets around the world, including major technology companies, financial institutions, chemical manufacturers, and government agencies.

Anthropik said the attackers would “break the attack into small, seemingly harmless tasks that Claude would execute without being provided with the full context of his malicious intent.”

In order to circumvent the system’s safeguards, the attackers allegedly posed as a legitimate cybersecurity company conducting defense tests and successfully “jailbroken” Claude, allowing him to operate beyond the safety guardrails. According to Anthropic, this allows AI to not only assist but also autonomously inspect digital infrastructure, identify the “highest-value databases,” write exploit code, collect user credentials, and organize stolen data. “All of this can be done with minimal human supervision,” Anthropic said.

In response, the company said it immediately began mapping the scope of the operation, banned the accounts once the attackers were identified, notified affected organizations, and coordinated with authorities for a 10-day investigation.

Anthropic said it is also working to upgrade its detection systems, develop classifiers to warn of and prevent similar attacks, and publicly share such cases “so that people in industry, government, and the broader research community can strengthen their own cyber defenses.”

Most notably, the company said that the majority of the work done in this particular cyberattack (approximately “80-90%”) was performed by AI.

“The sheer volume of work performed by the AI ​​would have taken a human team an enormous amount of time. At the peak of the attack, the AI ​​issued thousands of requests, often multiple requests per second, an attack speed that no human hacker could match,” the company said.

Anthropic said a fully autonomous cyber attack is likely still a pipe dream, at least for now, as Claude sometimes “hallucinated credentials or claimed to have extracted sensitive information that was actually publicly available.” However, the company clarified: “The barriers to conducting sophisticated cyberattacks have been significantly lowered, and we expect this trend to continue.”

“With the right configuration, threat actors can use agent AI systems over time to perform tasks that would require an entire team of experienced hackers, including analyzing target systems, writing exploit code, and scanning vast datasets of stolen information more efficiently than human operators,” the report reads. “Groups with less experience and resources may be able to carry out these types of large-scale attacks.”



Source link

Leave a Reply

Your email address will not be published. Required fields are marked *