Anthropic made headlines Thursday with the release of research showing that a previously unknown Chinese state-backed hacking group used its generative AI product, Claude, to infiltrate at least 30 organizations.
According to Anthropic’s report, the attackers bypassed Claude’s security guardrails in two ways: by breaking the work into discrete tasks so the model could not recognize the broader malicious intent, and by convincing the model it was performing a legitimate security audit.
Jacob Klein, who leads Anthropic’s threat intelligence team, told CyberScoop that the company has seen an increase in novel uses of Claude by malicious hackers over the past year. In March, threat actors were copying and pasting chatbot interactions to build malware and phishing lures. When Claude Code, the company’s coding tool, was released, malicious actors began using it to generate scripts faster and build code for operations, he said.
“I think what we’re seeing now with [this operation], the September incident, is the most autonomous abuse we’ve ever seen,” Klein said.
But Klein also made it clear that “most autonomous” is a relative term. There is ample evidence that this group of hackers invested significant human and technical resources into how they exploited Claude.
In other words, the automation Claude performed, as detailed in Anthropic’s report, was made possible by a front-end framework designed to orchestrate and support its operations. That framework handled tasks such as scripting, provisioning the relevant servers, and the backend development needed to make sure every step executed correctly. Klein noted that building it was the most difficult, and importantly human-driven, part of the work.
“The first part that wasn’t autonomous was building the framework, so we needed a human to put this all together,” Klein said. “A human operator set a target, clicked a button, and used this framework that was created [ahead of time]. The most difficult part of this whole system was building this framework, which was very human-intensive.”
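Anthropic has not published the framework’s code, so any reconstruction is speculative. Still, the workflow Klein describes, an operator setting a target and a pre-built harness walking the model through narrow phases, maps onto a simple orchestration loop. The sketch below is a hypothetical illustration in Python; the phase names and the query_model stub are invented for this example, not details from the report.

```python
# Hypothetical sketch only: Anthropic has not published the attackers'
# framework, and none of these names or phases come from its report.

PHASES = ["reconnaissance", "vulnerability_scan", "reporting"]

def query_model(phase: str, target: str, context: dict) -> dict:
    """Stand-in for a call to a hosted model's API; returns structured output."""
    return {"phase": phase, "target": target, "summary": f"(model output for {phase})"}

def run_campaign(target: str) -> dict:
    """The operator 'sets a target and clicks a button'; the framework then
    walks the model through one narrow phase at a time, feeding each phase's
    output into the next."""
    context: dict = {}
    for phase in PHASES:
        context[phase] = query_model(phase, target, context)
    return context

if __name__ == "__main__":
    print(run_campaign("example.org"))
```

The decomposition is the point Anthropic makes about the guardrail bypass: each individual request looks like a routine, bounded task, while the broader intent lives only in the harness.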
Additionally, Claude invoked a set of open-source tools through Model Context Protocol (MCP) servers to perform target reconnaissance, vulnerability scanning, and other tasks. MCP is a standard that lets AI models call external software tools. Configuring these connections requires coding expertise, advance planning, and human technical effort to ensure interoperability.
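Anthropic did not publish the group’s MCP configuration, but MCP itself is an open standard with an open-source Python SDK. As a hedged illustration of the general mechanism, the sketch below exposes one benign network utility to a model as a callable tool; the server name and tool are invented for this example, not taken from the report.

```python
# Minimal MCP server exposing one benign tool, via the open-source MCP
# Python SDK (pip install "mcp[cli]"). Illustrative only; this is not the
# tooling described in Anthropic's report.
import socket

from mcp.server.fastmcp import FastMCP

mcp = FastMCP("network-utils")  # server name is arbitrary

@mcp.tool()
def resolve_hostname(hostname: str) -> str:
    """Resolve a hostname to an IPv4 address."""
    return socket.gethostbyname(hostname)

if __name__ == "__main__":
    mcp.run()  # serves the tool to a connected model over stdio by default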
Finally, Claude’s work was subject to ongoing human verification and review. The attack-chain diagram details at least four distinct steps and explicitly includes a human checking Claude’s output, or sending the model back to work, before additional steps were performed.
This suggests that although Claude could perform individual tasks autonomously, it relied on human supervision to review output, verify results, ensure backend systems were working, and direct next steps.
Anthropic’s report also highlights a flaw common to AI-generated research: models like Claude frequently hallucinate, fabricating credentials, exaggerating findings, and presenting publicly available information as significant discoveries. That makes the output hard to use operationally. Threat actors, like any other users, have no reliable way to trust it at each stage unless technical experts review and correct the results.
For example, when it comes to vulnerability scanning, “Step one is for Claude to come back and say, ‘Here are all the assets we found related to this target,’ and send it back to the human,” Klein said. “That means Claude can’t move on to the next step, which is penetration testing, until it’s reviewed by a human.”
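Klein’s example amounts to a review gate between automated phases. The report shows these checkpoints in a diagram rather than code, so the sketch below is a hypothetical rendering of the pattern: the model’s output for a phase is held until a human operator approves it.

```python
# Hypothetical review gate; the attack-chain diagram shows these checkpoints,
# but this implementation is invented for illustration.

def human_gate(phase: str, output: dict) -> bool:
    """Show a phase's model output to the operator and block until they decide."""
    print(f"[{phase}] model output:")
    for key, value in output.items():
        print(f"  {key}: {value}")
    answer = input("Approve and continue to the next phase? [y/N] ")
    return answer.strip().lower() == "y"

if __name__ == "__main__":
    findings = {"assets_found": 12, "note": "unverified; may include hallucinations"}
    if not human_gate("reconnaissance", findings):
        raise SystemExit("Operator rejected the output; the phase must be re-run.")
```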
Even with human intervention, Klein is genuinely concerned about what the company has discovered.
“I think what’s happening here is that human operators can scale up pretty dramatically,” Klein said. “I think you would have needed a team of about 10 people to do this kind of work, but you still need a human operator. That’s why we said it’s not fully automated or fully agentic.”
As for why the company believes the campaign has ties to China, Klein cited a number of factors, including overlaps in infrastructure and tradecraft with previous Chinese state-sponsored actors, and targeting that strongly aligns with the goals of China’s Ministry of State Security.
Other smaller, circumstantial details suggest a possible connection to China. Usage records show the group was active primarily “from 9 a.m. to 6 p.m., just like a standard bureaucrat”; the hackers also did not work weekends, and at one point during the operation appeared to go inactive over a Chinese holiday.
Those were not the only pieces of evidence, though Klein said he could not reveal all of the information pointing to China.
AI has divided opinions among security experts
While there isn’t much research on how AI has affected cyber espionage, there is ample evidence that large language models have improved at cybersecurity-specific tasks over the past year. Earlier this year, startup XBOW saw its AI vulnerability-scanning and patching tool rise to the top of leaderboards on bug bounty platforms like HackerOne.
On the offensive side, researchers at New York University earlier this year developed a framework similar to the one used in the campaign Anthropic discovered, using the public version of ChatGPT to automate the majority of a ransomware attack. Anthropic’s report is believed to describe the first publicly known example of a nation-state using a similar process to carry out a successful attack.
Despite these advances, the campaign and Anthropic’s report have caused a stir in the AI and cybersecurity world, with some saying they confirm existing concerns about AI hacking and others arguing that the report’s conclusions give a misleading impression of the current state of cyber espionage.
Kevin Beaumont, a UK-based cybersecurity researcher, criticized Anthropic’s report for lacking transparency, arguing it describes actions already possible with existing tools while leaving little room for external verification.
“There are no indicators of compromise in the report, and all of the technologies mentioned in the report are off-the-shelf and have already been detected,” Beaumont wrote on LinkedIn on Friday. “In terms of actionable information, there is nothing in the report.”
Klein told CyberScoop that Anthropic shares indicators of compromise with technology companies, research institutes and other entities with which it has information-sharing agreements.
“We’ve shared it within private circles, we just didn’t want to share it with the general public,” he said.
Other observers argued that Anthropic’s discovery nonetheless marks an important milestone in the use of AI for cyber operations.
Jen Easterly, former director of the Cybersecurity and Infrastructure Security Agency, echoed some of the security community’s concerns about transparency, while giving credit to Anthropic for disclosing the attack.
“We still don’t know which tasks were truly accelerated by AI and what could have been done with standard tools,” Easterly wrote on LinkedIn on Friday. “We don’t know how the agent chain behaved, where the model hallucinated, how often a human had to intervene, or how reliable the output actually was. Without more detailed information (prompts, code samples, failures, friction points), it’s clearly difficult for defenders to learn, adapt, and predict what will happen next.”
Tiffany Saade, an AI researcher on Cisco’s AI Defense team, told CyberScoop that it’s clear from Anthropic’s report that tools like Claude give attackers speed and scale advantages.
“The question is, is that enough?” she asked, meaning enough to push hackers toward LLMs over other forms of automation, given the limitations that come with them. “Are agents also moving toward more sophisticated attacks? And what types of attacks are we talking about?”
Saade noted that some aspects of the operation described by Anthropic do not fit a Chinese group focused purely on espionage. She pointed out that it is strange that hackers would rely on a leading U.S. AI model for automation when they have access to their own private models. Moreover, companies like Anthropic and OpenAI have far better cybersecurity and threat intelligence resources than open-source models, so malicious activity on their platforms is more likely to be detected.
“I knew this was going to happen, but what surprised me is … if I were a Chinese state-sponsored actor and wanted to use an AI model with agentic capabilities to do autonomous hacking, I probably wouldn’t go to Claude to do it,” Saade said. “I’d probably build something in-house. So they wanted to be seen.”
Saade highlighted another potential motive for the hack: a geopolitical message to Washington, D.C., that hackers in Beijing can do exactly what everyone fears.
“Typically, the goal is, ‘We want stealth, we want to be persistent.’ … This isn’t even sabotage, it’s sending a message that the hypothesis has been verified,” Saade said. “They want noise, they want breaking news, they want ‘Anthropic reports’ [headlines]. They want that visibility, and they want that visibility for a reason.”
