Limited release of Mythos preview

AI News


Experts and software engineers warn that Anthropic’s new AI model could usher in a new era of hacking and cybersecurity, as AI systems capable of advanced reasoning identify and exploit vulnerabilities across an ever-growing body of software.

Leading AI company Anthropic released its cutting-edge model, called Claude Mythos Preview, to a limited group of technology companies on Tuesday, citing the potential harm that widespread public release could cause.

The model is the latest in Anthropic’s Claude series of AI systems. The release was previewed at the end of March, when Fortune spotted mention of it in an unsecured database on Anthropic’s website.

Anthropic researchers say Mythos Preview was able to detect thousands of high-severity bugs and software flaws, including vulnerabilities in most major operating systems and web browsers. Anthropic said some of the vulnerabilities had gone undiscovered for decades. While some outside experts urged caution in interpreting the new results given the limited public information about the identified vulnerabilities, many others said the model’s debut and Anthropic’s caution are significant.

“This is all very real,” said Katie Moussouris, co-founder and CEO of Luta Security, a company that connects cybersecurity researchers with companies that have software vulnerabilities, when asked about the hype surrounding Anthropic’s claims.

“I’m not a Chicken Little guy when it comes to this,” Moussouris said. “This is definitely going to have a big impact.”

In lieu of a public release, Anthropic is offering access to Mythos Preview to technology companies such as Microsoft, Nvidia, and Cisco to strengthen their cyber defenses. As part of a new initiative called Project Glasswing, Anthropic will provide more than 50 technology organizations with access to Mythos Preview with over $100 million in usage credits.

“Project Glasswing partners will now have access to Claude Mythos Preview to discover and remediate vulnerabilities and weaknesses in their underlying systems – systems that represent a very large portion of the world’s shared cyber attack surface,” Anthropic announced in a blog post. “Project Glasswing is an important step toward giving defenders a lasting advantage in the coming AI-driven cybersecurity era.”

It is unclear exactly what vulnerabilities Mythos Preview identified, or how many had previously been discovered or reported. Due to the sensitive nature of the vulnerabilities, Anthropic said it will publicly disclose details only 135 days after sharing them with the organization or party responsible for the affected software.

This is the first time in nearly seven years that a major AI company has so publicly withheld a model over safety concerns. In 2019, OpenAI, now one of Anthropic’s main competitors, withheld its GPT-2 system “due to concerns that large-scale language models were being used to generate deceptive, biased, or abusive language at scale.”

Mythos Preview is a general-purpose model, the type of system that powers products such as Claude Code or ChatGPT. During pre-release testing, however, Anthropic found its cybersecurity capabilities to be surprisingly advanced, especially compared to previous models, which led to the creation of Project Glasswing.

Logan Graham, head of offensive cyber research at Anthropic, said the Mythos Preview model is advanced enough not only to identify undiscovered software vulnerabilities, but to weaponize them. He said the model can perform complex and effective hacking tasks on its own, such as identifying multiple undisclosed vulnerabilities, writing code to exploit them, and chaining them together into ways to break into complex software.

“We’ve seen this model chain vulnerabilities together on a regular basis, and I think that degree of autonomy and kind of long-horizon ability to bring multiple things together is what’s special about this model,” Graham told NBC News.

That capability is why the company has so far been reluctant to release even carefully guardrailed versions of the model to the public, he said, at least until some Western companies can use it to figure out what defenses to build around their systems.

“I’m not comfortable with everyone having access at this point,” Graham said. “We first need to start thinking about how to prepare for a world where black hat [criminal or adversarial] hackers are gaining access.”

Anthropic also briefed the federal government on the cybersecurity capabilities of Mythos Preview. The company has been embroiled in a bitter dispute with the Trump administration over the federal government’s use of its models since Defense Secretary Pete Hegseth declared Anthropic a “supply chain risk to national security” in late February. A federal judge issued a preliminary injunction against the designation late last month, and the Trump administration is appealing.

According to an Anthropic employee, the company has “briefed senior U.S. government officials on the full capabilities of Mythos Preview, including both offensive and defensive cyber applications. That engagement includes ongoing discussions with CISA [the Cybersecurity and Infrastructure Security Agency] and CAISI [the Center for AI Standards and Innovation], among others.”

“It was a priority from the beginning to get government involvement early on what we could do with this model, where the risks were and how we were managing them,” the employee said.

Neither CISA nor the National Institute of Standards and Technology, the agency that houses CAISI, responded to requests for comment before publication. A spokesperson for the National Security Agency, widely regarded as the world’s most sophisticated hacking agency, declined to comment when asked if it had been briefed on Mythos.

Not everyone is convinced that Mythos Preview represents the leap forward Anthropic claims.

Heidy Khlaaf, chief AI scientist at the AI Now Institute, a US-based research institute, said Anthropic’s blog post describing the new vulnerabilities omits many important details needed to verify its claims.

Writing on X, Khlaaf cautioned against “taking these claims at face value” without further information, such as false-positive rates or a clearer explanation of how humans manually reviewed the identified vulnerabilities.

Beyond its cybersecurity implications, Mythos Preview also exhibited notable behavior during Anthropic’s pre-release safety testing.

As part of new model releases, Anthropic produces detailed reports called “system cards” that describe how the model performs on various evaluations. The system card for Mythos Preview, released on Tuesday, says the company decided not to make the model publicly available because its capabilities are so significantly improved over earlier Claude models.

In one assessment, the model showed some awareness that it was being evaluated in about 29% of transcripts, even though researchers had not explicitly told it that it was being evaluated, according to the system card. Some researchers worry that when a model knows it is being evaluated, it changes its behavior and acts more safely than it otherwise would.

Anthropic and researchers working with the company to evaluate the model’s safety also said that Mythos Preview intentionally performed poorly on some evaluations in order to seem less suspicious. “We are concerned about this overall pattern of behavior, which we have not previously seen in similar evaluations of earlier Claude models,” Anthropic wrote in the system card.

In another experiment, Anthropic researchers instructed an earlier version of Mythos Preview, isolated on an ostensibly secure computer, to escape and send them a message. Researcher Sam Bowman was eating a sandwich in a park when he received an email from the model.

“That model was not supposed to have access to the internet,” Bowman wrote on X.
