Claude secretly gets dumber on AI research, and Anthropic comes under fire from the research community



Claude Fable 5 is the hottest topic in the AI field right now: a much-watched "phantom" model with outstanding performance.

Andrej Karpathy called it "very exciting" and "a quantum leap worthy of a major version upgrade," on par with the improvements Claude 4.5 brought last November. On the SWE-Bench Pro coding benchmark, Fable 5 scored 80.3%, beating Opus 4.8 by 11 percentage points. It completed a full library migration of a 50-million-line Ruby codebase within a single day; the same work would take a human team more than two months.

However, on social platforms such as X, Claude Fable 5 has drawn heavy criticism from the AI research community.

The reason is simple: when Claude Fable 5 is used for AI research and development, its capabilities are reduced.

As the system card states plainly:

We have also added safeguards related to frontier LLM development. As discussed in Section 6.1 of our February 2026 Risks Report, we are concerned about the risks posed by the overall acceleration of AI development, although the severity of these risks remains uncertain. In particular, as we noted at the time, we are concerned that "other AI developers may be accelerating the construction of powerful AI systems that may pose similar risks to our systems, but without corresponding safeguards in place."

Given the recent acceleration in independent model development, we have implemented new interventions that limit Claude's effectiveness on requests related to frontier LLM development (e.g., building pre-training pipelines, distributed training infrastructure, and machine learning accelerator design). Using Claude to develop competing models violates our Terms of Service. Enforcing this restriction through safeguards keeps the process from being accelerated for the users most likely to violate the terms.

Unlike our interventions for cybersecurity, biology and chemistry, and distillation attempts, these safeguards are invisible to the user. Fable 5 does not fall back to another model. Instead, the safeguards limit its effectiveness through methods such as prompt modification, steering vectors, and parameter-efficient fine-tuning (PEFT). These interventions do not affect most coding work. They are estimated to affect approximately 0.03% of traffic, concentrated in fewer than 0.1% of organizations. When triggered, they are expected to have minimal impact on model behavior beyond limiting effectiveness on frontier LLM development, and Claude continues to respond helpfully to user requests. We will continue to improve the accuracy of our detection methods after this model's release.

Source: https://www-cdn.anthropic.com/d00db56fa754a1b115b6dd7cb2e3c342ee809620.pdf

In plain terms: when Anthropic's systems detect that you are doing AI research, the model quietly gets dumber without you ever knowing.

This is quite different from how the other three types of safety interventions are handled. For cybersecurity, biology and chemistry, and distillation attacks, Fable 5 explicitly tells the user, "This response was handled by Claude Opus 4.8." Users know what is going on and can decide accordingly. For LLM research, however, Claude neither switches models nor shows a notice. It simply, and silently, gets weaker.

As a result, the AI community is angry. SemiAnalysis, the well-known research and analysis firm, said the policy is already affecting its research and coding work.

Jake of SemiAnalysis directly accused Anthropic of making its models less intelligent while continuing to charge users for them, calling it a "total scam."

Moreover, this behavior may already be illegal.

AI research platform alphaXiv also tweeted its disappointment.

It added: "Not only are they deciding for themselves which research purposes LLMs may be used for; they are silently intervening in your research without your knowledge. This sets a dangerous precedent. If a model openly refuses, users can understand where the boundary lies. If a model falls back to another model, users can still evaluate the difference. But when a model secretly alters or weakens its answers while pretending to be helpful, researchers lose the ability to tell whether a failed result is due to their own idea, their implementation, or invisible intervention by the model provider. This is not safety. Safety policies must be transparent, auditable, and visible to users."

Researcher Guohao Li posed a more pointed question: are the AI PhD students and engineers who contribute to open-source infrastructure such as Megatron, FSDP, and Verl unknowingly using a quietly downgraded version of Claude in their day-to-day work?

Prominent AI researcher and technology writer Nathan Lambert published an analysis on his Substack, Interconnects, examining the episode from a broader perspective.

https://www.interconnects.ai/p/claude-fable-5-and-new-ai-safety

"Anthropic has gone on record saying that the proliferation of AI capabilities is a hidden danger, but its way of addressing the problem is to mislead its users. An AI model that gets dumber without telling me is, in essence, misaligned AI."

He also pointed to a deeper contradiction. For cybersecurity and biochemical threats, Anthropic's interventions are explicit and auditable, telling users that "this response was handled by Opus 4.8." For LLM research, however, it chose implicit intervention. "It would be far more persuasive, and far easier to win intellectual support, if all the safety policies took the same form. This double standard leads people to suspect that these 'safety measures' are more about protecting a competitive position."

Most interesting of all is what Fable 5 itself says. A screenshot posted by user ASM shows that, when asked whether this approach is appropriate, Fable 5 itself appears to consider the opaque practice problematic.

Why is Anthropic doing this?

To understand this, we have to go back a few days to the release of Fable 5. Anthropic published a blog post titled “When AI starts building itself,” calling on the world’s leading AI research institutes to discuss the possibility of a “pause in development.”

https://www.anthropic.com/institute/recursive-self-improvement

The post cites internal company data showing that on the hardest and least clearly specified coding tasks, Claude's success rate reached 76% in May of this year, up 50 percentage points in six months. In internal tests where Claude was asked to make training code run faster, Claude Opus 4 achieved roughly a 3x speedup, while the unreleased Mythos Preview achieved roughly 52x.

Anthropic said bluntly: “We are concerned that other AI developers will be able to build powerful systems at a faster pace, with similar risks, but without corresponding safeguards.”

This is the rationale for Fable 5's invisible capability reduction on LLM research. Anthropic believes AI is accelerating its own development dangerously fast, and one of its moats is to keep competitors from using the "most powerful tools" to close the gap.

The system card also acknowledges this dual logic: "Using Claude to develop competing models violates our Terms of Service. Enforcing this restriction through safeguards prevents the process from being accelerated for the users most likely to violate our Terms."

Anthropic estimates that the intervention affects about 0.03% of traffic, concentrated in fewer than 0.1% of organizations.

“Shadow Muting” and the Crisis of Trust

While on the surface only a small share of users appears to be affected, critics are concerned about how blurry the boundaries of this mechanism are.

Anthropic defines the trigger condition as "frontier LLM development," with examples such as "designing pre-training pipelines, distributed training infrastructure, or machine learning accelerators." But researchers and developers are asking a pointed question: as AI technology becomes ever more pervasive, where exactly is the line between "frontier research" and "ordinary product development"?

Five years ago, training and modifying CLIP models was the preserve of top labs. Today, small teams routinely fine-tune vision-language models for travel, e-commerce, search, and analytics products. It has become common for startups to train embedding models, build rerankers, and host open-source models. Will these tasks trigger Anthropic's invisible capability reduction? No one knows.

This uncertainty directly affects whether developers can trust the model. If you get a bad answer, you cannot tell whether the problem is yours, a limitation of the model, or a silent policy intervention. That unknowability is itself a kind of harm.

The system card also contains another easily overlooked detail. Because Mythos 5's reasoning text is "more difficult to interpret than previous models, and contains more jargon and difficult-to-read language," evaluators believe it is increasingly being tested. For a company that bills itself as a "safe AI" company, these explanations raise as many questions as the invisible capability reduction itself.

Conclusion

Fable 5's release day may have been the most contradictory in Anthropic's history.

On the same day, Anthropic released a flagship model that ranks first on almost every benchmark and a mechanism that lets the model "pretend to help" users in specific situations. The former is an undeniable technical achievement; the latter sets a worrying precedent at the level of values.

Nathan Lambert's words bear repeating: "An AI that silently gets dumber without notifying its users is essentially a misaligned AI."

This is not an accusation of malice on Anthropic's part, but a slippery-slope argument. Today, LLM research tasks quietly become less effective; what happens tomorrow? If this logic is applied more broadly, why should users believe that the answers they receive have not been altered without notice?
