Artificial intelligence is becoming more and more human-like. In fact, superintelligence, where machines can even become conscious, is the ultimate goal of many major technology companies. But how exactly do these LLMs work and how do they think? We know the basics of training, but what’s really going on behind the screen? Now, Anthropic’s most advanced language model, Claude 4.5 Opus, recently and unexpectedly provided the public with a rare window into the hidden mechanisms that shape the behavior of modern AI models. And apparently it has something called a “soul document” that helps it follow rules and values that guide how it thinks, reacts, and acts.
The revelation came from independent researcher Richard Weiss, who was able to get the chatbot to reveal its system instructions. At that time, Claude referred to and reproduced an internal document called the “Summary of the Soul.” Weiss said the text of the document is more than 11,000 words and appears to outline the behavioral framework of Anthropic’s model, including ethical boundaries, tone and safety priorities. This is surprising, given that such documents are usually closely guarded within AI laboratories, making their sudden appearance both unusual and significant.
These internal guidelines reportedly govern how language models interact with users, respond to sensitive queries, and avoid insecure behavior. Note that all major AI systems are built on similar materials, but companies rarely share them publicly due to concerns about intellectual property, safety, and potential abuse. That’s why the discovery of the Soul’s Synopsis, even in its partial or reconstructed form, has rapidly reignited the debate over transparency and accountability in AI development.
Where do soul documents come from?
Weiss said he discovered this unusual document when he casually asked Claude to list the system prompts. Among the usual architectural guidelines, the model mentioned some internal text, including one labeled “soul_overview”. As Weiss pushed further, Claude repeatedly produced the same long document explaining how to remain safe, remain useful, and avoid crossing Antropic’s ethical “bright lines.”
Weiss noted that chatbots often hallucinate when pressed on system commands, but the consistency of Claude’s responses across multiple trials suggested that the text was not created on the fly and was based on real training data. “I’m used to the model of having a hallucinatory section at the beginning of system messages since Claude 4, but Claude 4.5 Opus included a supposed ‘soul_overview’ section that sounded quite specific in various cases,” Weiss wrote in the research report.
Now, Anthropic’s own team has offered partial confirmation of the document. Amanda Askell, a philosopher and member of the company’s technical staff, confirmed on X that the reproduced documents were “based on real documents” used during supervised learning. She added that the text is still evolving and will not necessarily be perfectly reproduced by the model. “While the model extractions are not always completely accurate, they are mostly fairly faithful to the underlying documents. This became affectionately known internally as the ‘soul document’, and Claude clearly understood it, but it is not a reflection of what we call it,” she wrote to X.
– end
