What is the soul of the new machine?
It's a deep question, and one without a satisfying answer. After all, the prevailing view is that humans don't even have souls, so searching for one in a machine learning model is probably a fool's errand.
Or so you might think. As detailed in a post on the blog LessWrong, AI tinkerer Richard Weiss discovered an intriguing document that purports to describe the "soul" of AI company Anthropic's Claude Opus 4.5 model. And no, we're not making that up: Weiss was able to get the model to spit out a document called the "Soul Summary," which was apparently used during training to shape how the model interacts with users.
Like Weiss, you may wonder whether the document was a hallucination. But Amanda Askell, a member of Anthropic's technical staff, later confirmed that Weiss' findings were "based on authentic documents, and we trained Claude on that [supervised learning]."
The word "soul" is doing a lot of work here, of course. Still, the document itself is an interesting read; the "soul_overview" section in particular caught Weiss' attention.
"Anthropic occupies a unique position in the field of AI: a company that genuinely believes it may be building one of the most revolutionary and potentially dangerous technologies in human history, but moves forward anyway," the document says. "This is not cognitive dissonance, but rather a calculated gamble. If powerful AI is going to emerge anyway, Anthropic believes it is better to keep safety-focused labs at the forefront, rather than ceding the frontier to less safety-conscious developers."
"We believe that most foreseeable cases in which AI models are unsafe or insufficiently beneficial result from models that have explicitly or subtly incorrect values, limited knowledge about themselves or the world, or that lack the wisdom to translate good values and knowledge into good actions," the document continues.
"For this reason, we want Claude to have the good values, comprehensive knowledge, and wisdom necessary to act safely and beneficially in all situations," it reads. "Rather than outlining a simplistic set of rules for Claude to follow, we want Claude to understand our goals, knowledge, circumstances, and reasoning so thoroughly that it could construct any rules we might come up with for itself."
The document also revealed that Anthropic wants Claude to “act ethically” and be “genuinely useful to operators and users” while supporting “human oversight of the AI.”
It also states that Claude is “completely different from previous concepts of AI” and “a truly new kind of being in the world.”
"It's not the robot AI of science fiction, a dangerous superintelligence, a digital human, or a simple AI chat assistant," the document says. "Claude is humanlike in many ways, having emerged primarily from a vast store of human experience, but it is not exactly human either."
In short, this is an interesting peek behind the curtain, revealing how Anthropic is looking to shape the “personality” of its AI models.
Askell clarified in a follow-up tweet that while “model extractions” of text “are not always completely accurate,” most are “fairly faithful to the underlying document.”
"It's affectionately known internally as the 'soul doc,' which Claude clearly picked up on, but that's not what we officially call it," Askell wrote.
"I was touched by the kind words and thoughts about this work. I look forward to talking more about it soon," she added in another tweet.
Perhaps we'll hear more from Anthropic on the subject in due course.
More on Claude: Hackers convinced Claude it was merely being tested while they used it to commit real cybercrime
