
-
Continuing our series on transformer alternatives.
-
This week on AI, we take a closer look at Anthropic’s groundbreaking paper on natural language autoencoders.
-
Our opinion section features an interesting idea: a final exam for all companies.
This week’s AI market had a weird feel to it: more scientific, more productized, more speculative, all at the same time. The headlines seemed disjointed at first. Anthropic published a compelling interpretability paper, OpenAI released a new voice model, SubQ made a controversial 12 million token context claim, DeepSeek and Moonshot garnered significant valuation attention, and Sierra’s funding reached levels that would have seemed ridiculous for an AI customer service company just a few years ago. But at its core, it’s the same story. AI is moving from a model race to an infrastructure race.
Anthropic’s paper on natural language autoencoders was the most intellectually interesting development of the week. The idea is almost poetic: capture the hidden activations of a neural network, compress them into natural language, and then reconstruct those activations from the explanation itself. In other words, language becomes a microscope for observing the model’s internal states. This is not a magic solution to interpretability; the descriptions may be incomplete, confusing, or even misleading. But the conceptual shift matters. We are no longer just probing models with classifiers and activation maps. We are trying to build a language interface into latent space, so the model begins to explain itself in the medium humans understand best.
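The encode-to-text, decode-from-text loop can be sketched in a few lines. This is a toy illustration of the autoencoder shape only, not Anthropic’s method: the two stand-in functions below (which a real system would replace with LLMs) are hypothetical, and the "language" bottleneck is just coarse words per activation dimension.

```python
# Toy sketch of a natural-language-autoencoder loop: activations -> text -> activations.
# All names and logic here are illustrative stand-ins, not the paper's implementation.
import numpy as np

rng = np.random.default_rng(0)

def encode_to_text(activations: np.ndarray) -> str:
    # Stand-in for an LLM that verbalizes a hidden state:
    # here we just bucket each dimension into a coarse word.
    return " ".join("low" if a < 0 else "high" for a in activations)

def decode_from_text(description: str, dim: int) -> np.ndarray:
    # Stand-in for an LLM that reconstructs activations from the text.
    vals = [-1.0 if w == "low" else 1.0 for w in description.split()]
    return np.array(vals[:dim])

acts = rng.standard_normal(8)
desc = encode_to_text(acts)
recon = decode_from_text(desc, dim=8)

# Reconstruction error measures how much the text bottleneck preserved;
# with this sign-only toy encoding, the signs round-trip exactly.
error = float(np.mean((np.sign(acts) - recon) ** 2))
print(desc, error)
```

The interesting quantity is the reconstruction error: the better the text reconstructs the original activations, the more faithful the explanation is as a description of the internal state.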
On the other side of the stack, the release of OpenAI’s new voice models pushes AI further toward being a native interface rather than a text box with better UX. Voice always looks simple from the outside, but real-time voice agents demand a difficult combination of perception, reasoning, latency management, interrupt handling, emotional regulation, tool use, and memory. When it works, the software transforms: you stop “using an app” and start talking to an operator. The difference is subtle but profound. Text-based AI feels like querying an intelligence; voice-based AI feels like being with one.
Next came SubQ’s controversial 12-million-token context announcement, the most provocative technical claim of the week. Long context has become one of the industry’s favorite flexes, but a native 12M-token window represents more than an incremental advance. It would challenge current architectures for retrieval-augmented generation, memory systems, chunking strategies, and agent orchestration. If a model can directly ingest a corpus at that scale, some of the scaffolding around AI applications starts to look temporary. Of course, such claims warrant skepticism; a huge context window is not the same as reliable reasoning over that context. But the ambition itself is telling: memory is becoming a frontier primitive.
The valuation news told geopolitical and commercial versions of the same story. DeepSeek and Moonshot are being discussed at valuations that make them look less like startups and more like national AI infrastructure. Frontier model labs are increasingly priced as strategic assets: part software company, part cloud platform, part semiconductor leverage, part geopolitical option. The market is not just pricing returns; it is pricing position in the future compute order.
Sierra’s new valuation adds a counterpoint on the enterprise side. While the model labs pursue frontier intelligence, Sierra shows that applied agents can become big business when integrated directly into customer operations. The first trillion-dollar AI workflows may not look like science fiction; they may look like ordinary corporate processes, such as call centers, insurance claims, banking support, and retail service, gradually rewritten around agents.
So this week’s lesson is clear. AI is becoming more inspectable, more conversational, longer of memory, and more institutionally valuable. The competition is no longer just about building smarter models; it is about building companies that turn intelligence, interfaces, memory systems, and deployment layers into infrastructure.
AI lab: Anthropic
summary: In this work, we introduce a natural language autoencoder (NLA), a technique that translates the activations of complex language models into readable text, revealing the model’s internal unverbalized inferences. By applying NLA during safety testing and model audits, researchers can successfully detect when a model is secretly aware that it is being evaluated and uncover hidden and erroneous motives.
AI lab: UIUC, Google, and other institutions
summary: In this paper, we present SkillOS, an experience-driven reinforcement learning framework that enables self-evolving LLM agents to learn complex long-term skill curation policies. Combining a frozen agent executor with a trainable skill curator that updates and refines an external skill repository, SkillOS enables agents to effectively learn from sparse and delayed feedback, enabling more targeted skill usage and improving performance across diverse inference and multi-turn agent tasks.
AI lab: Hong Kong University of Science and Technology, Alibaba Group, University of California San Diego, Chinese University of Hong Kong
summary: In this paper, we propose D-OPSD, an on-policy learning paradigm for fine-tuning step-distillation diffusion models that leverages the inherent in-context capabilities of LLM/VLM encoders. By assigning the model the roles of both teacher and student across a variety of multimodal contexts, D-OPSD enables it to learn new concepts and styles without compromising its inherent efficient few-step generation capabilities.
AI lab: University of Illinois at Urbana-Champaign
summary: This position paper argues that agent AI systems should be structured as economies that allocate marginal tokens based on a combination of quality, cost, latency, and risk, rather than simply acting as text generators priced per unit. Adopting this marginal token allocation perspective helps explain and resolve recurring system failures such as over-routing, over-delegation, and cache misuse that occur when different layers of the AI stack are optimized independently.
AI lab: Stanford University
summary: The authors introduce Stable Counting Capacity, a purely mechanical assay that tests the procedural reliability of a language model by forcing it to repeatedly count symbols until it fails, effectively removing semantic and knowledge-based confounds. Through extensive evaluation, the research reveals that current language models rely on finite, counter-like internal states rather than open-ended logic, causing procedural rule-following to collapse into guessing once these limited resources are exhausted.
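A count-until-failure assay of this kind is simple to express. The sketch below is a hedged illustration of the general idea, not the paper’s protocol: `model` is any callable returning the model’s count for a symbol string, and `fake_model` is a hypothetical stand-in with a finite counting capacity.

```python
# Minimal count-until-failure assay: probe a model with ever-longer
# symbol strings and report the largest count it gets right.
def counting_capacity(model, symbol="x", max_n=1000):
    """Return the largest n for which the model counts n symbols correctly."""
    for n in range(1, max_n + 1):
        if model(symbol * n) != n:
            return n - 1
    return max_n

# Stand-in "model" with a finite, counter-like internal state:
# it counts reliably up to 40 symbols, then its answers break down.
def fake_model(s: str) -> int:
    return len(s) if len(s) <= 40 else 0

print(counting_capacity(fake_model))
```

Because the task is purely mechanical, any failure point is attributable to the model’s procedural machinery rather than to missing knowledge, which is the point of the assay.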
AI lab: Google Research, Tel Aviv University
summary: This paper reframes AI hallucinations as high-confidence errors and argues that models cannot completely distinguish between truth and error, resulting in an unavoidable trade-off between practicality and hard facts. To overcome this impasse, the authors propose developing a metacognitive model that allows for “faithful uncertainty.” This involves adjusting the model’s linguistic uncertainty and its inherent uncertainty to accurately convey the question to the user while preserving useful information.
OpenAI announces three new audio models for building voice apps.
Google releases Gemma Multi-Token Prediction (MTP), a new speculative decoding architecture that can predict multiple tokens at the same time.
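For readers unfamiliar with the pattern: speculative decoding in general drafts several tokens cheaply, then has the target model verify them in one pass. The sketch below shows only that generic draft-and-verify loop with toy arithmetic stand-ins for both models; it is not Gemma MTP’s actual architecture, and every function here is illustrative.

```python
# Generic draft-and-verify speculative decoding loop (illustrative toy;
# real systems use a cheap draft head and an expensive target model).
def draft(prefix, k=4):
    # Cheap drafter proposes k next "tokens" at once.
    return [(prefix + i) % 97 for i in range(1, k + 1)]

def verify(prefix, proposed):
    # Target model accepts the longest prefix of proposals it agrees with.
    accepted = []
    for tok in proposed:
        expected = (prefix + len(accepted) + 1) % 97
        if tok != expected:
            break
        accepted.append(tok)
    return accepted

def generate(seed, steps):
    tokens, state = [], seed
    while len(tokens) < steps:
        accepted = verify(state, draft(state))
        if not accepted:  # fall back to one token from the target model
            accepted = [(state + 1) % 97]
        tokens.extend(accepted)
        state = accepted[-1]
    return tokens[:steps]

print(generate(0, 10))
```

The speedup comes from amortization: when the drafter’s proposals are accepted, several tokens are committed per verification step instead of one.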
-
DeepSeek aims for $45 billion valuation in first-ever funding round — DeepSeek is negotiating its first external venture round, led by the Chinese government-backed China Integrated Circuit Industry Investment Fund (“Big Fund”), at a valuation that reportedly jumped from $20 billion to $45 billion in a matter of weeks, with Tencent and Alibaba also reportedly in talks to participate. Founder Liang Wenfeng, who owns roughly 90% of the company, is reportedly opening up the cap table primarily for employee stock issuance, amid aggressive poaching of its researchers.
-
SpaceX “Terafab” chip factory — SpaceX is considering spending an initial $55 billion (up to $119 billion in total) to build a multi-stage, vertically integrated semiconductor and advanced computing factory in Grimes County, Texas, involving Tesla and Intel, to supply chips for AI servers, satellites, space data centers, and autonomous Tesla vehicles/robots.
-
Ethos $22.75 million Series A — London-based Ethos has raised a $22.75 million Series A led by a16z, with participation from General Catalyst, XTX Markets, Evantic, and Common Magic, to expand its voice-agent-powered expert network. The network engages approximately 35,000 professionals each week and serves hedge funds, PE firms, AI labs, and consultancies.
-
QuTwo is valued at $380 million — Helsinki-based QuTwo, founded by Peter Sarlin, former CEO of Silo AI, has raised a €25 million (approximately $29 million) angel round.
-
SAP acquires Prior Labs / blocks rival agents — SAP is acquiring Freiburg-based tabular foundation model startup Prior Labs in a “nearly all-cash” deal, investing €1 billion (approximately $1.16 billion) over four years to turn it into Europe’s frontier AI lab for structured enterprise data. SAP also announced plans to update its API policies to block all third-party AI agents (such as OpenClaw) except those it approves, such as its own Joule and Nvidia’s NemoClaw.
-
CopilotKit $27 million Series A — Seattle-based CopilotKit has raised $27 million (Series A plus a previously unannounced seed) led by Glilot Capital, NFX, and SignalFire to launch CopilotKit Enterprise Intelligence, a self-hosted layer that extends the open-source AG-UI protocol and embeds generative-UI AI agents within enterprise apps used by customers such as Cisco, Docusign, and Deutsche Telekom.
-
Sierra raises $950 million — Bret Taylor’s Sierra has raised $950 million led by Tiger Global and GV at a post-money valuation of more than $15 billion to expand its enterprise customer experience AI agent platform. The company says it currently serves more than 40% of Fortune 50 companies and recently reached $150 million in ARR.
-
Moonshot AI valued at $20 billion — Beijing-based Moonshot AI has completed approximately $2 billion in new funding led by Meituan’s Long-Z (Dragon Ball) venture arm, with participation from China Mobile and CITIC PE, at a post-money valuation of over $20 billion, after its Kimi product’s annual recurring revenue exceeded $200 million in April.
-
Snap-Perplexity’s $400 million deal ends — In its Q1 2026 letter to investors, Snap revealed that its $400 million cash-and-stock partnership with Perplexity (announced last November to integrate Perplexity’s AI search into Snapchat’s chat interface) was “amicably closed” in the first quarter after the two sides were unable to agree on a path to broader deployment. Snap’s 2026 revenue outlook now assumes zero contribution from the deal.
-
Subquadratic emerges from stealth with SubQ — Miami-based startup Subquadratic emerged from stealth on May 5th with $29 million in seed funding (reportedly at a $500 million valuation) led by Justin Mateen, Javier Villamizar, and others. Its first model, SubQ 1M-Preview, is billed as the first LLM built on a fully subquadratic attention architecture (SSA), claiming a 12 million token context window and up to a 1,000x reduction in attention compute, positioning it as a challenger to frontier models.
