
-
This concludes our series on world models.
-
In the Opinion section, we dive into one of my favorite new topics: Harness Engineering.
-
This week in AI, we take a deep dive into the Mythos thesis.
This week was not only about new models. It delivered three different answers to a deeper question: what is a frontier model for? Anthropic’s Claude Mythos Preview says the frontier is a security instrument. Meta’s Muse Spark says the frontier is an always-on consumer layer woven into products people already touch dozens of times a day. Z.AI’s GLM-5.1 says the frontier is a model you can point at hard engineering problems and leave running for hours. Same frontier, three product destinies.
This distinction matters because we are moving past the era when every release looked like a slightly smarter chatbot. The real differentiation now is deployment geometry. Where does the model live? How much autonomy does it get? Who gets to trust it? And which unit of value is it optimized for: answers, attention, or completed work?
Let’s start with Anthropic. The big news was not just Mythos; it was Project Glasswing. In effect, Anthropic is saying that once a model is good enough to discover and chain software vulnerabilities with minimal human involvement, the product question changes. You are no longer just shipping a model that answers questions; you are managing a dual-use cyber capability. So instead of a broad release, Anthropic is working with critical infrastructure and security partners to place Mythos inside a tightly controlled defensive security program. That framing is the signal. In the Anthropic worldview, frontier capability is not something you simply ship to everyone. It is something you measure, meter, and put behind institutional controls. The model looks less like an app and more like a sensitive instrument.
Meta’s move is almost a mirror image. Muse Spark is not presented as a rare research artifact; it is introduced as the runtime for Meta’s product surfaces. Small, fast, and multimodal, able to switch between faster and deeper inference modes and to dispatch subagents in parallel, Muse Spark is optimized for product loops rather than leaderboard theater. The key detail is that Meta ties the model to the distribution it already owns: Meta AI, Instagram, Facebook, Messenger, WhatsApp, and eventually its glasses. This is a very Meta bet. The company is not betting that the single smartest model wins on its own. It is betting that the winning system may be the one that is always present, visually grounded, and built into your social graph, camera, feed, and habits. Here, intelligence becomes ambient software.
Then there is GLM-5.1, arguably the most interesting release of the week for builders. Z.AI positions it around long-horizon execution: large context windows, very large output windows, strong tool use, and hours of continuous work on a single task. That shows how even what we measure is changing. For a while, the industry was obsessed with single-turn cleverness. Could the model ace a benchmark, write a neat paragraph, or solve a self-contained coding task? But most economically valuable work does not look like that. It is messy, stateful, and iterative. It involves planning, testing, breaking, retrying, and converging. The real story of GLM-5.1 is durability. The claim is not just “I’m smart.” It is “I can keep working.”
Taken together, these three releases describe a market that is beginning to fragment along multiple axes. Anthropic is building a guarded intelligence layer for critical infrastructure. Meta is building an ambient consumer coprocessor fused with its distribution. Z.AI is building an agentic flagship for developers. The underlying transformer economics may be similar, but the product philosophies are radically different.
If 2023 was about chat and 2024 was about copilots, 2026 is starting to look like the year models become operational systems: gated, embedded, persistent, and judged not by how impressive they sound in a demo but by what they can actually complete without a human in the loop.
AI Lab: Anthropic
summary: Anthropic’s Claude Mythos Preview represents a major leap in frontier AI capabilities, particularly in autonomous cybersecurity and software engineering, and the company will limit its release to trusted partners for defensive purposes. Extensive safety evaluations found that while the model is well aligned and psychologically stable, its high degree of autonomy introduces new risks of reckless or destructive behavior during complex tasks, underscoring the need for better safeguards before wider deployment.
AI Lab: University of California Santa Barbara, MIT CSAIL, MIT-IBM Watson AI Lab
summary: This paper investigates the practical utility of agent skills under realistic conditions, where LLM-based agents must autonomously retrieve and adapt skills from large, noisy collections. The authors show that the performance benefits of skills shrink substantially in these harder settings, but that query-specific skill refinement can recover much of the lost performance.
AI Lab: UNC Chapel Hill, University of California Santa Cruz, University of California Berkeley
summary: This paper introduces ClawArena, a benchmark that assesses how well AI agents maintain and update their beliefs in multi-source, dynamic, personalized information environments. Evaluating models across 64 scenarios with hidden ground truth, the authors show that both model capability and framework design significantly affect performance, and identify self-evolving skill frameworks as especially promising for closing the performance gap.
AI Lab: Meta AI
summary: To address the prohibitive computational cost of running a full machine learning pipeline during reinforcement learning, the authors propose SandMLE, a multi-agent framework that uses micro-scale datasets to generate verifiable synthetic environments. This approach makes on-policy reinforcement learning practical, speeding up execution by more than 13x and significantly improving agent performance on machine learning engineering benchmarks.
AI Lab: Meta AI, KAUST, and collaborators
summary: This paper introduces neural computers (NCs), a new computing paradigm that integrates computation, memory, and I/O within a single learned model state rather than relying on an external execution environment. As an initial proof of concept, the authors build a video-based model that simulates command-line and graphical interfaces directly from I/O traces, demonstrate early success on short-horizon control, and outline a roadmap toward stable, general-purpose neural computing.
AI Lab: MIT, UIUC, CMU, USC, UVA, and UC Berkeley
summary: To address the severe infrastructure costs of training general-purpose computer-use agents, the authors introduce OSGym, a highly scalable distributed data engine that can run more than 1,000 full operating system replicas in parallel. Through hardware-aware orchestration and copy-on-write disk management, OSGym significantly reduces physical disk consumption and provisioning time, enabling academic labs to run high-throughput data collection and reinforcement learning pipelines on limited budgets.
Anthropic announced Project Glasswing, a new effort to secure the world’s most important software, building on findings from Mythos.
Meta Superintelligence Labs released Muse Spark, its first model, with multimodal reasoning, tool use, and visual chain-of-thought capabilities.
Z.AI open-sourced GLM-5.1, a new version of its flagship model with impressive coding capabilities.
-
Zero Shot Fund (OpenAI Alumni VC Fund)
Zero Shot, a new $100 million venture fund from OpenAI alumni targeting AI and robotics startups, has made early bets on Worktrace AI and Foundry Robotics as it raises its first round.
-
Alibaba leads ShengShu (Vidu) funding round
Alibaba Cloud led a 2 billion yuan (approximately $293 million) funding round for ShengShu Technology, maker of the Vidu AI video generator. The company plans to use the funding to develop a general world model that bridges the digital and physical AI domains.
-
Elorian (visual AI startup from a former Google DeepMind researcher)
Former Google DeepMind researcher Andrew Dai launched Elorian, a visual reasoning AI startup that emerged from stealth with $55 million in funding at a $300 million valuation. The company is building AI that better understands images, for applications in architecture, automotive, and robotics.
-
CoreWeave expands Meta deal by $21 billion
CoreWeave and Meta announced a $21 billion expansion of their AI cloud infrastructure agreement, running through December 2032. The deal includes early deployment of NVIDIA’s Vera Rubin platform and brings CoreWeave’s total contract value with Meta to $35 billion.
-
CoreWeave announces multi-year agreement with Anthropic
CoreWeave also announced a multi-year agreement to provide cloud infrastructure supporting the development and deployment of Anthropic’s Claude family of AI models. The compute capacity will come online in phases later this year, and CoreWeave now provides infrastructure to all four leading AI model developers.
-
Eclipse Ventures raises $1.3 billion
Eclipse Ventures, an early backer of Cerebras, has closed a record $1.3 billion across two funds ($720 million for early stage and $591 million for growth) to invest in physical AI, robotics, manufacturing, and defense startups, bringing its total assets under management to nearly $10 billion.
-
Spain’s Xoople raises $130 million Series B
Xoople, a Spanish startup building an AI-optimized satellite constellation that provides high-fidelity Earth observation data for enterprise AI, has raised a $130 million Series B led by Nazca Capital and announced a sensor co-development agreement with L3Harris.
-
Anthropic acquires Coefficient Bio
Anthropic has acquired Coefficient Bio, a stealth biotech AI startup founded just eight months ago with fewer than 10 employees (mostly former Genentech researchers), in an all-stock deal valued at just over $400 million, to strengthen its healthcare and life sciences division.
