AI automation creates more specialized jobs

Every company rushing to automate knowledge work is discovering the same uncomfortable paradox: the more tasks you delegate to an AI agent, the more human judgment is required to make it useful. “We’ve seen a lot of work done in the last few years,” said Dan Shipper, CEO of Every, a media and AI research company that has aggressively automated its own coding, writing, and customer service work. This week he published a detailed account of what his 30-person team actually looks like on the other side of automation. His conclusion is counterintuitive, and it has important economic implications for investors who are pricing workforce displacement into corporate AI investments: AI commoditizes yesterday’s capabilities and immediately increases the demand for the expert judgment needed to guide, review, and improve them.

Capital needs updating

Venture capital has overwhelmingly bet on workforce displacement as the growth story for AI. AI companies captured 61% of global VC investment in 2025, according to an OECD analysis published in February 2026, pulling in $258.7 billion out of a total of $427.1 billion. That share rose to approximately 80% of global VC in Q1 2026, powered by frontier-lab mega-rounds for OpenAI ($122 billion), Anthropic ($30 billion), and xAI ($20 billion). The implicit model: AI will replace headcount, expanding profit margins and justifying multiples. But Shipper’s ground-level data, from an organization that is automating more than most, suggests that the actual dynamics are more complex and that the resulting set of opportunities differs from what the displacement thesis would imply.
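As a quick check on the funding figures quoted above, the arithmetic can be reproduced in a few lines. This is a minimal sketch that uses only the dollar amounts cited in the article; the variable and function names are illustrative, and the numbers themselves are not independently verified here.

```python
# Figures as quoted in the article, in billions of US dollars.
AI_VC_2025 = 258.7      # AI companies, 2025 (OECD analysis as cited)
TOTAL_VC_2025 = 427.1   # all global VC, 2025 (OECD analysis as cited)

# Frontier-lab mega-rounds cited for Q1 2026.
MEGA_ROUNDS_Q1_2026 = {"OpenAI": 122.0, "Anthropic": 30.0, "xAI": 20.0}


def share(part: float, whole: float) -> float:
    """Return part as a fraction of whole."""
    return part / whole


ai_share_2025 = share(AI_VC_2025, TOTAL_VC_2025)
mega_round_total = sum(MEGA_ROUNDS_Q1_2026.values())

print(f"AI share of global VC in 2025: {ai_share_2025:.1%}")
print(f"Three mega-rounds combined: ${mega_round_total:.0f} billion")
```

The 2025 share works out to about 60.6%, which rounds to the 61% quoted, and the three named rounds alone sum to $172 billion.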

Economists at Anthropic documented the gap between AI’s theoretical and actual labor market footprint in a March 2026 paper co-authored by Peter McCrory, the company’s head of economics. They found that while AI could theoretically cover large portions of computer science, financial management, and legal work, the observed usage of Claude across enterprises was only a fraction of that theoretical upper limit. The gap between capability and adoption is not a temporary lag. It reflects a structural requirement: someone with relevant expertise must frame the problem before the model can tackle it.

The frame problem no benchmark captures

Shipper builds his argument around what he calls the frame problem. Benchmarks measure how well a model performs within a problem definition that a human has already provided. OpenAI’s GDPval benchmark tests AI performance on expert-level tasks across professions such as compliance officers, lawyers, and software developers. Claude Opus 4.1 outperformed human experts 49% of the time, a headline number that drove a wave of news coverage. What that number obscures is that the benchmark prompts for these tasks were preloaded with precise confidence intervals, enumerated criteria, named entities to include, and output format specifications. Before the model generates a single token, a huge amount of expert judgment is already encoded in the frame.
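To make the idea of a frame concrete, here is a minimal sketch of how the elements described above (enumerated criteria, named entities, an output format) might be packed into a prompt. The `TaskFrame` class and both example tasks are hypothetical illustrations, not GDPval's actual prompt format or anything Shipper published.

```python
from dataclasses import dataclass, field


@dataclass
class TaskFrame:
    """Hypothetical container for the expert judgment packed into a prompt."""
    task: str
    criteria: list[str] = field(default_factory=list)
    required_entities: list[str] = field(default_factory=list)
    output_format: str = ""

    def to_prompt(self) -> str:
        """Render the frame as the text a model would actually receive."""
        lines = [self.task]
        if self.criteria:
            lines.append("Criteria:")
            lines += [f"{i}. {c}" for i, c in enumerate(self.criteria, start=1)]
        if self.required_entities:
            lines.append("Must mention: " + ", ".join(self.required_entities))
        if self.output_format:
            lines.append("Output format: " + self.output_format)
        return "\n".join(lines)

    def framing_elements(self) -> int:
        """Count the pieces of human judgment supplied beyond the bare task."""
        return (len(self.criteria) + len(self.required_entities)
                + (1 if self.output_format else 0))


# Invented examples: the same task with and without an expert-built frame.
bare = TaskFrame(task="Review this vendor contract.")
framed = TaskFrame(
    task="Review this vendor contract.",
    criteria=["Flag uncapped liability", "Check the termination notice period"],
    required_entities=["the vendor", "the indemnifying party"],
    output_format="numbered list of risks, one sentence each",
)

print(framed.to_prompt())
print("Framing elements:", bare.framing_elements(), "vs", framed.framing_elements())
```

Every field beyond `task` stands for a decision a person with domain expertise made before the model ran, which is the part a benchmark score does not measure.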

Shipper’s in-house senior-engineer benchmark makes the same point from a different direction. A coding agent given explicit instructions to perform a “clean ab initio rewrite” of a broken codebase scored 62 out of 100 on its best GPT-5.5 run, nearly 30 points ahead of its competitors. Change the prompt to “Resolve any errors that keep popping up” and the score drops toward zero. The performance of a model is inseparable from the quality of the frame that humans construct around the task.

This is not a bug that will be fixed in the next model. It is a property of how language models are built. Models are trained on the recorded output of completed work. They do not have access to the present-tense judgment needed to decide which problem to frame, why now, to what depth, and against which constraints. That judgment has to come from somewhere. In current and near-future architectures, it will come from humans.

The abundance cycle and its economic roots

Shipper’s second mechanism is economic. When a scarce skill becomes cheaper, demand for that skill increases. Every’s operations staff started opening pull requests they would never have attempted before. Marketers created video thumbnails in minutes, engineers drafted product copy, and the volume of work in each category exploded. But the default output of models trained on the same corpus tends toward sameness, and sameness quickly becomes a commodity. As a result, there is growing demand for humans who can identify what separates great work from merely good work in a given situation.

Figure: AI automation paradox (Josipa Majik Predin)

The pattern shows up in the cost of automation itself. One of Every’s PowerPoint automation workflows includes 24 skills and 18 scripts and costs $62 in tokens per deck. This is a new class of infrastructure that requires ongoing human maintenance to keep it aligned. The OpenClaw open source repository, which Shipper cites as a proxy for the scale of AI-assisted development activity, had received 44,469 pull requests as of mid-May 2026, approximately 4,000 of them in the first three weeks of May alone. For context, Kubernetes received 5,200 pull requests in all of 2022. The amount of AI-assisted work being generated around the world is historically unprecedented, and humans are required to review, direct, and maintain it.
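The pull-request comparison is easier to read as daily rates. The sketch below uses only the counts quoted above and makes two labeled assumptions: that "the first three weeks of May" means 21 days, and that the Kubernetes figure covers all 365 days of 2022. The names are illustrative.

```python
# Counts as quoted in the article.
OPENCLAW_PRS_MAY = 4_000      # approximate, first three weeks of May 2026
KUBERNETES_PRS_2022 = 5_200   # all of 2022

# Assumptions for converting the counts into rates.
OPENCLAW_DAYS = 21            # "three weeks" taken as 21 days
KUBERNETES_DAYS = 365         # a full calendar year


def prs_per_day(pull_requests: int, days: int) -> float:
    """Average number of pull requests opened per day over the period."""
    return pull_requests / days


openclaw_rate = prs_per_day(OPENCLAW_PRS_MAY, OPENCLAW_DAYS)
kubernetes_rate = prs_per_day(KUBERNETES_PRS_2022, KUBERNETES_DAYS)
ratio = openclaw_rate / kubernetes_rate

print(f"OpenClaw:   {openclaw_rate:.0f} pull requests per day")
print(f"Kubernetes: {kubernetes_rate:.0f} pull requests per day")
print(f"Ratio:      {ratio:.1f}x")
```

On these assumptions the OpenClaw rate is roughly 190 pull requests a day against roughly 14 a day for Kubernetes in 2022, a ratio of about 13 to 1. Each of those pull requests still needs review.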

What this means for the market

The practical implication for investors is that the market map will diverge significantly from pure labor replacement strategies. Companies that build around increasing expertise rather than reducing headcount, sell review and adjustment workflows rather than task execution layers, and address the growing infrastructure requirements of human-agent collaboration will be positioned to meet persistent demand regardless of benchmark score trends.

The enterprise buyers that adopted AI earliest are not reporting empty org charts. They are reporting new categories of work: the AI engineers who maintain agent workflows, the senior practitioners who review AI-generated output at scale, and the domain experts who translate real business context into the problem frames that make models useful. That is not the story that justified $300 billion in VC in Q1 2026. It could be the story that justifies the next $300 billion.

A more difficult question, one that Shipper has not yet resolved, is whether the specialist pool will generate enough economic surplus to offset displacement in lower-skilled roles. Anthropic CEO Dario Amodei has warned that up to half of entry-level white-collar jobs could be eliminated by AI. The two claims are compatible: expert jobs expand at the top of the distribution while entry-level jobs contract at the bottom. Which force will prevail over the next decade is the most important open question in the labor economics of AI, and no benchmark built so far can answer it.
