Google DeepMind explains the pain of building AI agents

Google DeepMind’s Philipp Schmid recently shared insight into why even experienced engineers face challenges when building AI agents. This talk, titled “Why (senior) engineers struggle with building AI agents,” focuses on five important “mental model conflicts” that arise when moving from traditional engineering practices to the world of AI agents.

Google DeepMind explains the pain of building AI agents - AI Engineer


Visual TL;DR. The gap between an engineer's mindset and an agent's reality is what makes building AI agents hard. The struggle starts with text being the new state, which leads to handing over control. Handing over control means treating errors as just another input. Because errors are just inputs, unit tests give way to evaluations, and agents adapt and loop.

  1. Engineer mindset vs. agent reality: Traditional linear, deterministic development versus probabilistic, adaptive agent development
  2. Text is the new state: Agents interpret and produce text for understanding and action
  3. Handover of control: Engineers must trust agents to make decisions and take actions
  4. Errors are just input: Mistakes are opportunities for agents to improve and adapt
  5. From unit tests to evaluation: Moving from rigid code checks to comprehensive evaluation of agent performance
  6. The struggle to build AI agents: Senior engineers face mental model conflicts when building AI agents
  7. Adapt and loop: Agents observe, adapt, and iterate on their actions based on feedback

Engineer’s mindset and agent’s reality

Schmid begins by contrasting the deterministic nature of traditional software engineering with the probabilistic approach required for AI agents. In traditional software, engineers define explicit steps: write the code, test it rigorously, and deploy it. This process is linear and predictable. Building AI agents requires a different paradigm:

  • Define: Instead of strict definitions, agents are given instructions or goals.
  • Observe: Agents interact with the environment and receive feedback.
  • Adapt: Based on observations and feedback, agents adjust their behavior.
  • Loop: Repeating this cycle allows for continuous learning and improvement.

Because of this fundamental difference in approach, Schmid explains, engineers often try to “code away” the inherently probabilistic nature of AI, which leads to what he describes as a “clash of mental models.”
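The define, observe, adapt, loop cycle described above can be sketched in a few lines. This is only an illustrative sketch, not code from the talk: `run_agent_loop`, `toy_policy`, and the step limit are hypothetical names and values, and `toy_policy` stands in for a real model call.

```python
from typing import Callable


def run_agent_loop(
    goal: str,
    act: Callable[[str, list[str]], str],
    is_done: Callable[[str], bool],
    max_steps: int = 5,
) -> list[str]:
    """Define a goal, then observe, adapt, and loop until done or out of steps."""
    observations: list[str] = []
    for _ in range(max_steps):
        # Adapt: the policy sees the goal plus everything observed so far.
        observation = act(goal, observations)
        # Observe: record the feedback for the next iteration.
        observations.append(observation)
        if is_done(observation):
            break
    return observations


def toy_policy(goal: str, history: list[str]) -> str:
    # Hypothetical stand-in for a model call: it "finishes" on the third step.
    step = len(history) + 1
    return "done" if step == 3 else f"step {step} toward: {goal}"


trace = run_agent_loop("summarize the report", toy_policy, lambda obs: obs == "done")
print(trace)
```

The point of the sketch is that the sequence of steps is not written down in advance; only the goal, the stopping condition, and a step budget are.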

Key challenges and solutions

Schmid identifies several key areas where engineers often encounter difficulties.

1. Text is the new state

Traditionally, software state is represented by discrete data structures and Boolean values. For AI agents, especially those built on large language models (LLMs), text becomes the primary means of expressing information and intent. The trap is to reduce natural language instructions to simple Boolean values, which discards their nuanced semantic meaning. The fix is to preserve that meaning by keeping the raw string and letting the agent interpret and process it downstream.
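As a rough illustration of the difference, the sketch below contrasts a Boolean flag with a raw-text field. The class names, the `build_context` helper, and the sample reply are all hypothetical and chosen only for this example.

```python
from dataclasses import dataclass


@dataclass
class LossyFeedback:
    # Traditional approach: collapse the user's reply into a flag.
    approved: bool


@dataclass
class TextFeedback:
    # Agent-oriented approach: keep the raw string as the state.
    raw_text: str


def build_context(plan: str, feedback: TextFeedback) -> str:
    """Pass the user's exact words downstream so the agent can interpret them."""
    return f"Plan:\n{plan}\n\nUser feedback (verbatim):\n{feedback.raw_text}"


reply = "Looks good, but please focus on the US market."
lossy = LossyFeedback(approved=True)  # the "but ..." qualifier is lost here
context = build_context("Research competitors worldwide.", TextFeedback(reply))
print(context)
```

With the flag, the downstream step only knows the plan was approved; with the raw string, it can still see the qualification the user attached to that approval.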

2. Handover of control

In microservices, user intent is often mapped to a specific route, and engineers intuitively hand-code these paths. With AI agents, interactions are more fluid and less deterministic. The trap is to treat the agent as just a traffic controller and expect it to follow a strict predefined path. Instead, the agent should be trusted as a dispatcher that disambiguates intent. The key insight is to describe what you are looking for rather than the exact path to get there, offering constraints and steps rather than a rigid route.
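The contrast between a hand-coded route and a described outcome might look like the sketch below. Everything here is hypothetical: the `ROUTES` table, the handler names, the sample message, and the `build_task_brief` helper are illustrations, and the brief would be handed to whatever agent framework is in use.

```python
# Traditional approach, for contrast: a hand-coded keyword router.
ROUTES = {"cancel": "cancel_subscription", "refund": "issue_refund"}


def route_by_keyword(message: str) -> str:
    for keyword, handler in ROUTES.items():
        if keyword in message.lower():
            return handler
    return "unknown"


def build_task_brief(goal: str, constraints: list[str]) -> str:
    """Describe the outcome and guardrails; leave the route to the agent."""
    lines = [f"Goal: {goal}", "Constraints:"]
    lines += [f"- {c}" for c in constraints]
    return "\n".join(lines)


message = "I want to stop paying for this, it costs too much"
handler = route_by_keyword(message)  # no keyword matches this phrasing
brief = build_task_brief(
    goal="Resolve the user's billing request or ask a clarifying question.",
    constraints=["Never issue a refund above $50 without confirmation."],
)
print(handler)
print(brief)
```

The keyword router has no entry for this phrasing, while the brief leaves room for the agent to work out whether the user wants to cancel, get a discount, or request a refund.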

3. Errors are just input

Traditional software is often designed to fail fast, crashing as soon as an error occurs. While this approach is effective for deterministic systems, it is counterproductive for AI agents. A fail-fast crash on a minor schema failure can cost $0.50 and 5 minutes, and crashing at step 4 of 5 of a task is unacceptable. The conflict arises when engineers treat every error as a critical failure. The fix is to treat errors as valuable input: catch them and feed them back into the agent’s process, so the agent can self-correct and try a different approach.
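A minimal sketch of catching an error and feeding it back, assuming the agent is expected to return JSON. The `call_with_recovery` helper, the retry limit, and `flaky_model` (a stand-in for a real LLM call) are hypothetical names used only for this example.

```python
import json
from typing import Callable


def call_with_recovery(
    model: Callable[[list[str]], str],
    task: str,
    max_attempts: int = 3,
) -> dict:
    """Feed parse errors back to the model instead of crashing on the first one."""
    messages = [task]
    for _ in range(max_attempts):
        output = model(messages)
        try:
            return json.loads(output)
        except json.JSONDecodeError as err:
            # The error becomes part of the next input.
            messages.append(f"Your last output was not valid JSON ({err}). Try again.")
    raise RuntimeError(f"No valid JSON after {max_attempts} attempts")


def flaky_model(messages: list[str]) -> str:
    # Hypothetical stand-in for an LLM: it succeeds once it has seen an error message.
    return '{"status": "ok"}' if len(messages) > 1 else "status: ok"


result = call_with_recovery(flaky_model, "Return the status as JSON.")
print(result)
```

The attempt limit still bounds cost: the loop recovers from minor failures but raises once the budget is exhausted rather than retrying forever.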

4. From unit tests to evaluation

Evaluating AI agents is very different from traditional software testing. Unit tests that rely on deterministic assertions are not sufficient. Schmid emphasizes the need to move to “evals,” which are designed for non-deterministic output. This involves running multiple trials per prompt to measure the distribution of outcomes. Negative cases matter too: testing whether the agent ignores irrelevant information is just as important as testing its core functionality. Additionally, the focus should be on evaluating the outcome rather than the specific path the agent took to get there. This means measuring how often agents succeed and ensuring reliability, rather than enforcing strict step-by-step compliance.
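The idea of running several trials and measuring a pass rate can be sketched as follows. This is an assumption-laden toy: `pass_rate`, `toy_agent`, the 80% success probability, and the prompts are all hypothetical, and a seeded random generator stands in for a non-deterministic model.

```python
import random
from typing import Callable


def pass_rate(
    agent: Callable[[str], str],
    prompt: str,
    check: Callable[[str], bool],
    trials: int = 20,
) -> float:
    """Run the same prompt several times and report the fraction that pass."""
    passed = sum(check(agent(prompt)) for _ in range(trials))
    return passed / trials


rng = random.Random(0)  # seeded so the sketch is repeatable


def toy_agent(prompt: str) -> str:
    # Hypothetical non-deterministic agent: usually right, sometimes not.
    return "4" if rng.random() < 0.8 else "5"


# Positive case: grade the outcome, not the path taken to reach it.
rate = pass_rate(toy_agent, "What is 2 + 2?", lambda out: out == "4")

# Negative case: the agent should not act on irrelevant information.
ignore_rate = pass_rate(
    lambda p: "4",
    "What is 2 + 2? (My cat is named Tom.)",
    lambda out: "Tom" not in out,
)
print(rate, ignore_rate)
```

Instead of a single assertion that passes or fails, the result is a rate that can be tracked over time and compared against a reliability threshold.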

5. Agents evolve, APIs don’t

A significant challenge lies in the mismatch between static APIs and dynamically evolving agents. Traditional APIs are often designed for human developers, who can infer what a terse or ambiguous parameter means. Agents, by contrast, take an API literally and can hallucinate values for ambiguous parameters. The trap is building APIs for agents as if they were human developers. The solution is to create an “agent-aware” API that is explicit, verbose, and self-documenting. This means providing a clear description of the function and its expected behavior, including what happens if an item is not found, so that the agent has all the context it needs without guessing.
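As a sketch of what “explicit, verbose, and self-documenting” could mean in practice, compare the two functions below. The inventory data, function names, and SKU format are hypothetical and exist only for this illustration.

```python
from typing import Optional

INVENTORY = {"sku-123": {"name": "USB-C cable", "stock": 42}}  # hypothetical data


def get_item(id: str):
    # Terse, human-oriented version: which id? What happens if it is missing?
    return INVENTORY[id]


def get_inventory_item_by_sku(sku: str) -> Optional[dict]:
    """Look up one inventory item by its SKU.

    Args:
        sku: The stock-keeping unit, formatted like "sku-123". This is not
            the product name and not an order ID.

    Returns:
        A dict with "name" (str) and "stock" (int) if the SKU exists.
        None if no item has this SKU. In that case, do not guess another
        SKU; ask the user to confirm it.
    """
    return INVENTORY.get(sku)


found = get_inventory_item_by_sku("sku-123")
missing = get_inventory_item_by_sku("sku-999")
print(found, missing)
```

The second function spells out the parameter format and the not-found behavior in its name and docstring, which is typically the text an agent sees when the function is exposed as a tool.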

Summary: Trust but verify

Schmid concluded by summarizing the core principles for building effective AI agents.

  • Stop fighting the model: Accept that you are a dispatcher, not a programmer.
  • Preserve meaning: Treat text as the primary state, not just Boolean values.
  • Design for recovery: Build agents that can learn from errors and adapt.
  • Evaluate, don’t assert: Measure performance through multiple trials and LLM-as-a-judge assessments.
  • Expect to rebuild: Understand that agents evolve and that their underlying models need to be rebuilt and improved over time.

The core takeaway is that building AI agents requires thinking differently: accepting the probabilistic nature of these systems and adapting traditional engineering methods accordingly.

© 2026 StartupHub.ai. Unauthorized reproduction is prohibited. Please do not type, scrape, copy, reproduce or republish this article in whole or in part. Use for AI training, fine-tuning, search enhancement generation, or as input to any machine learning system is prohibited without a written license. Substantially similar derivative works will be pursued to the fullest extent of applicable copyright, database, and computer abuse laws. See our Clause.


