AI’s next frontier: the physical world

The dominant paradigm for AI today is built around language and code. A new wave of advances, however, is extending frontier AI into the physical world. According to an analysis on the a16z blog, the shift is driven by progress in robotic learning, autonomous science, and new human-machine interfaces.

These areas are maturing at the same time, attracting more talent, capital, and founder activity. The pace of progress suggests they may soon enter scaling regimes of their own: each still needs significant new development, but each inherits infrastructure and research momentum from the current AI frontier.

Three areas fit this description: robotic learning, autonomous science (especially materials and life sciences), and new human-machine interfaces. These areas are not isolated. They share fundamental technological primitives and reinforce each other.

Basic primitives

Several core technologies underpin this expansion into the physical world.

Learned representations of physical mechanics

A compressed, learned model of physical behavior, one that captures how objects move, deform, and collide, is critical. Vision-language-action (VLA) models extend a pre-trained vision-language model with an action decoder. World action models (WAMs) build on video diffusion transformers that learn physical priors. Generalist’s GEN-1 takes a different approach, training a natively embodied foundation model from scratch on real-world physical interaction data.

Spatial intelligence models are also essential: they reconstruct and reason about the 3D structure of the physical environment, a capability that current VLAs and WAMs lack. The goal of integrating these approaches is a model of physical behavior that is usable in practice.

Architecture for embodied action

Translating physical understanding into reliable action requires an architecture that maps intentions to motor commands, stays consistent over time, and operates within real-time constraints. Dual-system hierarchical architectures, which separate slower reasoning from real-time control, are emerging as a standard design pattern.
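The dual-system pattern can be illustrated with a toy loop. This is a minimal sketch, not any vendor's architecture: `SlowPlanner`, `fast_controller`, the waypoints, the gain, and the tick rates are all hypothetical names and values chosen for illustration. A slow planner picks a target on a fraction of the ticks, while a fast controller issues a command on every tick.

```python
from dataclasses import dataclass


@dataclass
class SlowPlanner:
    """Hypothetical slow system: chooses the next waypoint at a low rate."""
    waypoints: list[float]
    index: int = 0

    def plan(self, position: float, tolerance: float = 0.05) -> float:
        # Advance to the next waypoint once the current one has been reached.
        reached = abs(position - self.waypoints[self.index]) < tolerance
        if reached and self.index < len(self.waypoints) - 1:
            self.index += 1
        return self.waypoints[self.index]


def fast_controller(position: float, target: float, gain: float = 0.2) -> float:
    """Hypothetical fast system: proportional command issued on every tick."""
    return gain * (target - position)


def run(ticks: int = 200, plan_every: int = 20) -> float:
    """Simulate a 1-D 'robot' and return its final position."""
    planner = SlowPlanner(waypoints=[1.0, 0.5, 2.0])
    position, target = 0.0, 0.0
    for tick in range(ticks):
        if tick % plan_every == 0:
            # Slow loop: runs once every `plan_every` ticks.
            target = planner.plan(position)
        # Fast loop: runs on every tick and tracks the latest target.
        position += fast_controller(position, target)
    return position
```

In a real system the slow loop would be a large model reasoning over images and instructions, and the fast loop a lightweight policy running at the robot's control rate. The point of the split is that the fast loop never waits on the slow one.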

Action generation is evolving rapidly: flow-matching and diffusion-based methods produce smoother, higher-frequency action sequences. An important advance is the extension of reinforcement learning to pre-trained VLAs, which lets the base model improve through autonomous practice and self-correction.
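To make the flow-matching idea concrete, the sketch below samples an action chunk by integrating a velocity field from Gaussian noise toward an action. It rests on a strong assumption: the `velocity` function is an analytic stand-in that is handed the target action directly, whereas a trained policy would predict the velocity from observations alone. The function names and step count are hypothetical.

```python
import random


def velocity(x: list[float], t: float, target: list[float]) -> list[float]:
    """Stand-in for a learned velocity network (an assumption, not a real model).

    On the straight-line path x_t = (1 - t) * noise + t * target, the velocity
    that carries x_t to the target equals (target - x_t) / (1 - t).
    """
    return [(a - xi) / (1.0 - t) for xi, a in zip(x, target)]


def sample_action_chunk(target: list[float], steps: int = 10, seed: int = 0) -> list[float]:
    """Integrate the flow from noise (t = 0) to an action chunk (t = 1)."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in target]  # start from Gaussian noise
    dt = 1.0 / steps
    for k in range(steps):
        t = k / steps  # t stays in [0, 1), so 1 - t is never zero
        v = velocity(x, t, target)
        x = [xi + dt * vi for xi, vi in zip(x, v)]  # one Euler step along the flow
    return x
```

Because the whole chunk is produced by a handful of integration steps rather than one token at a time, this style of decoder can emit many continuous actions per model call, which is one reason it suits high-frequency control.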

Simulation and synthetic data

The data challenges in physical AI are immense. Simulation and synthetic data generation are key infrastructure for overcoming the cost and limits of real-world data collection. The modern simulation stack combines a physics engine, photorealistic rendering, and a world model.
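As a rough illustration of how simulation turns compute into training data, the sketch below generates state transitions from a toy physics step. Everything here is an assumption made for illustration: a single bouncing ball stands in for a full physics engine, there is no rendering or world model, and the function names and parameter ranges are hypothetical.

```python
import random


def step(height: float, velocity: float, dt: float = 0.01,
         gravity: float = -9.81, restitution: float = 0.8) -> tuple[float, float]:
    """One semi-implicit Euler step of a ball bouncing on the ground (height 0)."""
    velocity += gravity * dt
    height += velocity * dt
    if height < 0.0:
        # Ground contact: reflect the position and lose some energy.
        height = -height
        velocity = -velocity * restitution
    return height, velocity


def generate_dataset(episodes: int = 100, steps: int = 200, seed: int = 0):
    """Return (state, next_state) pairs from randomized initial conditions."""
    rng = random.Random(seed)
    pairs = []
    for _episode in range(episodes):
        # Randomize the starting height and velocity for each episode.
        state = (rng.uniform(0.5, 2.0), rng.uniform(-1.0, 1.0))
        for _step in range(steps):
            next_state = step(*state)
            pairs.append((state, next_state))
            state = next_state
    return pairs
```

Doubling `episodes` doubles the dataset at the cost of compute alone, with no additional human demonstrations.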

Improvements in simulation are changing the economics of physical AI, allowing it to scale with compute rather than human labor. This infrastructure also matters for autonomous science and new interfaces.

Expansion of the sensory manifold

The physical world provides richer signals than vision and language alone. Touch, neural signals, and subvocal muscle activity all carry important data. Extending AI’s sensory access to these modalities will depend on new devices and software infrastructure that capture and process such signals.
