- NVIDIA Cosmos 3 is a new leaderboard-topping open physical AI foundation model built on a breakthrough Transformer mixture architecture for physical AI inference, world simulation, and action generation.
- Cosmos 3 is the world’s first fully open omni-model with native vision inference and multimodal generation across text, images, video, ambient sounds, and actions, enabling state-of-the-art synthetic data generation and physical AI policy model development.
- NVIDIA is launching the NVIDIA Cosmos Coalition with leading AI labs and robotics leaders including Agile Robots, Black Forest Labs, Generalist, LTX, Runway, and Skild AI to advance the next generation of open world models.
NVIDIA announced NVIDIA Cosmos™ 3, an open world foundation model for physical AI built on a breakthrough Transformer mixture architecture that combines vision inference, world generation, and action prediction in one system.
Cosmos 3 is the world’s first fully open omni-model that natively understands and generates text, images, video, environmental sounds, and actions with superior physics accuracy, reducing physical AI training and evaluation cycles from months to days.
NVIDIA also launched the NVIDIA Cosmos Coalition, a global collaboration in which world model builders and AI developers, including Agile Robots, Black Forest Labs, Generalist, LTX, Runway, and Skild AI, come together to power the next generation of world models.
“Thanks to breakthrough advances in multimodal reasoning languages, vision, and world models, the big bang of physical AI is just around the corner,” said Jensen Huang, founder and CEO of NVIDIA. “The Cosmos 3 family of open, frontier, omni-models gives developers a generational leap in the ability to build robots, autonomous vehicles, and vision AI that perceive, reason, plan, and act in the physical world.”
A new architecture for physical AI
Cosmos 3 addresses a fundamental challenge in physical AI: making a robot, autonomous vehicle (AV), or vision agent generalizable in the real world with limited training data and a fragmented simulation stack.
The model’s Transformer mixture architecture combines inference transformers and expert generation transformers, allowing Cosmos 3 to understand object interactions, movements, and spatiotemporal relationships before generating video and action trajectories.
Trained on one of the largest multimodal physical AI datasets, containing billions of samples across text, images, videos, sounds, and action trajectories, the model gives developers a powerful pre-trained foundation for building physical AI systems with less data and lower training costs.
Developers can use Cosmos 3 as a:
- Vision language model: understand and reason across modalities.
- World model or video foundation model: simulate physical environments and predict future world states for training and evaluation.
- World action model backbone: train robots to perform specific tasks.
Cosmos 3 models deliver excellent results on physical AI benchmarks. Among open models, they rank first on Artificial Analysis, Physics-IQ, PAI-Bench, and R-Bench for world generation accuracy; on RoboLab and RoboArena for action policy; and on VANTAGE-Bench and the TAR leaderboard for vision understanding.
The Cosmos 3 lineup provides developers with options for different stages of physical AI development.
- Cosmos 3 Super: ideal for post-training robotics and AV models that require the highest physical accuracy and production quality.
- Cosmos 3 Nano: delivers high-quality video and action inference in a fraction of a second.
- Cosmos 3 Edge: coming soon, for real-time inference at the edge.
Cosmos Coalition accelerates development of open world models
The Cosmos Coalition is a global collaboration among world model builders, AI developers, and physical AI leaders to advance open world models across industries. Members can contribute models, research, and evaluation technology while using Cosmos 3 technology, training tools, and NVIDIA DGX™ Cloud infrastructure for training at scale.
Founding coalition members include Agile Robots, Black Forest Labs, Generalist, LTX, Runway, and Skild AI. By building openly and contributing across a shared ecosystem, the coalition aims to enable faster innovation, broader interoperability, and more rapid advancement in physical AI.
Developers build on Cosmos
The Cosmos platform powers NVIDIA’s physical AI stack to accelerate training and assessment workflows across industries. The platform includes new datasets for robotics, physics, human motion, autonomous driving, warehouse safety, and spatial reasoning, as well as new physical AI agent skills for neural scene reconstruction, defect image generation, and video augmentation.
Physical AI developers are building on the Cosmos platform across a variety of industries. They include Agile Robots, Doosan Robotics, LG Electronics, Samsung, and Skild AI in robotics; Li Auto in AVs; and Centific, Fogsphere, Linker Vision, Milestone Systems, and Yuan in vision AI agents powering industrial AI and smart space applications.
