Current AI agent systems juggle separate models for vision, speech, and language, losing time and context when passing data from one model to another.
The NVIDIA Nemotron 3 Nano Omni, announced today, is an open multimodal model that combines these capabilities into one system. Enable agents to provide faster, smarter responses using advanced reasoning across video, audio, images, and text. This best-in-class model provides enterprises and developers with a more efficient and accurate operational path for multimodal AI agents with complete deployment flexibility and control.
Nemotron 3 Nano Omni establishes a new frontier in open multimodal model efficiency with superior accuracy and low cost, topping six leaderboards in complex document intelligence and video and audio understanding.
AI and Software companies that have already adopted Nemotron 3 Nano Omni include: Able, Applied Scientific Intelligence (ASI), Ekacare, foxconn, Company H, Palantir, Piler, and Dell Technologies, Docusign, Infosys, K-Dense, Lila, Oracle and Zephr Evaluating the model.
“To build useful agents, you can’t wait seconds for the model to interpret the screen.” Gauthier Croix, CEO of Company H, said: “Based on the Nemotron 3 Nano Omni, our agents will be able to quickly interpret Full HD screen recordings, something that was previously impractical. This is more than just a speed increase; it fundamentally changes the way our agents perceive and interact with their digital environments in real time.”
Nemotron 3 Nano Omni enables faster, leaner multimodal agents
Consider an AI agent in customer support who analyzes uploaded call audio and processes screen recordings while reviewing data logs, or a finance agent tasked with parsing PDFs, spreadsheets, charts, and voice notes. Currently, most agent systems use separate models for vision, speech, and language to perform these tasks.
This approach increases latency through repeated inference passes, fragments context across modalities, and increases cost and inaccuracy over time.
Hybrid vision and audio encoders can be combined in the 30B-A3B. mix of experts The Nemotron 3 Nano Omni architecture eliminates the need for separate perceptual models and improves inference efficiency at scale. This efficiency, combined with strong multimodal recognition accuracy, enables the AI system to achieve 9x higher throughput than other open omni models with the same interactivity. The result is lower costs and increased scalability without sacrificing responsiveness or quality.
For agent systems, Nemotron 3 Nano Omni can be used in conjunction with proprietary cloud models, other NVIDIA Nemotron open models (such as Nemotron 3 Super for high-frequency execution, Nemotron 3 Ultra for complex planning), and proprietary models from other providers to power subagents for agent workflows such as computer usage, document intelligence, and audio-video inference.
- Computer usage agent — Nemotron 3 Nano Omni powers the cognitive loop that allows agents to interact with graphical user interfaces, reason about on-screen content, and understand the state of the user interface over time. Company H’s latest work computer usage agentPowered by Nemotron 3 Nano Omni, the tool uses a native input resolution of 1920 x 1080 pixels for high-fidelity visual inference. In preliminary evaluation of the OSWorld benchmark, this integration significantly improves manipulation of complex graphical interfaces and leverages the Nemotron 3 Nano Omni’s ability to process very high-resolution images.
- document intelligence : Interpret documents, charts, tables, screenshots, and mixed media input, allowing agents to consistently infer visual structure and textual content. Important for enterprise analytics and compliance workflows.
- Understanding audio and video — In customer service, research, and monitoring workflows, Nemotron 3 Nano Omni maintains the context of audio and video, connecting what’s said, shown, and documented into a single inference stream instead of disparate summaries.

Open, customizable, and deployable anywhere
Nemotron 3 Nano Omni is released with open weights, datasets, and training techniques, giving organizations complete transparency and control over how their models are customized and deployed.
Developers can use tools such as NVIDIA NeMo Intended for customization, evaluation, and optimization for domain-specific use cases. The Nemotron family of models is open, allowing organizations to deploy them in environments that meet regulatory, sovereignty, and data localization requirements.
The Nemotron 3 family (including Nano, Super, and Ultra models) 50 million downloads in the past year. Omni extends the family’s capabilities into the multimodal and agentic domains.
The model is available below hug face, open router and build.nvidia.com NVIDIA NIM as microservices and through a broader ecosystem NVIDIA Cloud Partnerinference platform and cloud service providers.
Its open and lightweight architecture supports consistent deployment from local systems such as NVIDIA Jetson modules. NVIDIA DGX Spark and DGX Station For data centers and cloud environments.
Visit the NVIDIA Technology Blog. Tutorials, cookbooks, and implementation guides For Nemotron 3 Nano Omni use cases. SGet the latest information on Agent AI, NVIDIA Nemotron Subscribe to get more NVIDIA News, join the community Follow NVIDIA AI linkedin, Instagram, × and facebook.
explore Self-paced video tutorials and live streams.
