NVIDIA Announces Nemotron 3 Family of Open Models

News summary:

The Nemotron 3 family of open models (Nano, Super, and Ultra sizes) introduces the most efficient family of open models with superior accuracy for building agentic AI applications.

Nemotron 3 Nano delivers 4x higher throughput than Nemotron 2 Nano and the most tokens per second for large-scale multi-agent systems through a breakthrough hybrid mixture-of-experts architecture.

Nemotron achieves superior accuracy through advanced reinforcement learning techniques, with concurrent multi-environment post-training at scale.

NVIDIA is the first company to release a collection of state-of-the-art open models, training datasets, and reinforcement learning environments and libraries for building highly accurate, efficient, and specialized AI agents.

NVIDIA today announced the NVIDIA Nemotron™ 3 family of open models, data, and libraries designed to drive transparent, efficient, and specialized agentic AI development across industries.

Nemotron 3 models (available in Nano, Super, and Ultra sizes) introduce a breakthrough hybrid latent mixture-of-experts (MoE) architecture that helps developers build and deploy reliable multi-agent systems at scale.

As organizations move from single-model chatbots to collaborative, multi-agent AI systems, developers face growing challenges such as communication overhead, context drift, and high inference costs. Developers also need transparency to trust the models that automate complex workflows. Nemotron 3 directly addresses these challenges, providing the performance and openness customers need to build specialized agentic AI.

“Open innovation is the foundation of AI progress,” said NVIDIA Founder and CEO Jensen Huang. “With Nemotron, we are transforming advanced AI into an open platform, giving developers the transparency and efficiency they need to build agentic systems at scale.”

NVIDIA Nemotron supports NVIDIA's broader sovereign AI efforts, with organizations from Europe to South Korea adopting open, transparent, and efficient models that let them build AI systems aligned with their own data, regulations, and values.

Early adopters such as Accenture, Cadence, CrowdStrike, Cursor, Deloitte, EY, Oracle Cloud Infrastructure, Palantir, Perplexity, ServiceNow, Siemens, Synopsys, and Zoom are integrating the Nemotron family of models to power AI workflows across manufacturing, cybersecurity, software development, media, communications, and other industries.

“NVIDIA and ServiceNow have been shaping the future of AI for years, but the best is yet to come,” said Bill McDermott, Chairman and CEO of ServiceNow. “The combination will continue to define the standard with unparalleled efficiency, speed, and accuracy.”

As multi-agent AI systems expand, developers increasingly rely on proprietary models for state-of-the-art reasoning while using more efficient and customizable open models to reduce costs. Routing tasks between frontier-level models and Nemotron within a single workflow gives agents the most intelligence while optimizing tokenomics.
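The routing idea in the previous paragraph can be sketched as a simple threshold rule. This is only an illustration under stated assumptions: the `Task` type, the `difficulty` score, the 0.8 threshold, and the tier names "open" and "frontier" are all hypothetical and do not come from the announcement or from any NVIDIA library. Prices are supplied by the caller, so no real pricing is implied.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    prompt: str
    # Hypothetical score in [0, 1], assumed to come from an upstream classifier.
    difficulty: float


def route(task: Task, threshold: float = 0.8) -> str:
    """Send hard tasks to the frontier tier and everything else to the open tier."""
    return "frontier" if task.difficulty >= threshold else "open"


def estimated_cost(tasks, tokens_per_task, price_per_token, threshold=0.8):
    """Sum the token cost of a batch using caller-supplied prices per tier.

    price_per_token is a dict with the keys "open" and "frontier".
    """
    return sum(
        tokens_per_task * price_per_token[route(task, threshold)]
        for task in tasks
    )
```

Lowering the threshold sends more work to the frontier tier and raises the estimate; raising it shifts work to the open tier. A production router would likely weigh latency and accuracy as well as cost.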

“Perplexity is built on the idea that human curiosity is amplified by accurate AI built into exceptional tools like AI assistants,” said Aravind Srinivas, CEO of Perplexity. “Our agent router lets us direct workloads to the best fine-tuned open models, such as Nemotron 3 Ultra, or leverage leading proprietary models when tasks benefit from their unique capabilities, ensuring our AI assistants operate with exceptional speed, efficiency, and scale.”

The open Nemotron 3 models allow startups to build and iterate faster on AI agents, accelerating innovation from prototype to enterprise deployment. General Catalyst and Mayfield portfolio companies are evaluating Nemotron 3 to build AI teammates that support human-AI collaboration.

“NVIDIA's open model stack and the NVIDIA Inception program provide early-stage companies with the models, tools, and cost-effective infrastructure to rapidly experiment, differentiate, and scale,” said Navin Chaddha, managing partner at Mayfield. “Nemotron 3 gives founders a head start on building agentic AI applications and AI teammates, and helps them take advantage of NVIDIA's large installed base.”

Nemotron 3 reinvents multi-agent AI with efficiency and accuracy
The Nemotron 3 family of MoE models includes three sizes:

Nemotron 3 Nano is a compact 30-billion-parameter model that activates up to 3 billion parameters at a time for targeted, highly efficient tasks.

Nemotron 3 Super is a high-accuracy reasoning model for multi-agent applications, with approximately 100 billion parameters and up to 10 billion active per token.

Nemotron 3 Ultra is a large reasoning engine for complex AI applications, with approximately 500 billion parameters and up to 50 billion active per token.
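The three sizes above share the same ratio of active to total parameters, which a few lines of arithmetic make explicit. The sketch uses only the figures quoted in this announcement; the Super and Ultra totals are approximate and every active count is an upper bound, so the result is a rough ceiling and not a measured value. The names `MODELS` and `active_fraction` are illustrative.

```python
# Parameter counts as stated in the announcement: (total, active per token).
# Super and Ultra totals are approximate; active counts are upper bounds.
MODELS = {
    "Nemotron 3 Nano": (30_000_000_000, 3_000_000_000),
    "Nemotron 3 Super": (100_000_000_000, 10_000_000_000),
    "Nemotron 3 Ultra": (500_000_000_000, 50_000_000_000),
}


def active_fraction(total_params: int, active_params: int) -> float:
    """Share of the model's parameters used for any single token."""
    return active_params / total_params


for name, (total, active) in MODELS.items():
    share = active_fraction(total, active)
    print(f"{name}: up to {share:.0%} of parameters active per token")
```

In each case at most about one tenth of the parameters participate in generating a given token, which is how an MoE model can have a large total capacity while keeping per-token compute closer to that of a much smaller dense model.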

Available today, Nemotron 3 Nano is the most compute-cost-efficient model in the family, optimized for tasks such as software debugging, content summarization, AI assistant workflows, and information retrieval at low inference cost. The model uses a unique hybrid MoE architecture to achieve gains in efficiency and scalability.
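For readers unfamiliar with mixture-of-experts models, the following is a minimal sketch of generic top-k expert routing. It is not Nemotron 3's architecture: the announcement does not describe the router, so the softmax gate, the choice of k, and the toy scalar "experts" are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of router scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(router_scores, k):
    """Pick the k highest-scoring experts and renormalize their weights to sum to 1."""
    probs = softmax(router_scores)
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    mass = sum(probs[i] for i in chosen)
    return [(i, probs[i] / mass) for i in chosen]


def moe_forward(x, experts, router_scores, k=2):
    """Evaluate only the selected experts and mix their outputs by gate weight."""
    return sum(w * experts[i](x) for i, w in top_k_route(router_scores, k))
```

Because only k experts run for each input, the compute per token depends on k and not on the total number of experts. That is the general property behind the "active parameters" figures quoted for each model size.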

This design delivers up to 4x higher token throughput than Nemotron 2 Nano and reduces reasoning-token generation by up to 60%, significantly lowering inference costs. With a 1-million-token context window, Nemotron 3 Nano remembers more, making it more accurate and better able to connect information across long, multi-step tasks.
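To see how the two figures above could combine, consider a back-of-the-envelope sketch. It assumes, purely for illustration, that both "up to" numbers apply to the same workload at the same time and that generation time scales as tokens divided by throughput; the announcement states neither, so treat the result as a best-case bound. The function name is hypothetical.

```python
def relative_generation_time(throughput_gain: float, token_reduction: float) -> float:
    """Time per task relative to the baseline model.

    throughput_gain: new tokens/second divided by baseline tokens/second.
    token_reduction: fraction of generated tokens saved, in [0, 1).
    """
    if throughput_gain <= 0:
        raise ValueError("throughput_gain must be positive")
    if not 0 <= token_reduction < 1:
        raise ValueError("token_reduction must be in [0, 1)")
    return (1.0 - token_reduction) / throughput_gain


# Best case using the announcement's "up to" figures: 4x throughput, 60% fewer tokens.
best_case = relative_generation_time(4.0, 0.60)
print(f"Best-case time per task: {best_case:.0%} of the baseline")
```

Under these assumptions a task would take roughly one tenth of the baseline generation time. Real workloads will usually fall short of both upper bounds, so actual savings are likely to be smaller.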

Artificial Analysis, an independent AI benchmarking organization, ranked the model as the most open and efficient among models of its size, with leading accuracy.

Nemotron 3 Super excels in applications that require many collaborating agents to complete complex tasks with low latency. Nemotron 3 Ultra serves as an advanced reasoning engine for AI workflows that demand deep research and strategic planning.

Nemotron 3 Super and Ultra use NVIDIA's ultra-efficient 4-bit NVFP4 training format on the NVIDIA Blackwell architecture, significantly reducing memory requirements and accelerating training. This efficiency allows larger models to be trained on existing infrastructure without sacrificing accuracy relative to higher-precision formats.
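The memory effect of a 4-bit format can be approximated with simple arithmetic. The sketch below counts raw weight storage only and compares it with a 16-bit baseline, which is an assumption chosen for illustration. It ignores scale factors, gradients, optimizer state, and activations, all of which add to real training memory, and the parameter counts are the approximate totals quoted in this announcement.

```python
def weight_storage_gb(num_params: float, bits_per_param: float) -> float:
    """Raw storage for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


# Approximate totals from the announcement.
SIZES = {"Nemotron 3 Super": 100e9, "Nemotron 3 Ultra": 500e9}

for name, params in SIZES.items():
    gb_16 = weight_storage_gb(params, 16)  # assumed 16-bit baseline
    gb_4 = weight_storage_gb(params, 4)    # 4-bit values, scale factors ignored
    print(f"{name}: about {gb_16:.0f} GB at 16 bits vs about {gb_4:.0f} GB at 4 bits")
```

Going from 16 bits to 4 bits cuts raw weight storage by a factor of four. The actual savings during training depend on which tensors are kept in higher precision, which the announcement does not specify.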

The Nemotron 3 family of models lets developers choose the right-sized open model for their specific workloads, scaling from tens to hundreds of agents while benefiting from faster, more accurate long-horizon reasoning for complex workflows.

New open tools and data for AI agent customization
NVIDIA also released a collection of training datasets and state-of-the-art reinforcement learning libraries, available to anyone building specialized AI agents.

Three trillion tokens of new Nemotron pre-training, post-training, and reinforcement learning datasets provide the rich reasoning, coding, and multi-step workflow examples needed to create highly capable, domain-specific agents. The Nemotron Agentic Safety Dataset provides real-world telemetry that helps teams assess and strengthen the safety of complex agent systems.

To accelerate development, NVIDIA released the NeMo Gym and NeMo RL open-source libraries, which provide the training environments and post-training foundation for Nemotron models, along with NeMo Evaluator to validate model safety and performance. All tools and datasets are now available on GitHub and Hugging Face.

Nemotron 3 is supported by LM Studio, llama.cpp, SGLang, and vLLM. In addition, Prime Intellect and Unsloth are integrating NeMo Gym's ready-to-use training environments directly into their workflows, giving teams faster, easier access to powerful reinforcement learning training.

Get started with NVIDIA open models
Nemotron 3 Nano is available today on Hugging Face, Baseten, DeepInfra, Fireworks, FriendliAI, OpenRouter, and Together AI.

Nemotron is offered on enterprise AI and data infrastructure platforms such as Couchbase, DataRobot, H2O.ai, JFrog, Lambda, and UiPath. For customers on public clouds, Nemotron 3 Nano is available on AWS through Amazon Bedrock (serverless), with support on Google Cloud, CoreWeave, Crusoe, Microsoft Foundry, Nebius, Nscale, and Yotta coming soon.

Nemotron 3 Nano is available as an NVIDIA NIM™ microservice for secure, scalable deployment anywhere on NVIDIA-accelerated infrastructure, for maximum privacy and control.

Nemotron 3 Super and Ultra are expected to be available in the first half of 2026.
