Today, we’re excited to announce availability at Day Zero. NVIDIA Nemotron 3 Ultra With Amazon SageMaker JumpStart.
With this release, you can now deploy Nemotron 3 Ultra models using a one-click deployment experience. Nemotron 3 Ultra is an open model built for frontier inference and orchestration in long-running autonomous agents, delivering 5x faster inference and up to 30% cost savings for agent workloads. Nemotron 3 Ultra is optimized for the NVFP4 format, making it significantly faster and more cost-effective to host the model.
NVIDIA Nemotron 3 Ultra Overview
NVIDIA Nemotron 3 Ultra is an open large-scale language model with 550 billion total parameters and 55 billion active parameters. It is built on a hybrid Transformer-Mamba Mixture-of-Experts (MoE) architecture and is designed to deliver frontier intelligence at a fraction of the computational cost of dense models of comparable quality.
| specification | detail |
|---|---|
| architecture | Hybrid Transformer – Mamba MoE |
| parameters | Total 550B / Active 55B |
| context length | Up to 1 million tokens |
| input/output | Text input, text output |
| accuracy | NVFP4 |
| Inference speed | Long-running agent workflows are now 5x faster |
| Fee | Up to 30% reduction for complex agent tasks |
Why agent AI needs a dedicated model
Agents don’t respond just once. They make plans, invoke tools, delegate work to subagents, review results, and repeat hundreds of turns. Every step adds tokens and compute, so the key metrics are task completion with useful accuracy, time to completion, and cost per task.
Nemotron 3 Ultra directly addresses this. Its MoE architecture activates only 55B of 550B parameters per forward pass, maintaining high throughput even at context lengths of 1 million tokens. This means agents can plan, call tools, and maintain a self-correcting loop spanning hundreds of turns while helping maintain consistency and control costs.
Enterprise use case
Nemotron 3 Ultra excels in workloads that require sustained, multi-step inference.
- agent orchestrator – Coordinate multiple subagents and manage state across long tool call chains
- coding agent – Generate, test, debug, and iterate code across large repositories
- deep research – Integrate information from multiple sources and maintain consistent inferences across extended contexts
- Complex enterprise workflows – Automate multi-step business processes with decision branching and error recovery.
Try using SageMaker JumpStar
You can deploy Nemotron 3 Ultra with one click via Amazon SageMaker JumpStart, eliminating the need to manage infrastructure or configure a serving framework.
Prerequisites
Before you begin, make sure you have the following:
- AWS account
- Appropriate scope of permissions for SageMaker JumpStart
- Sufficient service quota for GPU instances (such as ml.p5en.48xlarge, ml.p5.48xlarge, or ml.g7e.48xlarge)
important: Deploying this model creates a SageMaker endpoint that incurs charges while running. GPU instances like ml.p5en.48xlarge can cost several dollars per hour. For more information, see Amazon SageMaker AI Pricing. Be sure to delete the endpoint when you’re finished to avoid ongoing charges.
Deploy using SageMaker Studio
- Open Amazon SageMaker Studio
- In the left navigation pane, select SageMaker JumpStart
- Search for Nemotron 3 Ultra
- Please select a model card
- Select deployment
- Select an instance type (supported instance types are ml.p5en.48xlarge, ml.p5.48xlarge, or ml.g7e.48xlarge)
- Check your deployment settings (defaults are sufficient for most use cases)
- Select Deploy to create the endpoint
- Wait until the endpoint status shows InService before proceeding with inference.

Deploy using the SageMaker Python SDK
perform inference
cleaning
Delete the SageMaker endpoint when you’re done to avoid unnecessary charges.predictor.delete_endpoint()
conclusion
NVIDIA Nemotron 3 Ultra brings frontier-class inference to Amazon SageMaker JumpStart, making inference 5x faster and reducing costs for agent workloads by up to 30%. With a hybrid Transformer-Mamba MoE architecture and a 1 million token context window, it is purpose-built for the persistent multi-step inference required by production agents.
Whether you’re building an agent orchestrator, coding agent, deep research system, or complex enterprise automation, you can deploy Nemotron 3 Ultra today with SageMaker JumpStart.
Search for Nemotron 3 Ultra on Amazon SageMaker JumpStart to get started today.
About the author
Dan Ferguson I’m an AWS solutions architect based in New York, USA. Dan is a machine learning services expert dedicated to helping customers integrate ML workflows efficiently, effectively, and sustainably.
Malaf Shastri He is a software development engineer at AWS and is part of the Amazon SageMaker JumpStart and Amazon Bedrock teams. His role focuses on enabling customers to take advantage of cutting-edge open source and proprietary foundational models. Malav holds a master’s degree in computer science.
Vivek Gangasani World leader in solution architecture, SageMaker Inference. He leads SageMaker Inference’s solution architecture, technical go-to-market (GTM), and outbound product strategy. We also help enterprises and startups deploy and optimize GenAI models and build AI workflows using SageMaker and GPUs. Currently, he focuses on developing strategies and content to optimize inference performance and use cases such as agentic workflows and RAGs. In my free time, I enjoy hiking, watching movies, and sampling different cuisines.

