NVIDIA Nemotron 3 Ultra now available on Amazon SageMaker JumpStart

Today, we’re excited to announce the Day 0 availability of NVIDIA Nemotron 3 Ultra on Amazon SageMaker JumpStart.

With this release, you can deploy Nemotron 3 Ultra using a one-click deployment experience. Nemotron 3 Ultra is an open model built for frontier reasoning and orchestration in long-running autonomous agents, delivering 5x faster inference and up to 30% cost savings for agent workloads. The model is optimized for NVFP4, NVIDIA’s 4-bit floating-point format, which makes it faster and more cost-effective to host.

NVIDIA Nemotron 3 Ultra Overview

NVIDIA Nemotron 3 Ultra is an open large language model with 550 billion total parameters, of which 55 billion are active per token. It is built on a hybrid Transformer-Mamba Mixture-of-Experts (MoE) architecture and is designed to deliver frontier intelligence at a fraction of the computational cost of dense models of comparable quality.
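To illustrate why an MoE model’s per-token compute tracks its active rather than its total parameters, here is a minimal top-k routing sketch in plain Python. It assumes a generic softmax router renormalized over the selected experts; the dimensions, expert count, k, and toy experts are arbitrary illustrations, not Nemotron 3 Ultra’s actual router or configuration.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(token, router_weights, experts, k):
    """Route one token to its top-k experts and mix their outputs by gate weight."""
    # Router: one score per expert (dot product with that expert's router vector).
    logits = [sum(t * w for t, w in zip(token, row)) for row in router_weights]
    top_k = sorted(range(len(experts)), key=lambda i: logits[i], reverse=True)[:k]
    # Gate weights are renormalized over the selected experts only.
    gates = softmax([logits[i] for i in top_k])
    output = [0.0] * len(token)
    for gate, i in zip(gates, top_k):
        expert_out = experts[i](token)  # only the k selected experts ever run
        output = [o + gate * e for o, e in zip(output, expert_out)]
    return output, top_k


# Toy setup (hypothetical sizes): 8 experts, 2 active per token.
random.seed(0)
DIM, NUM_EXPERTS, TOP_K = 4, 8, 2
router = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(NUM_EXPERTS)]
calls = [0] * NUM_EXPERTS


def make_expert(idx):
    """A stand-in 'expert' that scales its input and counts how often it runs."""
    def expert(x):
        calls[idx] += 1
        return [(idx + 1) * v for v in x]
    return expert


experts = [make_expert(i) for i in range(NUM_EXPERTS)]
token = [random.gauss(0, 1) for _ in range(DIM)]
out, chosen = moe_forward(token, router, experts, TOP_K)
print("selected experts:", chosen)
print("expert calls:", calls)  # exactly TOP_K entries are 1, the rest stay 0
```

Only the selected experts execute for a given token, while the router and any shared layers run for every token. Scaled up, this is roughly how a model with 550B total parameters can run with 55B active.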

Specification	Detail
Architecture	Hybrid Transformer-Mamba MoE
Parameters	550B total / 55B active
Context length	Up to 1 million tokens
Input/output	Text input, text output
Precision	NVFP4
Inference speed	5x faster for long-running agent workflows
Cost	Up to 30% lower for complex agent tasks
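The table’s numbers support some quick back-of-the-envelope arithmetic. This sketch assumes every weight is stored in NVFP4 using the format’s block layout as NVIDIA describes it (4-bit values plus one 8-bit FP8 E4M3 scale factor per 16-value block, or about 4.5 bits per parameter). It ignores the per-tensor scale, any layers kept in higher precision, and the KV cache, Mamba state, and activation memory that serving also needs.

```python
TOTAL_PARAMS = 550e9   # from the specification table
ACTIVE_PARAMS = 55e9   # from the specification table

# Assumption: 4-bit values + one 8-bit scale per 16 values = 4 + 8/16 bits.
NVFP4_BITS_PER_PARAM = 4 + 8 / 16
BF16_BITS_PER_PARAM = 16


def weight_gb(params: float, bits_per_param: float) -> float:
    """Approximate weight storage in gigabytes (10**9 bytes)."""
    return params * bits_per_param / 8 / 1e9


active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS
print(f"Parameters active per token: {active_fraction:.0%}")
print(f"Weights in NVFP4 (~{NVFP4_BITS_PER_PARAM} bits): {weight_gb(TOTAL_PARAMS, NVFP4_BITS_PER_PARAM):.0f} GB")
print(f"Weights in BF16 (16 bits): {weight_gb(TOTAL_PARAMS, BF16_BITS_PER_PARAM):.0f} GB")
```

Under these assumptions, 10% of the parameters are active per token, and the weights take roughly 309 GB in NVFP4 versus 1,100 GB in BF16, which is one reason a 4-bit checkpoint is cheaper to host.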

Why agentic AI needs a purpose-built model

Agents don’t respond just once. They plan, invoke tools, delegate work to subagents, review results, and repeat that loop over hundreds of turns. Every step adds tokens and compute, so the metrics that matter are task completion accuracy, time to completion, and cost per task.
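To see why cost per task grows faster than the number of turns, consider a naive loop that re-sends the full history each turn: the input tokens read grow roughly quadratically. This is a minimal sketch assuming a hypothetical agent that appends a fixed number of tokens per turn and uses no prompt caching or context trimming; the numbers are illustrative, not measurements.

```python
def agent_token_usage(turns: int, prompt_tokens: int, tokens_per_turn: int):
    """Return (total input tokens read, final context length) for a naive agent loop.

    Assumes each turn re-sends the full history and appends a fixed number of
    tokens (plan + tool call + tool result). No prompt caching or trimming.
    """
    total_read = 0
    context = prompt_tokens
    for _ in range(turns):
        total_read += context       # the model reads the whole history this turn
        context += tokens_per_turn  # this turn's output and tool result are appended
    return total_read, context


# Hypothetical numbers for illustration, not measurements.
single_read, _ = agent_token_usage(turns=1, prompt_tokens=2_000, tokens_per_turn=1_500)
agent_read, final_context = agent_token_usage(turns=200, prompt_tokens=2_000, tokens_per_turn=1_500)
print(f"one-shot reply reads {single_read:,} tokens")
print(f"200-turn agent reads {agent_read:,} tokens; final context {final_context:,} tokens")
```

In this hypothetical, a single reply reads 2,000 input tokens, while the 200-turn task reads about 30 million and ends with a context of over 300K tokens, which is where long-context efficiency and per-token cost dominate.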

Nemotron 3 Ultra is designed for this pattern. Its MoE architecture activates only 55B of its 550B parameters (10%) per forward pass, which helps it maintain high throughput even with context lengths of up to 1 million tokens. As a result, agents can plan, call tools, and run self-correcting loops spanning hundreds of turns while staying consistent and keeping costs under control.
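Part of what keeps very long contexts tractable in a hybrid Transformer-Mamba design is that attention layers keep a key-value (KV) cache that grows with every token, while Mamba-style state-space layers carry a fixed-size state. This simplified per-layer, per-sequence memory comparison uses hypothetical dimensions (8 KV heads of size 128, an 8,192-channel layer with state size 128, 2-byte values); they are not Nemotron 3 Ultra’s real configuration, and the small convolution state real Mamba layers also keep is omitted.

```python
def kv_cache_bytes(seq_len: int, n_kv_heads: int, head_dim: int, bytes_per_value: int = 2) -> int:
    """Per-layer, per-sequence attention KV cache: a key and a value per head for every past token."""
    return 2 * seq_len * n_kv_heads * head_dim * bytes_per_value


def ssm_state_bytes(n_channels: int, state_dim: int, bytes_per_value: int = 2) -> int:
    """Per-layer, per-sequence state-space state: fixed size, independent of sequence length."""
    return n_channels * state_dim * bytes_per_value


MIB = 2**20
for seq_len in (8_192, 131_072, 1_000_000):
    kv = kv_cache_bytes(seq_len, n_kv_heads=8, head_dim=128)
    ssm = ssm_state_bytes(n_channels=8_192, state_dim=128)
    print(f"{seq_len:>9,} tokens: KV cache {kv / MIB:8.1f} MiB | SSM state {ssm / MIB:4.1f} MiB")
```

With these toy dimensions, one attention layer’s KV cache reaches roughly 3.8 GiB at 1 million tokens, while the state-space layer’s state stays at 2 MiB regardless of length.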

Enterprise use cases

Nemotron 3 Ultra excels at workloads that require sustained, multi-step reasoning:

Agent orchestrator – Coordinate multiple subagents and manage state across long tool-call chains
Coding agent – Generate, test, debug, and iterate on code across large repositories
Deep research – Integrate information from multiple sources and maintain consistent reasoning across extended contexts
Complex enterprise workflows – Automate multi-step business processes with decision branching and error recovery

Get started with SageMaker JumpStart

You can deploy Nemotron 3 Ultra with one click via Amazon SageMaker JumpStart, eliminating the need to manage infrastructure or configure a serving framework.

Prerequisites

Before you begin, make sure you have the following:

An AWS account
Permissions to use SageMaker JumpStart
Sufficient service quota for GPU instances (such as ml.p5en.48xlarge, ml.p5.48xlarge, or ml.g7e.48xlarge)
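Before deploying, you can check your endpoint quotas programmatically. This sketch uses the Service Quotas API through boto3 (`list_service_quotas` with `ServiceCode="sagemaker"`). It assumes SageMaker endpoint quotas are named in the form `<instance type> for endpoint usage`; confirm the exact quota names for your account and Region in the Service Quotas console. The helper names are hypothetical.

```python
def endpoint_quotas(quotas, instance_types):
    """Return {instance_type: quota value} for the matching endpoint-usage quotas.

    Assumes quota names of the form "<instance type> for endpoint usage".
    """
    wanted = {f"{t} for endpoint usage": t for t in instance_types}
    return {
        wanted[q["QuotaName"]]: q.get("Value")
        for q in quotas
        if q.get("QuotaName") in wanted
    }


def fetch_sagemaker_quotas(region_name=None):
    """List SageMaker quotas for the current account and Region via Service Quotas."""
    import boto3  # imported lazily so endpoint_quotas() can be used without boto3

    client = boto3.client("service-quotas", region_name=region_name)
    paginator = client.get_paginator("list_service_quotas")
    quotas = []
    for page in paginator.paginate(ServiceCode="sagemaker"):
        quotas.extend(page["Quotas"])
    return quotas
```

For example, `endpoint_quotas(fetch_sagemaker_quotas(), ["ml.p5en.48xlarge", "ml.p5.48xlarge", "ml.g7e.48xlarge"])` returns the limit for each instance type it finds; a value of 0 means you need to request a quota increase first, and a missing entry may mean the quota name differs from the assumed pattern.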

Important: Deploying this model creates a SageMaker endpoint that incurs charges while it is running. GPU instances such as ml.p5en.48xlarge can cost tens of dollars per hour. For pricing details, see Amazon SageMaker AI Pricing. Delete the endpoint when you’re finished to avoid ongoing charges.

Deploy using SageMaker Studio

Open Amazon SageMaker Studio
In the left navigation pane, select SageMaker JumpStart
Search for Nemotron 3 Ultra
Select the Nemotron 3 Ultra model card
Select Deploy
Select an instance type (supported instance types are ml.p5en.48xlarge, ml.p5.48xlarge, or ml.g7e.48xlarge)
Review your deployment settings (the defaults are sufficient for most use cases)
Select Deploy to create the endpoint
Wait until the endpoint status shows InService before proceeding with inference.
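If you prefer to track the endpoint from code (for example, after starting the deployment in Studio), you can poll its status with the boto3 SageMaker client’s `describe_endpoint`. This is a minimal sketch: the client is passed in, so any object with a `describe_endpoint` method works, and the polling interval and timeout are arbitrary defaults. boto3 also provides a built-in waiter, `get_waiter("endpoint_in_service")`, for the same purpose.

```python
import time


def wait_for_endpoint(sm_client, endpoint_name, poll_seconds=30, timeout_seconds=3600, sleep=time.sleep):
    """Poll DescribeEndpoint until the endpoint is InService; raise on Failed or timeout."""
    waited = 0
    while True:
        desc = sm_client.describe_endpoint(EndpointName=endpoint_name)
        status = desc["EndpointStatus"]
        if status == "InService":
            return status
        if status == "Failed":
            reason = desc.get("FailureReason", "no reason reported")
            raise RuntimeError(f"Endpoint {endpoint_name} failed: {reason}")
        if waited >= timeout_seconds:
            raise TimeoutError(f"Endpoint {endpoint_name} still {status} after {waited} seconds")
        sleep(poll_seconds)
        waited += poll_seconds
```

Usage: `wait_for_endpoint(boto3.client("sagemaker"), "<your-endpoint-name>")`, replacing the placeholder with your endpoint’s name.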

Deploy using the SageMaker Python SDK

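import sagemaker
from sagemaker.jumpstart.model import JumpStartModel

# get_execution_role() works inside SageMaker Studio and notebook instances; elsewhere, pass your role ARN as a string
role = sagemaker.get_execution_role()

model = JumpStartModel(
    model_id="huggingface-reasoning-nvidia-nemotron-3-ultra-550b-a55b-nvfp4",  # Verify in SageMaker JumpStart model card
    role=role,
    instance_type="ml.p5en.48xlarge",  # or ml.p5.48xlarge / ml.g7e.48xlarge, subject to your quota
)
# accept_eula=True acknowledges the model's end-user license agreement
predictor = model.deploy(accept_eula=True)
print(predictor.endpoint_name)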

Perform inference

payload = {
    # OpenAI-style chat messages
    "messages": [{
        "role": "user",
        "content": "Break this task into subtasks, identify which tools are needed, and run them in sequence."
    }],
    "max_tokens": 20480,  # upper bound on the number of generated tokens
    "temperature": 0.6,
    "top_p": 0.95,
}
response = predictor.predict(payload)
# Print the text of the first returned choice
print(response["choices"][0]["message"]["content"])
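Outside the SageMaker Python SDK (for example, from an application that only has boto3, or for an endpoint you deployed in Studio), you can send the same request with the low-level `sagemaker-runtime` `invoke_endpoint` API. This sketch assumes the endpoint accepts and returns the same OpenAI-style chat JSON as the `predictor.predict` example; `invoke_chat` is a hypothetical helper name.

```python
import json


def invoke_chat(runtime_client, endpoint_name, prompt, max_tokens=2048):
    """Send a chat request through InvokeEndpoint and return the reply text."""
    payload = {
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": 0.6,
        "top_p": 0.95,
    }
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Body=json.dumps(payload),
    )
    # The response Body is a stream; read it and parse the JSON document.
    body = json.loads(response["Body"].read())
    return body["choices"][0]["message"]["content"]
```

Usage: `invoke_chat(boto3.client("sagemaker-runtime"), "<your-endpoint-name>", "Plan the steps for this task.")`.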

Clean up

Delete the SageMaker model and endpoint when you’re done to avoid unnecessary charges.

predictor.delete_model()
predictor.delete_endpoint()  # also deletes the endpoint configuration by default
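For an endpoint created in Studio, where there is no `predictor` object, you can clean up with the boto3 SageMaker client instead. This sketch assumes a single-model endpoint whose production variants reference models by `ModelName`; endpoints that host models through inference components need those components deleted first, which this helper does not handle. `delete_endpoint_resources` is a hypothetical helper name.

```python
def delete_endpoint_resources(sm_client, endpoint_name):
    """Delete an endpoint, its endpoint configuration, and the models that config references."""
    config_name = sm_client.describe_endpoint(EndpointName=endpoint_name)["EndpointConfigName"]
    config = sm_client.describe_endpoint_config(EndpointConfigName=config_name)
    # Collect model names before deleting the config that points to them.
    model_names = [v["ModelName"] for v in config["ProductionVariants"] if "ModelName" in v]

    sm_client.delete_endpoint(EndpointName=endpoint_name)  # stops the instance charges
    sm_client.delete_endpoint_config(EndpointConfigName=config_name)
    for name in model_names:
        sm_client.delete_model(ModelName=name)
    return config_name, model_names
```

Usage: `delete_endpoint_resources(boto3.client("sagemaker"), "<your-endpoint-name>")`.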

Conclusion

NVIDIA Nemotron 3 Ultra brings frontier-class reasoning to Amazon SageMaker JumpStart, with 5x faster long-running agent workflows and up to 30% lower costs for agent workloads. With a hybrid Transformer-Mamba MoE architecture and a 1 million token context window, it is purpose-built for the sustained, multi-step reasoning that production agents require.

Whether you’re building an agent orchestrator, coding agent, deep research system, or complex enterprise automation, you can deploy Nemotron 3 Ultra today with SageMaker JumpStart.

Search for Nemotron 3 Ultra on Amazon SageMaker JumpStart to get started today.

About the authors

Dan Ferguson is a Solutions Architect at AWS based in New York, USA. As a machine learning services expert, Dan is dedicated to helping customers integrate ML workflows efficiently, effectively, and sustainably.

Malav Shastri is a Software Development Engineer at AWS on the Amazon SageMaker JumpStart and Amazon Bedrock teams. His role focuses on enabling customers to take advantage of cutting-edge open source and proprietary foundation models. Malav holds a master’s degree in computer science.

Vivek Gangasani is a Worldwide Lead Solutions Architect for SageMaker Inference. He leads solutions architecture, technical go-to-market (GTM), and outbound product strategy for SageMaker Inference. He also helps enterprises and startups deploy and optimize generative AI models and build AI workflows using SageMaker and GPUs. Currently, he focuses on developing strategies and content for optimizing inference performance and for use cases such as agentic workflows and RAG. In his free time, he enjoys hiking, watching movies, and sampling different cuisines.