NVIDIA Nemotron 3 Nano 30B MoE model now available on Amazon SageMaker JumpStart

Machine Learning


Today, we are excited to announce that the NVIDIA Nemotron 3 Nano 30B model, with 3B active parameters, is now generally available in the Amazon SageMaker JumpStart model catalog. Nemotron 3 Nano on Amazon Web Services (AWS) lets you accelerate innovation and deliver tangible business value without managing complex model deployments. SageMaker JumpStart provides managed deployment capabilities that you can use to power your generative AI applications with Nemotron capabilities.

Nemotron 3 Nano is a small language model with a hybrid mixture-of-experts (MoE) architecture that delivers high computational efficiency and accuracy, allowing developers to run highly capable agentic tasks at scale. The model is fully open, with open weights, datasets, and training recipes, so developers can customize, optimize, and deploy it on their own infrastructure to meet privacy and security requirements. Nemotron 3 Nano is strong in coding and reasoning, performing well on benchmarks such as SWE-bench Verified, GPQA Diamond, AIME 2025, Arena Hard v2, and IFBench.

About Nemotron 3 Nano 30B

Nemotron 3 Nano is differentiated from other models by its architecture and efficiency, offering strong performance across a variety of advanced technical skills.

  • Architecture:
    • MoE with a hybrid Transformer-Mamba architecture
    • Supports a token budget to provide optimal accuracy with minimal inference token generation
  • Accuracy:
    • Superior accuracy for coding, scientific reasoning, mathematics, and instruction following
    • Leads benchmarks such as LiveCodeBench, GPQA Diamond, AIME 2025, BFCL, and IFBench, compared to other open language models under 30B parameters
  • Ease of use:
    • 30B total parameters with 3 billion active parameters
    • Context window of up to 1 million tokens
    • Text-only model: both input and output are text

Prerequisites

To start using Nemotron 3 Nano with Amazon SageMaker JumpStart, you need a provisioned Amazon SageMaker Studio domain.

Try using NVIDIA Nemotron 3 Nano 30B with SageMaker JumpStart

To test the Nemotron 3 Nano model with SageMaker JumpStart, open SageMaker Studio and choose JumpStart in the navigation pane. Search for “NVIDIA” in the search bar and select the NVIDIA Nemotron 3 Nano 30B model.

SageMaker AI JumpStart search results

On the model details page, choose Deploy and follow the prompts to deploy the model.

Once your model is deployed to a SageMaker AI endpoint, you can test it. You can invoke the model with the following AWS Command Line Interface (AWS CLI) example, using nvidia/nemotron-3-nano as the model ID.

cat > input.json << EOF
{
  "model": "${MODEL_ID}",
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful assistant."
    },
    {
      "role": "user",
      "content": "What is NVIDIA? Answer in 2-3 sentences."
    }
  ],
  "max_tokens": 512,
  "temperature": 0.2,
  "stream": false,
  "chat_template_kwargs": {"enable_thinking": false}
}
EOF

Set "stream" to true for streaming responses, and set "enable_thinking" to true to enable reasoning mode.
 
aws sagemaker-runtime invoke-endpoint \
--endpoint-name ${ENDPOINT_NAME} \
--region ${AWS_REGION} \
--content-type 'application/json' \
--body fileb://input.json \
> response.json
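The payload above follows an OpenAI-style chat-completion schema, so the saved response.json will typically carry the assistant's reply under choices[0].message.content. The helper below is a minimal sketch for pulling that text out; the field names are an assumption based on the request format, so adjust them if your model container returns a different structure:

```python
import json

def extract_reply(path: str) -> str:
    """Read a saved chat-completion response and return the assistant's text.

    Assumes an OpenAI-style schema (choices[0].message.content); this is an
    assumption, not a documented guarantee of the model container.
    """
    with open(path) as f:
        data = json.load(f)
    return data["choices"][0]["message"]["content"]
```

For example, after the invoke-endpoint call above, `extract_reply("response.json")` would return only the generated answer, without the surrounding JSON envelope.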

Alternatively, you can invoke the model using the SageMaker SDK or Boto3. The following Python code example shows how to send a chat request to an NVIDIA Nemotron 3 Nano 30B endpoint using Boto3. For additional code examples, see the NVIDIA GitHub repository.

import json

import boto3

runtime_client = boto3.client('sagemaker-runtime', region_name=region)

payload = {
    "messages": [
        {"role": "user", "content": prompt}
    ],
    "max_tokens": 1000
}

try:
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Body=json.dumps(payload)
    )

    response_body = response['Body'].read().decode('utf-8')
    raw_response = json.loads(response_body)

    # Parse the response using a custom parser
    result = parse_response(raw_response)

except Exception as e:
    raise Exception(
        f"Failed to invoke endpoint '{endpoint_name}': {str(e)}. "
        f"Check that the endpoint is InService and that you have the required IAM permissions."
    )
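If you set "stream": true in the payload instead, the endpoint typically returns server-sent events, each line carrying a JSON chunk in the OpenAI streaming format. The sketch below is a hypothetical helper (the chunk schema is an assumption, not taken from the official sample) that extracts the incremental text from one such line; the commented-out Boto3 call shows where it would plug in with invoke_endpoint_with_response_stream:

```python
import json

def parse_sse_line(line: str):
    """Return the text delta from one 'data: {...}' SSE line, or None.

    Assumes OpenAI-style streaming chunks (choices[0].delta.content);
    adjust the field names if your model container emits a different schema.
    """
    line = line.strip()
    if not line.startswith("data:"):
        return None
    body = line[len("data:"):].strip()
    if body == "[DONE]":  # sentinel that marks the end of the stream
        return None
    chunk = json.loads(body)
    return chunk["choices"][0]["delta"].get("content")

# With a live endpoint, the stream would be consumed roughly like this:
# response = runtime_client.invoke_endpoint_with_response_stream(
#     EndpointName=endpoint_name,
#     ContentType="application/json",
#     Body=json.dumps({**payload, "stream": True}),
# )
# for event in response["Body"]:
#     text = parse_sse_line(event["PayloadPart"]["Bytes"].decode("utf-8"))
#     if text:
#         print(text, end="", flush=True)
```

Streaming is useful for chat-style applications where printing tokens as they arrive noticeably improves perceived latency.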

Now available

NVIDIA Nemotron 3 Nano is now available as a fully managed model in SageMaker JumpStart. See the model package for the list of available AWS Regions. For more information, visit the Nemotron Nano model page, the NVIDIA GitHub sample notebook for Nemotron 3 Nano 30B, and the Amazon SageMaker JumpStart pricing page.

Try the Nemotron 3 Nano model on Amazon SageMaker JumpStart today, and send feedback to AWS re:Post for SageMaker JumpStart or through your usual AWS Support contacts.


About the authors

Dan Ferguson is an AWS Solutions Architect based in New York, USA. Dan is a machine learning services expert dedicated to helping customers integrate ML workflows efficiently, effectively, and sustainably.

Pooja Karaj leads product and strategic partnerships for Amazon SageMaker JumpStart, the machine learning and generative AI hub within SageMaker. She is focused on accelerating customers’ AI adoption by simplifying the discovery and deployment of foundation models, enabling them to build production-ready generative AI applications across the model lifecycle, from onboarding to customization to deployment.

Benjamin Crabtree is a senior software engineer on the Amazon SageMaker AI team, specializing in delivering “last mile” experiences to customers. He is passionate about democratizing the latest artificial intelligence breakthroughs by providing easy-to-use features. Ben also has extensive experience building large-scale machine learning infrastructure.

Timothy Ma is a lead specialist in generative AI at AWS, working with customers to design and deploy cutting-edge machine learning solutions. He also leads the go-to-market strategy for generative AI services, helping organizations harness the potential of advanced AI technologies.

Abdullahi Olaoye is a Senior AI Solutions Architect at NVIDIA, specializing in integrating NVIDIA AI libraries, frameworks, and products with cloud AI services and open source tools to optimize AI model deployment, inference, and generative AI workflows. He works with AWS to enhance the performance of AI workloads and drive adoption of NVIDIA-powered AI and generative AI solutions.

Nirmal Kumar Jullu is a product marketing manager at NVIDIA, driving the adoption of AI software, models, and APIs in the NVIDIA NGC catalog and NVIDIA AI Foundation models and endpoints. He previously worked as a software developer. Nirmal holds an MBA from Carnegie Mellon University and a Bachelor’s degree in Computer Science from BITS Pilani.

Vivian Chen is a Deep Learning Solutions Architect at NVIDIA, helping teams bridge the gap between complex AI research and real-world performance. She specializes in inference optimization and cloud-integrated AI solutions, with a focus on turning the heavy lifting of machine learning into fast, scalable applications. She is passionate about helping clients navigate NVIDIA’s accelerated computing stack to ensure their models work not only in the lab, but also in production.
