
AI is advancing rapidly, and for most customers, the real opportunity lies not in experimenting with AI, but in running it in production and delivering meaningful business outcomes. This means building systems that perform reliably at scale and meet your organization’s security and compliance requirements.
Today at NVIDIA GTC 2026, AWS and NVIDIA announced an expanded collaboration with new technology integrations to support growing AI computing demands and help you build and run production-ready AI solutions. These integrations span accelerated computing, interconnect technologies, and model fine-tuning and inference.
Key announcements at NVIDIA GTC 2026
Expand your AI infrastructure with new GPU options and optimized interconnects
Accelerating computing power in the age of agentic AI
Starting in 2026, AWS will add more than 1 million NVIDIA GPUs, including Blackwell and Rubin GPU architectures, across global cloud regions. AWS offers the broadest collection of NVIDIA GPU-based instances of any cloud provider to power a variety of AI/ML workloads. AWS and NVIDIA are also collaborating on Spectrum networking and other infrastructure areas, furthering the companies’ more than 15 years of joint innovation.
AWS provides enterprises, startups, and researchers with the advanced cloud and AI infrastructure they need to build and scale agentic AI systems that can reason, plan, and operate autonomously across complex workflows.
New Amazon EC2 instances with NVIDIA RTX PRO 4500 Blackwell Server Edition GPUs
Today we announced that Amazon EC2 instances accelerated by NVIDIA RTX PRO 4500 Blackwell Server Edition GPUs are coming soon. AWS is the first major cloud provider to announce support for RTX PRO 4500 Blackwell Server Edition GPUs. These instances are suitable for a wide range of workloads including data analytics, conversational AI, content generation, recommender systems, video streaming, video rendering, and other graphics workloads.
Amazon EC2 instances accelerated by NVIDIA RTX PRO 4500 Blackwell Server Edition GPUs are built on the AWS Nitro System, a combination of purpose-built hardware and a lightweight hypervisor that provides the instance with virtually all the compute and memory resources of the host hardware, improving overall resource utilization and performance. The Nitro System’s specialized hardware, software, and firmware are designed to enforce restrictions that prevent anyone, including AWS employees, from accessing sensitive AI workloads and data. Additionally, the Nitro System supports firmware updates, bug fixes, and optimizations while the system continues to operate. These features within the Nitro System provide the enhanced resource efficiency, security, and stability needed for AI, analytics, and graphics workloads in production environments.
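If you want to check programmatically which NVIDIA GPU-backed instance types are available in your Region once the new instances launch, the existing EC2 DescribeInstanceTypes API already exposes GPU details. The sketch below uses boto3 and filters client side; the instance family name for the RTX PRO 4500 Blackwell instances has not been announced, so nothing here is specific to them.

```python
import boto3

# List NVIDIA GPU-backed EC2 instance types available in a Region.
# The instance family for the new RTX PRO 4500 Blackwell instances has not
# been announced yet, so this simply surfaces whatever GPU instance types
# the DescribeInstanceTypes API reports once they are available.
ec2 = boto3.client("ec2", region_name="us-east-1")

paginator = ec2.get_paginator("describe_instance_types")
for page in paginator.paginate():
    for itype in page["InstanceTypes"]:
        gpu_info = itype.get("GpuInfo")
        if not gpu_info:
            continue  # skip CPU-only instance types
        for gpu in gpu_info["Gpus"]:
            if gpu["Manufacturer"] == "NVIDIA":
                print(
                    itype["InstanceType"],
                    gpu["Name"],
                    gpu["Count"],
                    f'{gpu["MemoryInfo"]["SizeInMiB"]} MiB GPU memory',
                )
```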
Distributed LLM inference interconnect acceleration with NVIDIA NIXL on AWS EFA and Trainium
As model sizes increase, communication overhead between GPUs or Trainium chips can become a bottleneck. Today, we announced support for the NVIDIA Inference Xfer Library (NIXL) on AWS Elastic Fabric Adapter (EFA) to accelerate distributed large language model (LLM) inference across NVIDIA GPUs on Amazon EC2 and AWS Trainium. Accelerating these transfers is critical to scaling modern AI workloads because it enables efficient overlap of communication and computation while minimizing communication latency and maximizing GPU utilization. This integration enables high-throughput, low-latency movement of KV cache data between the GPU compute nodes that perform token generation and the distributed memory resources that store KV cache state. It also provides the flexibility to build inference clusters using any combination of EFA-enabled EC2 instances with GPUs or Trainium. NIXL with EFA integrates natively with popular open source frameworks such as NVIDIA Dynamo, vLLM, and SGLang to improve token-to-token latency and KV cache memory efficiency.
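As a concrete illustration of the vLLM integration mentioned above, here is a minimal sketch that configures a vLLM engine with a NIXL-backed KV connector. The connector name, configuration fields, and model are assumptions based on recent vLLM releases and may differ by version, so treat it as illustrative rather than a drop-in deployment.

```python
# Minimal sketch of KV cache transfer with vLLM's KV-transfer support.
# Assumptions: a recent vLLM release that ships the NIXL-based KV connector
# ("NixlConnector"), plus EFA-enabled instances with NIXL installed.
# Exact configuration field names and values can differ across vLLM versions.
from vllm import LLM, SamplingParams
from vllm.config import KVTransferConfig

llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",  # example model, swap for your own
    kv_transfer_config=KVTransferConfig(
        kv_connector="NixlConnector",  # NIXL moves KV cache blocks between nodes
        kv_role="kv_both",             # this worker both produces and consumes KV cache
    ),
)

outputs = llm.generate(
    ["Summarize what NIXL over EFA provides for distributed inference."],
    SamplingParams(max_tokens=64),
)
print(outputs[0].outputs[0].text)
```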
Accelerate data analysis with Amazon EMR and NVIDIA GPUs
Run Apache Spark 3x faster with Amazon EMR on Amazon EKS using G7e instances
Data engineers and data scientists often face multi-hour data processing pipelines that slow down AI/ML model iteration and business intelligence generation. AWS and NVIDIA now accelerate Apache Spark workloads by 3x using Amazon EMR on EKS with G7e instances. This performance is the result of a joint engineering collaboration between AWS and NVIDIA that optimizes GPU-accelerated analytics by combining Amazon EMR on EKS with NVIDIA RTX PRO 6000 GPUs. Amazon EMR and G7e instances enable data engineers and data scientists to accelerate time to insight for AI/ML feature engineering, complex ETL transformations, and real-time analytics at scale. Customers running large data processing pipelines can reduce the time required to perform analytics while maintaining full compatibility with existing Spark applications.
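To make this concrete, here is a hedged sketch of submitting a GPU-accelerated Spark job to an existing EMR on EKS virtual cluster with the NVIDIA RAPIDS Accelerator enabled. The virtual cluster ID, IAM role, release label, script location, and resource settings are placeholders you would replace for your own environment and G7e node groups.

```python
import boto3

# Illustrative sketch: submit a GPU-accelerated Spark job to an existing
# EMR on EKS virtual cluster. The virtual cluster ID, execution role,
# release label, and script location are placeholders; the RAPIDS
# Accelerator settings shown are the standard NVIDIA Spark plugin
# configuration, but the exact values depend on your EMR release and
# G7e node setup.
emr = boto3.client("emr-containers", region_name="us-east-1")

response = emr.start_job_run(
    virtualClusterId="<virtual-cluster-id>",
    name="rapids-spark-etl",
    executionRoleArn="arn:aws:iam::111122223333:role/EMRContainersJobRole",
    releaseLabel="emr-7.5.0-latest",  # placeholder EMR on EKS release label
    jobDriver={
        "sparkSubmitJobDriver": {
            "entryPoint": "s3://my-bucket/jobs/etl_job.py",
            "sparkSubmitParameters": (
                "--conf spark.plugins=com.nvidia.spark.SQLPlugin "
                "--conf spark.rapids.sql.enabled=true "
                "--conf spark.executor.resource.gpu.amount=1 "
                "--conf spark.task.resource.gpu.amount=0.25 "
                "--conf spark.executor.instances=4"
            ),
        }
    },
)
print("Started job run:", response["id"])
```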
Expanded support for NVIDIA Nemotron models on Amazon Bedrock
Fine-tuning Nemotron models in Amazon Bedrock using Reinforcement Fine-Tuning (coming soon)
Developers will soon be able to fine-tune NVIDIA Nemotron models directly on Amazon Bedrock using Reinforcement Fine-Tuning (RFT). This is important for teams that need to tailor model behavior to a specific domain, such as legal, medical, financial, or other professional fields. RFT lets you shape not just what your model knows, but how it reasons and responds. It also runs natively on Amazon Bedrock, so there is zero infrastructure overhead: define your tasks, provide feedback signals, and Bedrock takes care of the rest. Learn more about Reinforcement Fine-Tuning in Amazon Bedrock.
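Because RFT on Bedrock is still coming soon, there is no published API for it yet. Purely as a hypothetical sketch, the snippet below reuses the shape of the existing Amazon Bedrock model customization API; the base model identifier, customization type, and hyperparameters are placeholders, not confirmed RFT parameters.

```python
import boto3

# Hypothetical sketch only: RFT for Nemotron on Bedrock is "coming soon" and
# its API surface has not been published. This reuses the existing Bedrock
# model-customization call (create_model_customization_job); the base model
# identifier, customization type, and hyperparameters below are placeholders.
bedrock = boto3.client("bedrock", region_name="us-east-1")

job = bedrock.create_model_customization_job(
    jobName="nemotron-rft-legal-assistant",
    customModelName="nemotron-legal-rft",
    roleArn="arn:aws:iam::111122223333:role/BedrockCustomizationRole",
    baseModelIdentifier="<nemotron-model-id>",  # placeholder model identifier
    customizationType="FINE_TUNING",            # RFT may use a new type when released
    trainingDataConfig={"s3Uri": "s3://my-bucket/rft/tasks.jsonl"},
    outputDataConfig={"s3Uri": "s3://my-bucket/rft/output/"},
    hyperParameters={"epochCount": "2"},        # placeholder values
)
print("Customization job ARN:", job["jobArn"])
```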
Nemotron 3 Super (coming soon) on Amazon Bedrock
NVIDIA Nemotron 3 Super, a hybrid MoE model built for multi-agent workloads and scaled inference, is coming soon to Amazon Bedrock. It is designed to help AI agents maintain accuracy across complex multi-step workflows, powering use cases across financial services, cybersecurity, retail, and software development, and it provides fast, cost-effective inference through fully managed APIs.
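When the model becomes available, invoking it should follow the same fully managed pattern as other Bedrock models. The sketch below uses the existing Bedrock Converse API with a placeholder model ID, since the real identifier has not been published.

```python
import boto3

# Illustrative call pattern for invoking a Bedrock-hosted model through the
# Converse API once Nemotron 3 Super is available. The model ID below is a
# placeholder; check the Bedrock console or documentation for the real
# identifier when the model launches.
bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

response = bedrock_runtime.converse(
    modelId="<nemotron-3-super-model-id>",  # placeholder model ID
    messages=[
        {
            "role": "user",
            "content": [
                {"text": "Outline a multi-step plan to triage a suspicious login alert."}
            ],
        }
    ],
    inferenceConfig={"maxTokens": 512, "temperature": 0.2},
)
print(response["output"]["message"]["content"][0]["text"])
```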
Improving energy efficiency and sustainability
As AI workloads scale, performance per watt becomes more than a sustainability metric; it becomes a competitive advantage. In this NVIDIA GTC session, Amazon CSO Kara Hurst joins sustainability leaders from Equinix and PepsiCo to discuss how AI is transforming enterprise energy and infrastructure at scale, from the data center as an active grid participant to AI as an enterprise efficiency engine. The session also covers how AWS infrastructure is up to 4.1x more energy efficient than on-premises data centers and how AWS can help you achieve optimal energy efficiency.
Built to run together
What makes these announcements compelling is not any single feature, but how they work together. More than 15 years of AWS and NVIDIA collaboration have produced a full stack of end-to-end optimized AI infrastructure, from the GPU to the network to the managed services layer. You don’t need to stitch it together yourself; it’s ready to run.
If you’re at GTC this week, stop by the AWS booth. Check out live demos, watch in-booth theater sessions, and get your customized swag at AWS Swag Factory.
Visit AWS at NVIDIA GTC 2026 to see everything AWS is doing at the conference.
