AWS and NVIDIA Collaborate to Build Generative AI Applications



Amazon Web Services (AWS) offers the world’s most scalable, on-demand artificial intelligence (AI) infrastructure, optimized for training increasingly complex large language models (LLMs) and developing generative AI applications. AWS and NVIDIA have announced a multi-part collaboration focused on building out this infrastructure.

The collaboration will leverage next-generation Amazon Elastic Compute Cloud (Amazon EC2) P5 instances powered by NVIDIA H100 Tensor Core GPUs, together with AWS’s most advanced networking and scalability, to deliver up to 20 exaflops of compute performance for building and training the largest deep learning models. P5 instances will be the first GPU-based instances to take advantage of AWS’s second-generation Elastic Fabric Adapter (EFA) networking, which provides low-latency, high-bandwidth throughput of 3,200 Gbps. This allows customers to scale up to 20,000 H100 GPUs in EC2 UltraClusters for on-demand access to supercomputer-class AI performance.

New supercomputing cluster

The new P5 instances build on a decade-long collaboration between AWS and NVIDIA on AI and HPC infrastructure, following four previous collaborations across the P2, P3, P3dn, and P4d(e) instance families. P5 instances are the fifth generation of AWS offerings powered by NVIDIA GPUs, arriving almost 13 years after NVIDIA GPUs were first deployed in CG1 instances. P5 instances are ideal for training, and running inference with, the increasingly complex LLMs and computer vision models behind the most demanding and compute-intensive generative AI applications, such as question answering, code generation, video and image generation, and speech recognition.

Purpose-built for both enterprises and startups looking to bring AI-powered innovations to market in a scalable and secure way, P5 instances deliver 16 petaflops of mixed-precision performance, 640 GB of high-bandwidth memory, and 3,200 Gbps of network connectivity (8x more than the previous generation) on a single EC2 instance powered by NVIDIA H100 GPUs. The improved performance of P5 instances cuts training time for machine learning (ML) models by up to 6x, reducing training from days to hours, while the additional GPU memory lets customers train larger and more complex models. P5 instances are expected to lower the cost of training ML models by up to 40% over the previous generation, giving customers greater efficiency than inflexible cloud offerings or expensive on-premises systems.

Amazon EC2 P5 instances are deployed in hyperscale clusters called EC2 UltraClusters, which combine the highest-performing compute, networking, and storage in the cloud. Each EC2 UltraCluster is one of the world’s most powerful supercomputers, enabling customers to run their most complex multi-node ML training and distributed HPC workloads. The new EC2 P5 instances will enable customers such as Anthropic, Cohere, Hugging Face, Pinterest, and Stability AI to build and train the largest ML models at scale. As additional generations of EC2 instances are delivered, the collaboration will help startups, enterprises, and researchers scale seamlessly to meet their ML needs.
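As an illustrative sketch (not from the announcement), the request below shows how a P5 instance with EFA networking might be launched into a cluster placement group via the EC2 RunInstances API, as one would with boto3. The AMI, subnet, security group, and placement group names are placeholders, not real resources.

```python
# Hypothetical sketch: building a RunInstances request for an EFA-enabled
# P5 instance in a cluster placement group. All resource IDs are placeholders.

def p5_run_instances_params(ami_id, subnet_id, sg_id, count=1):
    """Build the parameter dict for an EC2 RunInstances call."""
    return {
        "ImageId": ami_id,                  # e.g. a Deep Learning AMI (placeholder)
        "InstanceType": "p5.48xlarge",      # H100-based P5 instance size
        "MinCount": count,
        "MaxCount": count,
        # Cluster placement packs instances together for low-latency EFA traffic.
        "Placement": {"GroupName": "p5-training"},
        "NetworkInterfaces": [{
            "DeviceIndex": 0,
            "InterfaceType": "efa",         # Elastic Fabric Adapter
            "SubnetId": subnet_id,          # placeholder
            "Groups": [sg_id],              # placeholder security group
        }],
    }

params = p5_run_instances_params(
    "ami-0123456789abcdef0",
    "subnet-0123456789abcdef0",
    "sg-0123456789abcdef0",
)
# With AWS credentials configured, one would then call:
#   boto3.client("ec2").run_instances(**params)
print(params["InstanceType"])
```

Multi-node training jobs would repeat this request across many instances in the same placement group, which is the pattern EC2 UltraClusters scale up.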

New server design for scalable and efficient AI

For the launch of the H100, joint NVIDIA and AWS engineering teams with thermal, electrical, and mechanical expertise designed servers that harness GPUs to deliver AI at scale, with a focus on energy efficiency across AWS infrastructure. GPUs are typically 20x more energy efficient than CPUs for certain AI workloads, and the H100 is up to 300x more efficient than CPUs for LLMs.

Building on their joint work on server optimization, the two companies have begun collaborating on future server designs to increase scaling efficiency through next-generation system design, cooling technology, and network scalability.


