Nvidia’s AI supercomputer gets backing from Microsoft and Google


Nvidia took aggressive steps this week to strengthen its position in the AI supercomputing market, outlining plans to deliver several systems through the end of the year to help developers and users create and deploy AI-based applications faster.

The highlight of the announcement is a large memory Nvidia DGX system using the company’s new GH200 Grace Hopper superchip, which is tightly integrated with Nvidia’s NVLink switch system. The new system is purpose-built to create next-generation models for generative AI language applications, recommender systems, and data analytics workloads.

Combining NVLink interconnect technology with NVLink switches, the system can link up to 256 GH200 superchips and make them function as a single GPU. This allows the system to deliver 1 exaflops of performance and up to 144 terabytes of shared memory, or about 500 times more memory than the previous-generation DGX A100.
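The article does not break down where the 144 TB figure comes from. A quick back-of-the-envelope check, assuming Nvidia's published per-superchip memory figures (480 GB of LPDDR5X CPU memory plus 96 GB of HBM3 GPU memory per GH200, which are not stated in the article), reproduces it:

```python
# Sanity check on the DGX GH200 shared-memory figure.
# Assumption (from Nvidia's GH200 specs, not this article):
# each superchip pairs 480 GB LPDDR5X with 96 GB HBM3.
CHIPS = 256
LPDDR5X_GB = 480
HBM3_GB = 96

per_chip_gb = LPDDR5X_GB + HBM3_GB   # 576 GB per superchip
total_gb = CHIPS * per_chip_gb       # 147,456 GB across the system
total_tb = total_gb / 1024           # 144.0 (binary-prefix terabytes)

print(per_chip_gb, total_gb, total_tb)
```

Measured against a DGX A100 configured with 320 GB of GPU memory (8 × 40 GB A100s), that total works out to roughly 460 times more memory, consistent with the article's "about 500 times" claim.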

Nvidia CEO Jensen Huang

In his keynote address at the Computex 2023 conference in Taiwan over the weekend, Nvidia CEO Jensen Huang said that generative AI, large language models and recommender systems are the “digital engines of the modern economy,” and that the DGX GH200 and other machines “expand the AI frontier” with added speed and networking capabilities.

“The problem facing most supercomputers is not so much a computational limitation as one of limited bandwidth, where chip-to-chip or chip-to-memory communication can slow things down,” said Jack Gold, president and principal analyst at J. Gold Associates LLC. “Therefore, anything you can do to increase the bandwidth between all these connections can have a huge impact on a system’s performance.”

Nvidia said Google, Meta and Microsoft will be the first to gain access to the GH200 to explore its potential, primarily for generative AI workloads. Nvidia also plans to provide the DGX GH200 design through its MGX server specification, a modular reference architecture that helps other manufacturers and cloud providers build up to 100 server variations to support a variety of AI-based high-performance computing and Omniverse applications.

The software bundled with the system includes Nvidia Base Command, which provides AI workflow management, enterprise-class cluster management, numerous libraries that help accelerate compute, storage and network infrastructure, and system software tuned to run AI-based workloads. Also included is Nvidia AI Enterprise, a software layer that provides developers and users with 100 frameworks, pre-trained models and a variety of development tools designed to simplify the deployment of AI applications into production.

Having a starter model is basically a big deal for many companies that don’t have the money to build their own [AI] model from scratch.

Jack Gold, principal analyst, J. Gold Associates

“Having a starter model is basically a big deal for many companies that don’t have the money to build their own [AI] model from scratch,” Gold said. Training large AI models can take months for some shops, and the associated costs can be significant.

Nvidia has announced a second AI supercomputer, Helios, which consists of four DGX GH200 systems and will be used internally by Nvidia’s development and research teams. Each of the four GH200 systems is connected via an Nvidia Quantum-2 InfiniBand network and will be used to train large AI models.

“This system [Helios] is a research cluster built into our scientific computing center,” said Ian Buck, Nvidia’s vice president of accelerated computing. “It will be used for problems important to Taiwan’s national computing center.”

Helios is expected to be live by the end of this year, according to Nvidia.

Nvidia also announced plans for a third AI supercomputer, this one for Israel-based researchers. Dubbed Israel-1, the system will deliver up to 8 exaflops of AI computing and will be partially operational by the end of the year, according to Nvidia.

In a related announcement, Nvidia and SoftBank Corp. said they are working together on a new platform for generative AI and 5G/6G-enabled applications. The platform uses the MGX reference architecture and leverages the Arm Neoverse-based GH200 superchip.

SoftBank plans to deploy the platform in new AI data centers in Japan, which it said it will build with Nvidia to host AI applications and services on a multi-tenant common server platform.

Supermicro and Quanta Cloud Technology said they expect to bring the first systems based on the MGX design to market this August. Each company’s systems will be powered by the GH200 Grace Hopper superchip.

As an editor at large in TechTarget’s news group, Ed Scannell writes and reports breaking news, news analysis and features focused on technology issues and trends affecting enterprise IT professionals.


