H100, L4, and Orin raise the bar for MLPerf inference


As an independent third-party benchmark, MLPerf continues to be the definitive measure of AI performance. NVIDIA’s AI platform has consistently shown leadership in both training and inference since MLPerf’s inception, including the MLPerf Inference 3.0 benchmark released today.

“Three years ago, when we launched the A100, the world of AI was dominated by computer vision. Generative AI has now arrived,” said NVIDIA founder and CEO Jensen Huang.

“This is exactly why we built Hopper, specifically optimized for GPT with its Transformer Engine. Today’s MLPerf 3.0 results highlight Hopper delivering 4x the performance of the A100.

“The next level of generative AI requires new AI infrastructure to train large language models at scale with great energy efficiency. We are building AI infrastructure with 10,000 Hopper GPUs.

“The industry is working hard on new advances in safe and trustworthy generative AI. Hopper is enabling this important work,” he said.

The latest MLPerf results show how NVIDIA is harnessing AI inference to deliver new levels of performance and efficiency from the cloud to the edge.

In particular, the NVIDIA H100 Tensor Core GPU, running in the DGX H100 system, delivered the highest performance in every test of AI inference, the job of running neural networks in production. Thanks to software optimizations, the GPU delivered up to a 54% performance gain over its debut in September.

In healthcare, the H100 GPU delivered a 31% performance gain since September on 3D-UNet, the MLPerf benchmark for medical imaging.

H100 GPU AI Inference Performance on MLPerf Workload

The H100 GPU, based on the Hopper architecture with its Transformer Engine, excelled on BERT, the transformer-based large language model that paved the way for today’s widespread use of generative AI.

Generative AI allows users to quickly create text, images, 3D models, and more. It’s a capability that companies, from start-ups to cloud service providers, are rapidly adopting to enable new business models and accelerate existing ones.

Hundreds of millions of people are now using generative AI tools like ChatGPT (itself a transformer model) and expect immediate responses.

Inference performance is vital in this iPhone moment of AI. Deep learning is now being deployed nearly everywhere, driving an insatiable need for inference performance, from factory floors to online recommendation systems.

L4 GPUs Shine at Debut

NVIDIA L4 Tensor Core GPUs made their MLPerf debut at over 3x the speed of prior-generation T4 GPUs. Packaged in a low-profile form factor, these accelerators are designed to deliver high throughput and low latency in almost any server.

L4 GPUs ran all the MLPerf workloads. Their results were especially strong on the performance-hungry BERT model, thanks to their support for the FP8 format.

NVIDIA L4 GPU AI Inference Performance on MLPerf Workload

In addition to superior AI performance, L4 GPUs deliver up to 10x faster image decoding, up to 3.2x faster video processing, and over 4x faster graphics and real-time rendering performance.

Announced at GTC two weeks ago, these accelerators are already available from leading system makers and cloud service providers. L4 GPUs are the latest addition to NVIDIA’s portfolio of AI inference platforms launched at GTC.

Software and networking shine in system tests

NVIDIA’s full-stack AI platform demonstrated its leadership in the new MLPerf test.

The so-called network-division benchmarks stream data to a remote inference server. They reflect the popular scenario of enterprise users running AI jobs in the cloud with data stored behind corporate firewalls.

On BERT, the remote NVIDIA DGX A100 system delivered up to 96% of its maximum local performance, slowed in part because it needed to wait for a CPU to complete some tasks. On the ResNet-50 test for computer vision, handled solely on GPUs, it hit 100%.

Both results are thanks, in large part, to NVIDIA Quantum InfiniBand networking, NVIDIA ConnectX SmartNICs, and software such as NVIDIA GPUDirect.
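The network-division setup described above can be sketched as a toy client/server exchange. Everything here is an illustrative stand-in, assuming nothing about MLPerf's actual harness: the JSON-over-socket protocol and the doubling "model" are hypothetical, while real deployments would run an optimized engine on the server's GPUs.

```python
# Toy sketch of network-division inference: a client streams inputs to a
# remote server, which runs the "model" and streams results back.
import json
import socket
import threading

def toy_model(x: float) -> float:
    # Stand-in for a neural network running on the inference server.
    return x * 2.0

def serve(sock: socket.socket) -> None:
    # Accept one client, run inference on its batch, return the results.
    conn, _ = sock.accept()
    with conn:
        batch = json.loads(conn.recv(4096).decode())
        results = [toy_model(v) for v in batch]
        conn.sendall(json.dumps(results).encode())

# Server listens on an ephemeral local port.
server = socket.socket()
server.bind(("127.0.0.1", 0))
server.listen(1)
port = server.getsockname()[1]
t = threading.Thread(target=serve, args=(server,))
t.start()

# Client sends a batch over the network and waits for the inference results.
with socket.create_connection(("127.0.0.1", port)) as client:
    client.sendall(json.dumps([1.0, 2.5]).encode())
    outputs = json.loads(client.recv(4096).decode())

t.join()
server.close()
print(outputs)
```

In this framing, the network-division score measures how close such remote round-trips come to the throughput of running the model locally.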

Orin shows 3.2x gain on edge

Separately, the NVIDIA Jetson AGX Orin system-on-module delivered gains of up to 63% in energy efficiency and 81% in performance compared with its results from a year ago. Jetson AGX Orin provides inference where AI is needed at low power levels in confined spaces, such as battery-powered systems.

Jetson AGX Orin AI Inference Performance on MLPerf Benchmark

For applications that need a smaller module drawing less power, the Jetson Orin NX 16GB shone in its benchmark debut, delivering up to 3.2x the performance of the prior-generation Jetson Xavier NX processor.

Extensive NVIDIA AI Ecosystem

The MLPerf results show NVIDIA AI is backed by the industry’s broadest machine learning ecosystem.

Ten companies submitted results on the NVIDIA platform in this round. They came from the Microsoft Azure cloud service and from system makers including ASUS, Dell Technologies, GIGABYTE, H3C, Lenovo, Nettrix, Supermicro, and xFusion.

Their work shows users can get great performance with NVIDIA AI both in the cloud and on servers running in their own data centers.

NVIDIA partners participate in MLPerf because they know it is a valuable tool for customers evaluating AI platforms and vendors. Results in the latest round show that the performance they deliver today will grow with the NVIDIA platform.

Users want versatile performance

NVIDIA AI is the only platform to run all the MLPerf inference workloads and scenarios in both data center and edge computing. Its versatile performance and efficiency make users the real winners.

Real-world applications typically employ many different types of neural networks and often need to provide answers in real time.

For example, an AI application might need to understand a user’s voice requests, classify images, make recommendations, and deliver responses as voice messages in a human voice. Each step requires a different type of AI model.
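A pipeline like the one just described can be sketched as a chain of stages, one per model. Every function name and return value below is a hypothetical stub standing in for a real network (ASR, image classification, recommendation, text-to-speech), not an actual API:

```python
# Hypothetical multi-model AI pipeline: each stage stands in for a
# different neural network that a real application would call.

def transcribe_speech(audio: bytes) -> str:
    # Stand-in for a speech-recognition model.
    return "show me red sneakers"

def classify_image(image_id: str) -> str:
    # Stand-in for an image classifier (e.g. a ResNet-50-class model).
    return "sneaker"

def recommend(query: str, category: str) -> list:
    # Stand-in for a recommender model scoring catalog items.
    catalog = {"sneaker": ["red-runner", "street-flex"], "boot": ["trail-pro"]}
    return catalog.get(category, [])

def synthesize_speech(text: str) -> bytes:
    # Stand-in for a text-to-speech model producing audio.
    return text.encode("utf-8")

def pipeline(audio: bytes, image_id: str) -> bytes:
    # Chain the four model stages end to end, as a real-time app would.
    query = transcribe_speech(audio)
    category = classify_image(image_id)
    items = recommend(query, category)
    reply = "I recommend: " + ", ".join(items)
    return synthesize_speech(reply)
```

Because the stages run in sequence, the end-to-end latency the user feels is roughly the sum of each model's inference time, which is why per-model inference performance matters so much here.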

MLPerf benchmarks cover these and other popular AI workloads. As such, the tests give IT decision-makers confidence that their deployments will deliver dependable, versatile performance.

Testing is transparent and objective, allowing users to make informed purchasing decisions based on MLPerf results. Benchmarks are endorsed by a wide range of groups including Arm, Baidu, Facebook AI, Google, Harvard, Intel, Microsoft, Stanford, and the University of Toronto.

Software you can use

NVIDIA AI Enterprise, the software layer of the NVIDIA AI platform, not only ensures users get optimized performance from their infrastructure investments, it also provides the enterprise-grade support, security, and reliability needed to run AI in the corporate data center.

All the software used for these tests is available from the MLPerf repository, so anyone can reproduce these world-class results.

Optimizations are continuously folded into the containers available on NGC, NVIDIA’s catalog of GPU-accelerated software. The catalog hosts NVIDIA TensorRT, used by every submission in this round to optimize AI inference.

Read this technical blog for a deeper dive into the optimizations fueling NVIDIA’s MLPerf performance and efficiency.
