Intel’s Habana Labs Plays Key Role as Generative AI Proliferates



Intel acquired AI chipmaker Habana Labs just four years ago. The division now serves as “Intel’s de facto center of excellence in AI solutions,” according to Eitan Medina, COO at Habana. That is a big role for such a young division. In an interview with HPCwire, Medina positioned Habana Labs and its Gaudi chips as Intel’s most competitive offering in the increasingly high-profile generative AI space.

Gaudi2 mezzanine card. Image courtesy of Intel.

First, a quick catch-up. Habana Labs announced its second-generation AI training chip, Gaudi2, almost exactly one year ago. That upgrade took Gaudi from 16nm (first generation) to 7nm (both TSMC processes). Gaudi2 has 96 GB of in-package HBM2e memory and 24 Tensor processor cores. From the beginning, Habana and Intel have touted the Gaudi chips’ advantages in low-power, high-speed AI training and inference relative to competitors. As of this writing, Gaudi3 is on track for next year.

A Changing Tone

“If you’ve been to an Intel conference recently, you’ll notice a change in tone, right?” Medina said. “Intel is actively promoting Gaudi-based architectures for purpose-built deep learning solutions.” Medina pointed to a recent presentation at the World Economic Forum by Intel CEO Pat Gelsinger, in which Gelsinger himself demonstrated a Gaudi2 system.

As Medina outlined, Habana’s pitch within Intel is simple: Intel CPU-based nodes for general-purpose computing; Intel GPU-based nodes for general acceleration, such as dual-use HPC and AI; and, “if you really want to do AI, the Habana solution is your solution.”

“High-end Intel GPUs aren’t really available yet,” Medina said. “So from Intel’s point of view, especially for large language models [LLMs], the only solution Intel recommends for training these large language models, and even for running inference on them, is Gaudi2.” Medina also nodded to Intel’s entry-level GPUs and its Xeon CPU line for inference tasks and model fine-tuning.

A breakdown of how Intel sees core use cases for various accelerators. Image courtesy of Habana Labs.

The Dawn of Generative AI Benchmarks

In March, Habana Labs and Hugging Face benchmarked Gaudi2 on inference for the 176-billion-parameter BLOOMZ LLM, comparing it to a server based on Nvidia’s A100 GPU (80GB variant). The result: inference on Gaudi2 was 1.3x faster, and Habana predicts that multiplier will rise to 1.8x using FP8 instead of BF16 (Habana Labs plans to enable FP8 on Gaudi2 next quarter). Additionally, Gaudi2 used 22% less power in the process. Similar multipliers, typically ranging from 1.5x to 2.5x for both training and inference depending on the model, hold across various other benchmarks (e.g., Stable Diffusion). In some cases, Habana is waiting for Nvidia to provide benchmarks before offering a direct comparison.
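As a rough illustration of how these figures compound, the relative performance-per-watt implied by the reported numbers can be sketched with simple arithmetic. This is illustrative only: the 1.3x/1.8x speedups and the 22% power reduction come from the article, while the assumption that FP8 operation draws the same relative power as BF16 is ours, not Habana’s.

```python
# Illustrative arithmetic combining the speedup and power figures
# reported above. Not measured data.

def relative_perf_per_watt(speedup: float, power_ratio: float) -> float:
    """Relative performance-per-watt versus the baseline system.

    speedup: throughput multiplier vs. baseline (e.g. 1.3 for 1.3x).
    power_ratio: power draw relative to baseline (e.g. 0.78 for 22% less).
    """
    return speedup / power_ratio

# Gaudi2 vs. A100 (80GB) on BLOOMZ-176B inference, per the article:
bf16 = relative_perf_per_watt(speedup=1.3, power_ratio=1 - 0.22)
# Projected FP8 speedup; same power ratio is assumed, not reported:
fp8 = relative_perf_per_watt(speedup=1.8, power_ratio=1 - 0.22)

print(f"BF16: {bf16:.2f}x perf/W")            # ~1.67x
print(f"FP8 (projected): {fp8:.2f}x perf/W")  # ~2.31x
```

The point of the exercise is that a modest throughput edge and a modest power advantage multiply, which is why Medina leans on the power-efficiency argument below.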

Gaudi2 and Nvidia’s A100 benchmarked on BLOOMZ (176 billion parameters) LLM inference. Image courtesy of Habana Labs.

In particular, Medina emphasized the difference in power consumption. “If you talk to end customers, they’ll tell you that their biggest problem is that cities aren’t adding megawatts for their data centers,” he said. Between the improved throughput and the better power efficiency, Medina says it’s “very easy to see where the value proposition lies” for customers.

Many (but not all) of Habana Labs’ most prominent deployments are in testbeds, where companies and research labs are working hard to understand the strengths and weaknesses of the exploding AI accelerator landscape. Mobileye, an Intel company, has deployed Gaudi2 into production, applying the chip to training custom object-detection models for autonomous vehicles.

Beyond Gaudi2

“Gaudi3 is coming soon,” said Medina. “It’s actually in production. It will be our TSMC 5nm product.” Gaudi3, he said, will deliver significantly better performance along with improved power efficiency.

Of course, Nvidia and AMD are also launching next-generation solutions aimed squarely at AI applications. When asked how Gaudi2 and Gaudi3 are expected to stack up against them, Medina was optimistic: extrapolating [from H100 performance] to other models, he believes Gaudi2 is competitive on price-performance, if not absolute performance, and that Gaudi3 will greatly exceed it. (Medina attributes much of that to the “big leap” to 5nm.)

Medina expects Gaudi to remain the “preferred solution” for heavy AI workloads “at least for the next few years,” but beyond that, the roadmap becomes more opaque: an integration is planned between Habana Labs’ chips and the datacenter-oriented accelerators of Intel’s Accelerated Computing Systems and Graphics (AXG) group, which is responsible for the Max Series “Ponte Vecchio” GPUs.

“Right now, Ponte Vecchio is focused on Argonne National Laboratory, right? Specifically on HPC use cases,” said Medina. “Intel knows that if a server only needs to run AI-heavy workloads, the answer is Gaudi2, and soon after, Gaudi3. Then, in the next [fourth] generation, we will combine features of Gaudi with some of those from AXG.”

“We are already working on next-generation accelerator designs,” Medina added. “Intel is really consolidating the roadmap between Habana and the AXG side of the organization. We are working on deeper integration.”

For now, that is all the detail available on fourth-generation Habana products. Medina says to expect more news about Gaudi3 in the coming quarters.



