All of AI has run on GPUs. That's changing, fast.



For years, Nvidia's rise has been synonymous with one idea: GPUs are the engines of artificial intelligence. They drove a training boom that turned large language models from an academic curiosity into a multitrillion-dollar ambition. But Nvidia's $20 billion deal with Groq is an acknowledgment that the next stage of AI won't be won by GPUs alone.

Groq makes a completely different type of AI chip called an LPU (Language Processing Unit). To understand why Nvidia spent so much money and why it didn't just build this technology itself, we need to look at where AI workloads are headed. The industry is moving from training models to running them in the real world. That change has a name: inference.

Inference is what happens after a model is trained, when it answers questions, generates images, or talks to users. It is fast becoming the dominant AI computing workload and could eventually dwarf the training market, according to recent estimates compiled by analysts at RBC Capital Markets.


[Chart: AI training and inference market outlook. Source: Structure Research/RBC Capital Markets]



This matters because inference has completely different needs than training. Training is like building a brain: it demands enormous raw computing power and flexibility. Inference is like using that brain in real time: speed, consistency, power efficiency, and cost per answer suddenly matter far more than brute force.
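To make "cost per answer" concrete, here is a back-of-the-envelope sketch in Python. Every figure in it is a hypothetical assumption chosen for illustration, not a number reported anywhere in this article.

```python
# Toy arithmetic (all numbers are hypothetical, for illustration only):
# why cost per answer, not peak compute, drives inference economics.

accelerator_cost_per_hour = 3.00   # assumed hourly rental price for one accelerator, USD
tokens_per_second = 800            # assumed sustained generation throughput
tokens_per_answer = 400            # assumed average response length

answers_per_hour = tokens_per_second * 3600 / tokens_per_answer
cost_per_answer = accelerator_cost_per_hour / answers_per_hour

print(f"answers per hour: {answers_per_hour:,.0f}")   # 7,200
print(f"cost per answer:  ${cost_per_answer:.5f}")    # ~$0.00042

# Doubling sustained throughput at the same hourly price halves the cost per
# answer, which is why inference-focused chips compete on throughput, latency,
# and energy efficiency rather than raw peak performance.
```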

That's where Groq comes in. Founded by former Google engineers, Groq built its business around dedicated inference chips. Its LPU is designed more like a precision assembly line than a general-purpose factory: every operation is planned in advance, performed in a fixed order, and repeated identically every time. That rigidity is a weakness for training but a strength for inference, where predictability translates into lower latency and less wasted energy.

Nvidia's graphics processing units (GPUs), by contrast, are designed to be flexible. They rely on a scheduler and a large pool of external memory to handle many types of workloads. That flexibility is why GPUs won the training market, but it also creates overhead that slows down inference. As AI products mature and workloads stabilize, the trade-off becomes harder to justify.
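A minimal sketch can illustrate the contrast. The toy simulation below assumes a fully pre-planned pipeline produces identical latency on every run, while a runtime scheduler and external-memory fetches add per-step jitter; the step counts and overheads are invented for illustration and are not Groq or Nvidia figures.

```python
import random
import statistics

# Toy simulation (not vendor code): statically scheduled execution, where every
# operation's timing is fixed ahead of time, versus dynamically scheduled
# execution, where runtime scheduling and memory fetches add variable overhead.

STEPS = 200   # hypothetical number of operations in one inference pass
RUNS = 1_000

def static_latency():
    # Every step takes a fixed, pre-planned time: identical latency every run.
    return sum(1.0 for _ in range(STEPS))

def dynamic_latency():
    # Each step carries variable scheduling and memory overhead.
    return sum(1.0 + random.uniform(0.0, 0.5) for _ in range(STEPS))

static_runs = [static_latency() for _ in range(RUNS)]
dynamic_runs = [dynamic_latency() for _ in range(RUNS)]

print("static  p99 latency:", sorted(static_runs)[int(RUNS * 0.99) - 1])
print("dynamic p99 latency:", round(sorted(dynamic_runs)[int(RUNS * 0.99) - 1], 1))
print("dynamic jitter (stdev):", round(statistics.stdev(dynamic_runs), 2))
```

In this simplified model, the statically scheduled run has zero variance, while the dynamically scheduled run's tail latency drifts well above its average; serving products at scale, that tail is what users feel and what operators pay for.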

Tony Fadell, the creator of the iPod and an investor in Groq, recently wrote on LinkedIn that “the tectonic plates of the semiconductor industry have shifted again.” He continued: “GPUs decisively won training in the first wave of AI data centers. But inference is always going to be a real volume game, and GPUs are not optimized for inference by design.”

Fadell calls this new type of AI chip an “IPU,” or inference processing unit.

An explosion of different chips

Analysts at TD Cowen noted this week that Nvidia's embrace of an entirely different architecture, rather than simply building its own dedicated inference chip, shows how large and mature the inference market has become.

Earlier AI infrastructure investments were driven by purchasing decisions that prioritized training, where Nvidia's GPUs had the advantage, on the old assumption that “today's training chip is tomorrow's inference engine.” That, the analysts added, is no longer the case.

Instead, future AI data centers will see an explosion of different chips, said Chris Lattner, an industry visionary who helped develop the software for Google's TPU AI chips, which were co-designed by Groq founder Jonathan Ross.

This move beyond GPUs is being driven by two trends, reinforced by Nvidia's Groq deal, Lattner told me this week.

“First, 'AI' is not a single workload. There are many different inference and training workloads,” he said. “Second, hardware specialization leads to significant efficiency gains.”

“A humble move”

As far back as 2024, Business Insider warned readers that inference could become a vulnerability for Nvidia as rivals moved to fill the strategic gap. Cerebras built a massive AI chip optimized for speed that delivered thousands of times more memory bandwidth than Nvidia's flagship GPUs at the time. Google's TPUs are designed to run bespoke AI workloads efficiently and at incredible speed. Amazon developed its own inference chip, Inferentia. Startups like Positron AI claimed they could match or beat Nvidia's inference performance at a fraction of the cost.

Nvidia's deal with Groq, then, can be seen as a pre-emptive move. Rather than let inference specialists chip away at its edge, Nvidia chose to embrace a fundamentally different architecture.

Fadell described the partnership as a “humble move” by Nvidia CEO Jensen Huang. “Many companies miss these inflection points because of their 'not invented here' ego,” Fadell added. “That's not the case with Jensen. He recognized the threat and used it to his advantage.”

The economics of inference

The economics are persuasive. Inference is what makes AI products profitable, and it is where the industry will prove whether the hundreds of billions of dollars spent on data centers will pay off. As AWS CEO Matt Garman put it back in 2024, unless inference goes mainstream, “all these big model investments won't really pay off.”

Importantly, Nvidia is not betting on one winner. GPUs continue to handle training and flexible workloads, while specialized chips like Groq's handle fast, real-time inference. Nvidia's advantage is that it owns the connective tissue: the software, the networking, and the developer ecosystem that make these components work together.

“AI data centers are becoming hybrid environments with GPUs and custom ASICs running in parallel, each optimized for different workload types,” RBC analysts wrote in a recent note, referring to application-specific integrated circuits such as Groq's LPUs.

Some competitors argue that the partnership proves GPUs are unsuitable for high-speed inference. Others see it as a validation of a more fragmented future, where different chips serve different needs. Nvidia's Huang is firmly in the second camp. By licensing Groq's technology and bringing its team into the tent, Nvidia will be able to offer customers both an AI shovel and an assembly line.

In fact, RBC Capital analysts noted that Nvidia has developed NVLink Fusion, a technology that connects custom chips directly to its GPUs, reinforcing this mixed-hardware future.

“GPUs are incredible accelerators,” Cerebras CEO Andrew Feldman recently wrote. “They've taken us a long way in AI. They're not the right machines for high-speed inference. And there are other architectures that are better suited. And Nvidia just spent $20 billion to back this up.”

Sign up for BI's Tech Memo newsletter here. Contact the author by email at abarr@businessinsider.com.




