Google headquarters in Mountain View, California, USA on September 26, 2022.
Taifun Koskun | Anadolu Agency | Getty Images
Google released details about one of its artificial intelligence supercomputers on Wednesday, saying it is faster and more efficient than competing Nvidia systems, as power-hungry machine learning models continue to be the hottest part of the tech industry.
Google has been designing and deploying AI chips called Tensor Processing Units (TPUs) since 2016, while Nvidia dominates the market for AI model training and deployment with over 90% share.
Google is a major AI pioneer, and its employees have made some of the most significant advances in the field over the past decade. However, some believe the company is lagging when it comes to commercializing its inventions, and internally it is racing to release products and prove it hasn’t squandered its lead, a situation CNBC previously reported had been described inside the company as a “code red.”
AI models and products such as Google’s Bard and OpenAI’s ChatGPT, powered by Nvidia’s A100 chip, require many computers and hundreds or thousands of chips working together to train the model, running around the clock for weeks or months.
On Tuesday, Google announced that it has built a system with over 4,000 TPUs coupled with custom components designed for running and training AI models. The system has been running since 2020 and was used to train Google’s PaLM model, which competes with OpenAI’s GPT model, for over 50 days.
Called TPU v4, Google’s TPU-based supercomputer is “1.2x to 1.7x faster than the Nvidia A100 and consumes 1.3x to 1.9x less power,” Google researchers wrote.
“The performance, scalability, and availability make TPU v4 supercomputers the workhorses of large language models,” the researchers continued.
However, Google did not compare its TPU results against Nvidia’s latest AI chip, the H100, because the H100 is newer and made with more advanced manufacturing technology.
An Nvidia spokeswoman declined to comment. Results and rankings from an industry-wide AI chip test called MLPerf are expected to be released on Wednesday.
The significant amount of computing power required for AI is expensive, and much of the industry is focused on developing new chips, components such as optical interconnects, or software techniques that reduce the amount of computing power required.
AI’s power requirements are also a boon for cloud providers such as Google, Microsoft, and Amazon, which can rent out computing power by the hour and offer credits or compute time to startups to build relationships. (Google’s cloud also sells time on Nvidia chips.) For example, Google said Midjourney, an AI image generator, was trained on its TPU chips.