Meta Unveils New AI Data Center and Supercomputer to Drive AI-First Future

Formerly known as Facebook, social media giant Meta has pioneered artificial intelligence (AI) for more than a decade, using it to power and strengthen products and services such as the news feed, Facebook ads, Messenger, and virtual reality. But as demand for more advanced and scalable AI grows, so does the need for more innovative and efficient AI infrastructure.

At today’s AI Infra @ Scale event, a one-day virtual conference hosted by Meta’s engineering and infrastructure teams, the company announced a new suite of hardware and software projects aimed at supporting the next generation of AI applications. The event featured Meta speakers sharing their insights and experience in building and deploying large-scale AI systems.



Among the announcements was a new AI data center design optimized for the two main phases of AI model development and execution: training and inference. The new data centers will leverage Meta’s own silicon, the Meta Training and Inference Accelerator (MTIA), a chip designed to accelerate AI workloads across domains such as computer vision, natural language processing, and recommendation systems.

Meta also made clear that it has already built the Research Supercluster (RSC), an AI supercomputer integrating 16,000 GPUs that helps train large language models (LLMs) such as the LLaMA model Meta announced at the end of February.


Meta CEO Mark Zuckerberg said in a statement: “We’ve spent years building advanced infrastructure for AI, and this effort reflects a long-term commitment. We will continue to advance and use this technology more effectively in all of our activities.”

Building AI infrastructure is a key challenge in 2023

Meta isn’t the only hyperscaler or major IT vendor pursuing purpose-built AI infrastructure. In November, Microsoft and Nvidia announced a partnership on an AI supercomputer in the cloud. The system (unsurprisingly) pairs Nvidia GPUs with Nvidia’s Quantum-2 InfiniBand networking technology.

A few months later, in February, IBM announced details of its own AI supercomputer, codenamed Vela. IBM’s system uses x86 silicon alongside Nvidia GPUs and Ethernet-based networking, with each Vela node housing eight 80GB A100 GPUs. IBM’s goal is to build foundation models that can serve enterprise AI needs.

Not to be outdone, Google entered the AI supercomputer race with an announcement on May 10. Google’s system uses Nvidia GPUs along with custom-designed infrastructure processing units (IPUs) to enable rapid data flow.

Meta is now entering the custom silicon space with its MTIA chip. Custom-built AI inference chips are nothing new, either: Google has been building its Tensor Processing Unit (TPU) for several years, and Amazon has had its own AWS Inferentia chips since 2018.

For Meta, the need for AI inference spans multiple aspects of its social media operations, including news feeds, ranking, content understanding, and recommendations. In a video outlining the MTIA silicon, Meta infrastructure research scientist Amin Firoozshahian commented that conventional CPUs were not designed to handle the inference demands of the applications Meta runs, which is why the company decided to build its own custom silicon.

“MTIA is a chip optimized for the workloads we care about and tailored specifically to those needs,” said Firoozshahian.

Meta is also a heavy user of its own open-source machine learning (ML) framework, PyTorch, which since 2022 has been governed by the PyTorch Foundation under The Linux Foundation. One of MTIA’s goals is to provide highly optimized silicon ready to run Meta’s large-scale PyTorch workloads.

The MTIA silicon, built on a 7-nanometer (nm) process, can deliver up to 102.4 TOPS (trillion operations per second). MTIA is part of a highly integrated approach within Meta to optimizing AI operations, spanning networking, data center design, and power utilization.
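As a rough back-of-the-envelope illustration of what a 102.4 TOPS peak rate means in practice, the sketch below bounds the minimum latency of an inference workload of a given size. The workload size and utilization figure are hypothetical assumptions for illustration, not Meta benchmarks:

```python
# Back-of-the-envelope peak-throughput estimate for a 102.4 TOPS accelerator.
# The workload size and utilization below are hypothetical, not Meta figures.

PEAK_TOPS = 102.4                    # trillion operations per second (peak)
peak_ops_per_sec = PEAK_TOPS * 1e12  # operations per second

def min_latency_ms(total_ops: float, utilization: float = 0.5) -> float:
    """Lower-bound latency (ms) for a workload of `total_ops` operations,
    assuming the chip sustains `utilization` of its peak rate."""
    return total_ops / (peak_ops_per_sec * utilization) * 1e3

# Example: a recommendation-model forward pass needing ~2 billion operations.
latency = min_latency_ms(2e9)
print(f"~{latency:.3f} ms per inference at 50% utilization")  # ~0.039 ms
```

This is only an upper bound on throughput; real inference latency also depends on memory bandwidth, batch size, and how well the workload maps onto the chip.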

The data center of the future is built for AI

Meta has been building its own data centers for more than a decade to meet the needs of its billions of users. Those designs have served the company well, but with demand for AI exploding, it’s time to do more.

“Our current generation of data center designs is world-class, and energy- and power-efficient,” said Rachel Peterson, vice president of data center strategy at Meta, during a roundtable discussion at the AI Infra @ Scale event. “It has really supported us through multiple generations of servers, storage, and networking, and it handles current AI workloads very well.”

As the use of AI across Meta grows, so will the demand for computing power. Peterson noted that Meta expects AI chips to consume more than five times the power of its typical CPU servers. That expectation has led Meta to rethink data center cooling and bring liquid cooling to its chips to achieve the right level of power efficiency. Delivering the cooling and power that AI requires is the driving force behind Meta’s new data center design.
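To see why a greater-than-5x jump in per-server power forces a rethink of cooling and power delivery, consider a fixed per-rack power budget. The wattage figures below are illustrative assumptions, not Meta’s actual numbers:

```python
# Hypothetical rack power budget illustrating the ">5x power" claim.
# All wattage figures are illustrative assumptions, not Meta's numbers.

CPU_SERVER_WATTS = 400            # assumed draw of a typical CPU server
AI_MULTIPLIER = 5                 # "more than five times" per the article
ai_server_watts = CPU_SERVER_WATTS * AI_MULTIPLIER  # 2000 W

RACK_BUDGET_WATTS = 16_000        # assumed fixed per-rack power budget

# At the same rack budget, far fewer AI servers fit per rack,
# and each one dumps far more heat into the same physical space.
cpu_servers_per_rack = RACK_BUDGET_WATTS // CPU_SERVER_WATTS  # 40 servers
ai_servers_per_rack = RACK_BUDGET_WATTS // ai_server_watts    # 8 servers

print(cpu_servers_per_rack, ai_servers_per_rack)  # 40 8
```

The heat density per server rises with the power draw, which is why air cooling hits its limits and liquid cooling becomes attractive at these power levels.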

“As we look to the future, it is always important to plan for the future of AI hardware and systems, and how we can equip our fleet with the highest-performing systems,” said Peterson.
