Inside Meta’s scramble to keep up with AI

As the summer of 2022 drew to a close, Meta CEO Mark Zuckerberg gathered executives for a five-hour analysis of the company’s computing power, focused on its ability to do cutting-edge artificial intelligence work, according to a company memo dated Sept. 20 and reviewed by Reuters.

They faced a thorny problem: despite high-profile investments in AI research, the social media giant had been slow to adopt expensive AI-friendly hardware and software systems for its core business, even as it increasingly relied on AI to support its growth, according to memos, company statements and interviews with 12 people familiar with the changes.

“There are huge gaps in tools, workflows and processes when it comes to developing for AI. We need to invest,” said the memo, which was written by Santosh Janardhan, the new head of infrastructure, and posted on Meta’s internal message board in September. It is being reported here for the first time.

Supporting AI work will require Meta to “fundamentally change its approach to delivering physical infrastructure designs, software systems and stable platforms,” the memo added.

For more than a year, Meta has been working on a large-scale project to get its AI infrastructure into shape. The company has publicly admitted that it is “a little behind” on AI hardware trends, but details of the overhaul, including capacity reductions, management changes and the closure of an AI chip project, have not previously been reported.

Asked about the memo and restructuring, Meta spokesman Jon Carvill said the company has “a track record of creating and deploying cutting-edge infrastructure at scale, combined with deep AI research and engineering expertise.”

Carvill said: “By delivering new AI-powered experiences for our family of apps and consumer products, we are confident that we will be able to continue expanding the capabilities of our infrastructure to meet our short-term and long-term needs.” He declined to comment on whether Meta has abandoned its AI chip.

Janardhan and other executives declined requests for interviews made through the company.

The overhaul increased Meta’s capital expenditures by about $4 billion per quarter, nearly double its spending as of 2021, and led the company to pause or cancel previously planned data center construction at four locations, according to company disclosures.

These investments coincide with a period of severe financial pressure for Meta, which has been laying off workers since November on a scale not seen since the dotcom bust.

Meanwhile, Microsoft-backed OpenAI’s ChatGPT surged to become the fastest-growing consumer application in history after its debut on Nov. 30, sparking an arms race among major technology players to release products using so-called generative AI. Unlike other AI, generative AI creates human-like written and visual content in response to prompts.

Generative AI devours massive amounts of computing power, amplifying the urgency of Meta’s scramble for capacity, five of the sources said.

Falling behind

According to those five sources, the main cause of the problem can be traced back to Meta’s belated adoption of graphics processing units (GPUs) for AI work.

GPU chips are well suited to artificial intelligence processing because they can perform many tasks simultaneously, which reduces the time needed to work through billions of pieces of data.

However, GPUs are also more expensive than other chips, with chipmaker Nvidia Corp controlling 80% of the market and maintaining a commanding lead in accompanying software, sources said.

Nvidia did not respond to a request for comment for this article.

Instead, until last year, Meta primarily used its fleet of commodity central processing units (CPUs) to run AI workloads. CPUs have been the computing world’s workhorse chips, filling data centers for decades, but they perform AI work poorly.

The company had also started using a custom chip it designed in-house for inference, an AI process in which algorithms trained on vast amounts of data make decisions and generate responses to prompts, according to two of those sources.

By 2021, that two-pronged approach had proved slower and less efficient than one built around GPUs, which were also more flexible than Meta’s chip at running different kinds of models, the two people said.

Meta declined to comment on the AI chip’s performance.

As Zuckerberg pivoted the company toward the metaverse, a series of digital worlds enabled by augmented and virtual reality, its strained capacity was slowing its ability to deploy AI to respond to threats such as the rise of social media rival TikTok and Apple-led advertising privacy changes, four of the sources said.

The stumbles caught the attention of former Meta board member Peter Thiel, who stepped down in early 2022 without explanation.

At a board meeting before he resigned, Thiel told Zuckerberg and his executives that they were complacent about Meta’s core social media business while focusing too much on the metaverse, according to people familiar with the exchange.

Meta declined to comment on the conversation.

Catching up

After canceling a major rollout of Meta’s own custom inference chip, which had been scheduled for 2022, management reversed course and placed orders that year for billions of dollars’ worth of Nvidia GPUs, one source said.

Meta declined to comment on the order.

By then, Meta was already a few steps behind competitors like Google, which began rolling out its own custom-built version of the GPU, dubbed TPU, in 2015.

That spring, management embarked on a reorganization of Meta’s AI division, appointing two new engineering heads, including Janardhan, the author of the September memo.

More than a dozen executives left Meta during the months of upheaval, according to LinkedIn profiles and sources familiar with the departures.

Meta next started rebuilding its data centers to accommodate the incoming GPUs, which consume more power than CPUs, generate more heat and need to be tightly clustered with specialized networks between them.

The facilities needed 24 to 32 times the network capacity and new liquid cooling systems to manage the clusters’ heat, requiring them to be “completely redesigned,” according to Janardhan’s memo and four sources familiar with the project, details of which have not previously been disclosed.

As the work progressed, Meta made internal plans to begin developing a new, more ambitious in-house chip that, like a GPU, would be capable of both training AI models and performing inference. The project, which has not previously been reported, is expected to be completed around 2025, two sources said.

Carvill, the Meta spokesman, said data center construction, which was paused during the transition to the new designs, would resume later this year. He declined to comment on the chip project.

Trade-offs

While Meta has been expanding its GPU capacity, it has so far had little to show for it as competitors such as Microsoft and Google push commercial generative AI products to the general public.

Chief Financial Officer Susan Li acknowledged in February that Meta is not devoting much of its current computing to generative work, saying that “basically, all of our AI power is in advertising, feeds, and Reels,” its TikTok-like short video format that is popular with younger users.

According to four sources, Meta did not prioritize building generative AI products until after ChatGPT launched in November. Although the company’s research lab FAIR, or Facebook AI Research, has been releasing prototypes of the technology since late 2021, the company had not focused on translating its well-received research into products, they said.

That is changing as investor interest grows. In February, Zuckerberg announced a new top-level generative AI team that he said would “turbocharge” the company’s work in the area.

Chief Technology Officer Andrew Bosworth likewise said earlier this month that generative AI is the area where he and Zuckerberg have been spending the most time, and that Meta expects to release a product this year.

Two people familiar with the new team said the work is in its early stages and is focused on building a foundation model, a core program that can later be fine-tuned and adapted for different products.

Meta spokesperson Carvill said the company has been building generative AI products with different teams for more than a year. He confirmed that work has accelerated in the months since ChatGPT came out.

(Edited by Kenneth Li and Claudia Parsons)

Reuters
