Meta AI Introduces MTIA v1: First Generation AI Inference Accelerator

https://ai.facebook.com/blog/meta-training-inference-accelerator-AI-MTIA/

At Meta, AI workloads are everywhere, serving as the foundation for applications such as content understanding, feeds, generative AI, and ad ranking. Thanks to its seamless Python integration, eager-mode programming, and straightforward APIs, PyTorch runs these workloads. In particular, deep learning recommendation models (DLRMs) are essential to improving the user experience across Meta's products and services. As these models grow in size and complexity, the underlying hardware must deliver ever more memory and compute without sacrificing efficiency.

GPUs are not always the best option for efficiently processing Meta's recommendation workloads at scale. To address this, the Meta team developed a family of application-specific integrated circuits (ASICs) called the Meta Training and Inference Accelerator (MTIA). Designed around the needs of the next generation of recommendation models, this first-generation ASIC is integrated with PyTorch to provide a fully optimized ranking system. To keep developers productive, the team also maintains support for PyTorch 2.0, which dramatically improves PyTorch's compiler-level performance.

In 2020, the team created the original MTIA ASIC to serve Meta's internal inference needs. This inference accelerator is part of a full-stack solution co-designed across silicon, PyTorch, and the recommendation models themselves. Built on TSMC's 7 nm process and running at 800 MHz, the accelerator delivers 102.4 TOPS at INT8 precision and 51.2 TFLOPS at FP16 precision, with a thermal design power (TDP) of 25 W.
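As a back-of-the-envelope sanity check on these figures, the quoted peak throughput can be decomposed per processing element (the 64-PE count comes from the grid described later in the article; the per-PE ops/cycle numbers are derived here, not published specs):

```python
# Back-of-the-envelope decomposition of MTIA v1's quoted peak throughput.
# The per-PE ops/cycle figures below are derived, not published specs.
CLOCK_HZ = 800e6   # 800 MHz core clock
NUM_PES = 64       # 8x8 grid of processing elements

def ops_per_pe_per_cycle(peak_ops_per_second):
    """Peak operations each PE must retire per clock cycle."""
    return peak_ops_per_second / (NUM_PES * CLOCK_HZ)

int8_rate = ops_per_pe_per_cycle(102.4e12)  # 102.4 TOPS at INT8
fp16_rate = ops_per_pe_per_cycle(51.2e12)   # 51.2 TFLOPS at FP16
print(f"INT8: {int8_rate:.0f} ops/PE/cycle, FP16: {fp16_rate:.0f} ops/PE/cycle")
```

The 2:1 ratio between the INT8 and FP16 figures is consistent with each PE doing twice as many 8-bit operations per cycle as 16-bit ones.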


The accelerator is organized into processing elements (PEs), on-chip and off-chip memory resources, and an interconnect fabric. A dedicated control subsystem runs the accelerator's system firmware, which coordinates job execution, manages the available compute and memory resources, and communicates with the host over dedicated host interfaces. The memory subsystem uses LPDDR5 for off-chip DRAM, expandable up to 128 GB, and the chip's 128 MB of on-chip SRAM is shared by all PEs, providing higher bandwidth and much lower latency for frequently accessed data and instructions.
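The capacities in this memory hierarchy can be tabulated as follows (a summary sketch of the sizes stated in the article only; bandwidth and latency figures are not public, and the per-PE local SRAM size comes from the grid description that follows):

```python
# Capacities of MTIA v1's memory hierarchy as stated in the article.
# Bandwidth/latency figures are not public, so only sizes are listed.
KB, MB, GB = 1024, 1024**2, 1024**3

memory_hierarchy = [
    # (level, capacity in bytes, scope)
    ("PE-local SRAM",   128 * KB, "private to each of the 64 PEs"),
    ("On-chip SRAM",    128 * MB, "shared by all PEs"),
    ("Off-chip LPDDR5", 128 * GB, "DRAM, expandable to this size"),
]
for level, capacity, scope in memory_hierarchy:
    print(f"{level:16s} {capacity:>16,d} B  ({scope})")
```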

The 64 PEs are arranged in an 8×8 grid. Each PE has 128 KB of local SRAM for fast data storage and processing, and a mesh network links the PEs to each other and to the memory banks. A job can use the entire grid, or the grid can be partitioned into multiple sub-grids, each running its own work. Each PE contains multiple fixed-function units and two processor cores that accelerate key operations such as matrix multiplication, accumulation, data movement, and nonlinear function computation. The processor cores are based on the RISC-V ISA and heavily customized to perform the necessary compute and control operations. This architecture is designed to exploit two ingredients essential for efficient workload execution: parallelism and data reuse.
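As an illustrative sketch only (this is not Meta's actual runtime API), the sub-grid partitioning described above can be modeled as carving the 8×8 mesh into rectangular regions, one per job:

```python
# Hypothetical model of carving an 8x8 PE grid into rectangular
# sub-grids, one per job. Names and API are illustrative only.
GRID_ROWS, GRID_COLS = 8, 8

def subgrid(row0, col0, rows, cols):
    """Coordinates of the PEs in a rows x cols sub-grid at (row0, col0)."""
    assert 0 <= row0 and row0 + rows <= GRID_ROWS
    assert 0 <= col0 and col0 + cols <= GRID_COLS
    return [(r, c) for r in range(row0, row0 + rows)
                   for c in range(col0, col0 + cols)]

whole_grid = subgrid(0, 0, 8, 8)   # one job using all 64 PEs
left_job   = subgrid(0, 0, 8, 4)   # two jobs sharing the grid...
right_job  = subgrid(0, 4, 8, 4)   # ...32 PEs each, no overlap
print(len(whole_grid), len(left_job), len(right_job))
```

The key property this models is isolation: two jobs scheduled onto disjoint sub-grids never contend for the same PE.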

The researchers compared MTIA with NNPI accelerators and GPUs. The results show that MTIA handles low-complexity models with small shapes and batch sizes more efficiently, while the GPU, with its heavily optimized software stack, performs better on medium- and high-complexity models with larger shapes. MTIA's own software stack is being aggressively optimized to reach comparable performance on those workloads.

To optimize performance for Meta's workloads, the team is now concentrating on finding the right balance between compute power, memory capacity, and interconnect bandwidth to build better, more efficient solutions.




Tanushree Shenwai is a consulting intern at MarktechPost. She is currently pursuing her bachelor's degree at the Indian Institute of Technology (IIT), Bhubaneswar. She is a data science enthusiast with a keen interest in the applications of artificial intelligence across various fields, and is passionate about exploring new advances in technology and their practical applications.
