A new way to optimize GEMM-based applications targeting two leading AI-optimized FPGA architectures

A technical paper titled “An Efficient Approach for GEMM Acceleration in State-of-the-Art AI-Optimized FPGAs” has been published by researchers from the University of Texas at Austin and Arizona State University.

Abstract:

“FPGAs are a promising platform for accelerating deep learning (DL) applications due to their high performance, low power consumption, and reconfigurability. Recently, major FPGA vendors have enhanced their architectures to support DL workloads more efficiently. However, the two most prominent AI-optimized FPGAs, AMD/Xilinx Versal ACAP and Intel Stratix 10 NX, take significantly different architectural approaches. This paper introduces a new systematic framework for optimizing the performance of General Matrix Multiplication (GEMM), a fundamental operation for DL workloads, by leveraging the unique and distinct architectural characteristics of each FPGA. Evaluations on int8-precision GEMM workloads showed up to 77 and 68 TOPs (int8) throughput and energy efficiency of up to 0.94 and 1.35 TOPs/W on Versal VC1902 and Stratix 10 NX, respectively. This work provides insights and guidelines for optimizing GEMM-based applications on these platforms, while also delving into their programmability trade-offs and associated challenges.”
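
For readers unfamiliar with the metric, the following is a minimal reference sketch of an int8 GEMM and of how a TOPs figure is typically derived from it. It is not the paper's framework: the function names are hypothetical, the wide accumulator mirrors a common hardware choice rather than anything stated in the abstract, and the operation count assumes the usual convention of two operations (one multiply, one add) per multiply-accumulate.

```python
def gemm_int8(a, b):
    """Reference GEMM C = A x B for int8 inputs, given as lists of rows."""
    m, k = len(a), len(a[0])
    k2, n = len(b), len(b[0])
    if k != k2:
        raise ValueError("inner dimensions must match")
    for row in a + b:
        if any(not -128 <= v <= 127 for v in row):
            raise ValueError("inputs must fit in int8")
    # A product of two int8 values needs up to 16 bits, and summing K of
    # them needs more, so accelerators typically accumulate in a wider
    # type (int32 is assumed here; Python ints never overflow).
    c = [[0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            acc = 0
            for p in range(k):
                acc += a[i][p] * b[p][j]
            c[i][j] = acc
    return c


def gemm_ops(m, n, k):
    """Operation count, assuming 2 operations per multiply-accumulate."""
    return 2 * m * n * k


def tera_ops_per_second(m, n, k, seconds):
    """Throughput in TOPs for an M x K by K x N GEMM that took `seconds`."""
    return gemm_ops(m, n, k) / seconds / 1e12
```

Under this convention, a throughput figure is the operation count of the GEMM divided by its execution time, and energy efficiency in TOPs/W divides that throughput by the power drawn during the same run.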

Find the technical paper on arXiv (arXiv:2404.11066). Published April 2024 (preprint).

Taka, Endri, Dimitrios Gourounas, Andreas Gerstlauer, Diana Marculescu, and Aman Arora. “An Efficient Approach for GEMM Acceleration in State-of-the-Art AI-Optimized FPGAs.” arXiv preprint arXiv:2404.11066 (2024).

Related Reading
AI accelerator architectures are undergoing major changes
As AI begins to move to the edge, design teams are racing to make it faster and more energy efficient.
