Calculating the Computational Efficiency of Deep Learning Models with FLOPs and MACs

FLOPs (floating-point operations) and MACs (multiply-accumulate operations) are commonly used metrics for the computational complexity of deep learning models. They offer a quick and easy way to understand how many arithmetic operations a given computation requires. For example, when choosing between model architectures such as MobileNet and DenseNet for edge devices, MACs or FLOPs can be used to estimate model performance. I say “estimate” because both metrics are approximations rather than measurements of actual runtime performance. They can still provide useful insight into energy consumption and computational requirements, which matters a great deal in edge computing.

Figure 1: Comparison of various neural networks using FLOPs from “Densely Connected Convolutional Networks”

FLOPs specifically refer to the number of floating-point operations, such as addition, subtraction, multiplication, and division on floating-point numbers. These operations appear throughout the mathematical computations used in machine learning, such as matrix multiplication, activation functions, and gradient computation. FLOPs are often used to measure the computational cost or complexity of a model, or of a particular operation within a model, which is useful when you need to estimate the total number of arithmetic operations required.

MACs, on the other hand, count only multiply-accumulate operations, each of which multiplies two numbers and adds the product to a running total (the accumulator). This operation is the basis of many linear algebra operations, such as matrix multiplication, convolution, and the dot product. MACs are often used as a more specific measure of computational complexity for models that rely heavily on linear algebra operations, such as convolutional neural networks (CNNs).
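To make the definition concrete, here is a minimal sketch in plain Python that counts MACs while computing a dot product and a matrix-vector product. The function names are made up for illustration and do not come from any library.

```python
def dot_with_mac_count(a, b):
    """Return (dot product, number of MACs) for two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    acc = 0.0
    macs = 0
    for x, y in zip(a, b):
        acc += x * y  # one MAC: multiply x by y, then add the product into acc
        macs += 1
    return acc, macs


def matvec_with_mac_count(matrix, vector):
    """Return (output vector, total MACs) for a matrix-vector product."""
    outputs = []
    total_macs = 0
    for row in matrix:
        value, macs = dot_with_mac_count(row, vector)  # one dot product per row
        outputs.append(value)
        total_macs += macs
    return outputs, total_macs


if __name__ == "__main__":
    print(dot_with_mac_count([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))
    print(matvec_with_mac_count([[1.0, 0.0, 2.0], [0.0, 3.0, 1.0]], [4.0, 5.0, 6.0]))
```

A dot product of length n takes n MACs, and a matrix-vector product with r rows takes r × n MACs, which is why fully connected and convolutional layers are usually described by their MAC count.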

It’s worth mentioning that counting FLOPs is not the only factor in working out computational efficiency. Several other factors matter when estimating model efficiency: the degree of parallelism in the system setup; the model architecture (e.g. the cost of group convolution in terms of MAC); and the computing platform the model runs on (for example, cuDNN provides GPU acceleration for deep neural networks, and standard operations such as the forward pass and normalization are highly tuned).
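As one illustration of how architecture changes the count, the sketch below applies the standard formula for the multiply-accumulate count of a 2D convolution, including the effect of the groups parameter. The function name and the layer sizes are assumptions chosen for the example, and bias terms are ignored.

```python
def conv2d_macs(c_in, c_out, kernel_h, kernel_w, out_h, out_w, groups=1):
    """Multiply-accumulate count of a 2D convolution (bias ignored).

    Each output element needs (c_in / groups) * kernel_h * kernel_w MACs,
    and there are c_out * out_h * out_w output elements.
    """
    if c_in % groups != 0 or c_out % groups != 0:
        raise ValueError("c_in and c_out must both be divisible by groups")
    macs_per_output = (c_in // groups) * kernel_h * kernel_w
    return macs_per_output * c_out * out_h * out_w


if __name__ == "__main__":
    # Hypothetical layer: 64 -> 128 channels, 3x3 kernel, 56x56 output map.
    standard = conv2d_macs(64, 128, 3, 3, 56, 56)
    grouped = conv2d_macs(64, 128, 3, 3, 56, 56, groups=4)
    print(standard, grouped, standard // grouped)
```

With 4 groups the count drops by a factor of 4, but the measured speedup on real hardware can differ, which is one reason the operation count alone does not determine efficiency.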

Are FLOPS and FLOPs the same?

FLOPS in all caps is an abbreviation for “floating-point operations per second”, which refers to computational speed and is commonly used as a measure of hardware performance. The ‘S’ in ‘FLOPS’ stands for ‘second’ and, together with the ‘P’ (for ‘per’), indicates a rate.

FLOPs (with a lowercase ‘s’ for the plural), on the other hand, refers to floating-point operations. It is typically used to describe the computational complexity of an algorithm or model. However, in AI discussions, “FLOPs” is sometimes used with either meaning, and it is up to the reader to work out which one is intended. There are also arguments for abandoning “FLOPs” altogether and using “FLOP” instead, to make the two easier to tell apart. This article continues to use FLOPs.
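The two quantities are related by a simple division: a model’s operation count (FLOPs) divided by a device’s throughput (FLOPS) gives a lower bound on runtime. The sketch below assumes made-up numbers for both, and the utilization parameter is a hypothetical knob for the fraction of peak throughput actually achieved.

```python
def min_runtime_seconds(model_flops, hardware_flops_per_second, utilization=1.0):
    """Lower-bound runtime: operation count divided by achieved throughput."""
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    return model_flops / (hardware_flops_per_second * utilization)


if __name__ == "__main__":
    model_flops = 2e9       # hypothetical model: 2 GFLOPs per forward pass
    device_flops = 1e12     # hypothetical device: 1 TFLOPS peak
    print(min_runtime_seconds(model_flops, device_flops))        # at peak throughput
    print(min_runtime_seconds(model_flops, device_flops, 0.25))  # at 25% utilization
```

Real latency is usually higher than this bound, because memory access, data movement, and kernel launch overheads are not captured by the operation count.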

Relationship between FLOPs and MACs

Figure 2: Relationship between GMACs and GFLOPs (source)

As mentioned in the previous section, FLOPs and MACs differ mainly in the kinds of arithmetic operations they count and in the contexts in which they are used. As the GitHub comments in Figure 2 indicate, the general consensus in the AI community is that one MAC is roughly equivalent to two FLOPs. For deep neural networks, MACs are considered more important because multiply-accumulate operations are computationally expensive.
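A short sketch shows where the factor of two comes from, and why it is only approximate. It counts multiplications and additions separately for a plain matrix product; the function name is an assumption for illustration.

```python
def matmul_op_counts(m, k, n):
    """Return (MACs, FLOPs) for an (m x k) by (k x n) matrix product."""
    macs = m * k * n                # one MAC per (row, column, inner index) triple
    multiplications = m * k * n     # every MAC contains one multiplication
    additions = m * (k - 1) * n     # summing k products takes k - 1 additions
    flops = multiplications + additions
    return macs, flops


if __name__ == "__main__":
    macs, flops = matmul_op_counts(1, 1000, 1000)
    print(macs, flops, flops / macs)
```

For any reasonably large inner dimension k, the ratio approaches 2, which matches the rule of thumb that one MAC is roughly two FLOPs.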

The good news is that there are already several open-source packages dedicated to calculating FLOPs, so you don’t have to implement it from scratch. Some of the most popular include flops-counter.pytorch and pytorch-OpCounter. There are also packages such as torchstat, which gives users a general network analyzer based on PyTorch. Note that these packages support only a limited set of layers and models. Therefore, if you are running a model that contains customized network layers, you may need to calculate FLOPs yourself.
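When a model contains a layer that a counting package does not recognize, a by-hand count can look something like the sketch below. It is plain Python, independent of any of the packages above, and the layer types, the spec format, and the "low_rank_linear" custom layer are all hypothetical.

```python
# Per-layer MAC formulas, keyed by a made-up layer type name.
MAC_COUNTERS = {
    # Fully connected layer: one dot product of length "in" per output.
    "linear": lambda spec: spec["in"] * spec["out"],
    # Hypothetical custom layer: W (out x in) factored as U (out x rank) @ V (rank x in).
    "low_rank_linear": lambda spec: spec["rank"] * (spec["in"] + spec["out"]),
}


def total_macs(layers):
    """Sum the MACs of every layer spec, failing loudly on unknown types."""
    total = 0
    for spec in layers:
        kind = spec["type"]
        if kind not in MAC_COUNTERS:
            raise KeyError(f"no MAC formula registered for layer type {kind!r}")
        total += MAC_COUNTERS[kind](spec)
    return total


if __name__ == "__main__":
    network = [
        {"type": "linear", "in": 1024, "out": 512},
        {"type": "low_rank_linear", "in": 512, "out": 512, "rank": 32},
        {"type": "linear", "in": 512, "out": 10},
    ]
    macs = total_macs(network)
    print(macs, 2 * macs)  # MACs, and FLOPs using the 1 MAC ~ 2 FLOPs rule of thumb
```

Raising an error on unknown layer types is a deliberate choice here: silently skipping an unsupported layer would understate the total.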

Here is example code that uses pytorch-OpCounter and torchvision’s AlexNet to count MACs, from which FLOPs can be estimated.

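import torch
from torchvision.models import alexnet
from thop import profile  # provided by the pytorch-OpCounter package

model = alexnet()  # randomly initialized; the weights do not affect the counts
dummy_input = torch.randn(1, 3, 224, 224)  # one 224x224 RGB image
# profile() returns the MAC count and the number of parameters
macs, params = profile(model, inputs=(dummy_input, ))
flops = 2 * macs  # rough estimate, using 1 MAC ≈ 2 FLOPs
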
This article introduced the definitions of FLOPs and MACs, when each is commonly used, and the differences between the two metrics.

Danny Lee is currently an AI resident at Meta. She is interested in building efficient AI systems, and her current research focus is on-device ML models. She also strongly believes in leveraging open-source collaboration and community support to maximize innovation potential.