Researchers at the University of Michigan have developed an innovative optimization framework that can dramatically reduce the energy demands of training deep learning models, a key tool for powering artificial intelligence systems.
An open-source optimization framework called Zeus profiles deep learning models during training to identify the optimal balance between energy consumption and training speed.

Credit: SymbioticLab, University of Michigan
Deep learning models have surged in popularity in recent years, powering a wide variety of applications, from image generation models and expressive chatbots to recommendation systems for platforms like TikTok and Amazon. However, the energy consumption associated with training these models is substantial and has significant environmental impacts.
“At extreme scale, a single training session of the GPT-3 model consumes 1,287 MWh, enough to power an average US household for 120 years,” said Professor Mosharaf Chowdhury.
The researchers’ goal
Using the new Zeus energy optimization framework, Chowdhury and his team were able to cut that energy consumption by up to 75%, without new hardware and with only a small impact on the time it takes to train a model. The framework was presented at the 2023 USENIX Symposium on Networked Systems Design and Implementation (NSDI) in Boston.
With the carbon footprint of cloud computing already outstripping that of commercial aviation and still growing, the climate impact of artificial intelligence is a pressing concern.
“Existing research is primarily focused on optimizing deep learning training to complete faster without considering its impact on energy efficiency,” said the study’s first author, PhD student Jae-Won Chung.
“We’ve found that there are diminishing returns to the power we put into GPUs, which means we can significantly reduce energy consumption without comparably slowing down training.”
Why AI training consumes a lot of energy
Deep learning is a subset of machine learning that relies on multilayered artificial neural networks, also known as deep neural networks (DNNs), to tackle a variety of tasks. These models are highly complex and learn from the largest data sets ever used in machine learning.
As a result, they benefit greatly from the multitasking capabilities of graphics processing units (GPUs), which account for 70% of the power consumed during the training process.
Zeus achieves this optimization by adjusting two key software parameters in real time: the GPU power limit and the deep learning model’s batch size. The GPU power limit caps how much power the GPU can draw, reducing its energy consumption while temporarily slowing down model training until the setting is adjusted again.
The batch size parameter, on the other hand, determines how many samples from the training data the model processes before updating its internal representation. Larger batch sizes reduce training time but increase energy consumption.
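For a sense of the first knob, GPU power limits can be adjusted entirely from software through NVIDIA’s management library, exposed in Python via the pynvml package. The sketch below shows that general mechanism, not Zeus’s own implementation; note that lowering the limit typically requires administrator privileges.

```python
# Sketch: reading and lowering a GPU's power limit via NVML (pynvml).
# This illustrates the mechanism Zeus tunes, not Zeus's actual code.
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # first GPU

# NVML reports power values in milliwatts.
min_mw, max_mw = pynvml.nvmlDeviceGetPowerManagementLimitConstraints(handle)
current_mw = pynvml.nvmlDeviceGetPowerManagementLimit(handle)
print(f"Power limit: {current_mw / 1000:.0f} W "
      f"(allowed range {min_mw / 1000:.0f}-{max_mw / 1000:.0f} W)")

# Cap the GPU at 70% of its maximum draw, clamped to the valid range.
# Requires admin privileges; raises NVMLError otherwise.
target_mw = max(min_mw, int(max_mw * 0.7))
pynvml.nvmlDeviceSetPowerManagementLimit(handle, target_mw)

pynvml.nvmlShutdown()
```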
Zeus’s ability to adjust these settings in real time allows it to find the optimal trade-off point between energy usage and training time. According to Jie You, a recent PhD graduate in computer science and engineering and co-first author of the study, the approach is highly effective in practice because the iterative nature of machine learning allows Zeus to learn how a DNN behaves across many training runs.
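In the spirit of the trade-off the paper describes, one can score each candidate power limit with a weighted cost that blends energy and time, then pick the cheapest. The sketch below uses hypothetical per-epoch profiling numbers; eta weights energy savings against speed, and the maximum power scales the time term so both terms share units of joules.

```python
# Sketch: choosing a power limit by minimizing a weighted energy/time cost,
# in the spirit of Zeus's trade-off metric. The measurements below are
# hypothetical profiling data (joules and seconds per training epoch).
MAX_POWER_W = 300  # the GPU's maximum power limit, in watts

measurements = {
    300: {"energy_j": 90_000, "time_s": 310},
    250: {"energy_j": 78_000, "time_s": 325},
    200: {"energy_j": 64_000, "time_s": 360},
    150: {"energy_j": 58_000, "time_s": 450},
}

def cost(energy_j: float, time_s: float, eta: float = 0.5) -> float:
    """Weighted cost: eta favors energy savings, (1 - eta) favors speed.
    Scaling time by max power puts both terms in joules."""
    return eta * energy_j + (1 - eta) * MAX_POWER_W * time_s

best_limit = min(measurements, key=lambda w: cost(**measurements[w]))
print(f"Best power limit for eta=0.5: {best_limit} W")  # 200 W here
```

With these (made-up) numbers the optimum sits in the middle of the range: the highest limit wastes energy, while the lowest slows training so much that the time penalty dominates.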
Advantages of Zeus over other options
Zeus stands out as the first framework designed to seamlessly integrate into existing workflows for a variety of machine learning tasks and GPUs. This innovative solution reduces energy consumption without requiring changes to system hardware or data center infrastructure.
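Because Zeus ships as an open-source Python package, it can be dropped into an existing training script. Below is a minimal sketch of measuring one training window with its GPU energy monitor, based on the project’s documented ZeusMonitor interface; exact class and method names may differ across versions, and the training loop here is a placeholder for your own code.

```python
# Sketch: wrapping an existing training loop with Zeus's energy monitor,
# based on the open-source zeus package's documented ZeusMonitor interface
# (exact names may vary between versions).
from zeus.monitor import ZeusMonitor

def train_one_step(batch):
    """Placeholder for one step of your existing training code."""
    pass

dataloader = range(100)  # placeholder for your existing data loader

monitor = ZeusMonitor(gpu_indices=[0])  # measure the first GPU

monitor.begin_window("one_epoch")
for batch in dataloader:   # your existing training loop, unchanged
    train_one_step(batch)
measurement = monitor.end_window("one_epoch")

print(f"Epoch took {measurement.time:.1f} s and "
      f"used {measurement.total_energy:.0f} J of GPU energy")
```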
To further reduce the carbon footprint of DNN training, the team also developed complementary software called Chase. The software favors speed when low-carbon energy is available and prioritizes efficiency over peak speed during periods when electricity is likely to come from carbon-intensive sources such as coal.
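Chase’s internals aren’t detailed here, but the policy it embodies can be sketched in a few lines: check the grid’s current carbon intensity and pick a faster or more frugal training configuration accordingly. Everything in the sketch below, from the threshold to the get_carbon_intensity helper, is hypothetical illustration rather than Chase’s actual API.

```python
# Sketch of the carbon-aware policy behind Chase; not Chase's actual API.
FAST_CONFIG = {"power_limit_w": 300, "batch_size": 256}    # speed-first
FRUGAL_CONFIG = {"power_limit_w": 200, "batch_size": 128}  # energy-first
CARBON_THRESHOLD = 200  # gCO2/kWh; hypothetical cutoff for "clean" power

def get_carbon_intensity() -> float:
    """Hypothetical helper; a real version would query a regional
    grid-data service for the current carbon intensity in gCO2/kWh."""
    return 150.0

def choose_config() -> dict:
    """Favor speed when the grid is clean, efficiency when it is dirty."""
    if get_carbon_intensity() < CARBON_THRESHOLD:
        return FAST_CONFIG    # low-carbon power: spend it on faster training
    return FRUGAL_CONFIG      # carbon-intensive power: minimize energy use

print(choose_config())
```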
Chase won second place at last year’s CarbonHack hackathon and will be presented at a workshop of the International Conference on Learning Representations (ICLR) on May 4.
Study co-author Zhenning Yang emphasized the need for a solution that does not conflict with practical constraints on DNN training, such as data regulations and the need to train on up-to-date data.
“Our aim is to reduce the carbon footprint of DNN training while designing and implementing a solution that does not violate these practical constraints,” said Yang.
Zeus marks a breakthrough in addressing the energy efficiency issues associated with training deep learning models. By optimizing the trade-off between energy consumption and training speed, the framework has the potential to significantly reduce the environmental impact of artificial intelligence systems without sacrificing performance.
This work was supported in part by National Science Foundation grants CNS-1909067 and CNS-2104243, VMware, and the Kwanjeong Educational Foundation, with computing credits provided by CloudLab and Chameleon Cloud. As the AI industry continues to grow, frameworks like Zeus and software like Chase can play a key role in promoting sustainability and minimizing the environmental impact of AI technology.
