KAIST identifies “hidden energy costs” for AI agents for the first time


Image: Key characteristics of AI agents and their impact on infrastructure (Credit: KAIST)

As the era of AI agents (systems that can reason and act autonomously) begins, data center power consumption has emerged as a critical challenge. A KAIST research team analyzed the computational cost and energy consumption of AI agents for the first time and found that they can consume up to 136.5 times as much energy per query as traditional generative AI. The research shows that competitiveness in the AI era extends beyond model performance to the efficiency of data centers and power infrastructure.

KAIST announced that a research team led by Professor Ming-Su Lu of the School of Electrical Engineering has conducted the first systematic analysis of how much computation and power an AI agent requires in a real-world service environment.

Applications built on large language models (LLMs), such as ChatGPT, are rapidly evolving to do more than answer questions. They are developing into AI agents: next-generation AI systems that solve complex tasks by planning, using external tools such as web search, calculators, and code execution environments, and independently coordinating multiple steps.

AI agents are increasingly being adopted in areas such as software development, research, and workplace automation, but little is known about the amount of power and operating costs required to actually run them.

The researchers defined AI agents not just as software programs, but as a new type of workload that must be continuously processed by data center servers and graphics processing units (GPUs, high-performance chips used for large-scale AI calculations). The team then analyzed the computational load and energy consumption that occurs while running a real AI agent.

The analysis shows that AI agents make significantly more LLM calls than traditional chain-of-thought reasoning does. Chain of thought (CoT) is a method in which an AI model breaks its reasoning down step by step to arrive at an answer. An LLM call, by contrast, is a single computational request made to a language model to generate a new decision or response.

Because an AI agent calls the language model repeatedly during execution, response latency also increases significantly. The researchers found that response time can increase by up to 153.7 times, and that the GPU sits idle for as much as 54.5% of total execution time while external tools carry out their tasks. In other words, as AI systems take on more complex tasks, a new form of inefficiency emerges: the underutilization of expensive GPUs.
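The idle-time figure can be read as the share of wall-clock time an agent spends waiting on external tools rather than running the model. The following is a minimal sketch of that bookkeeping, not the researchers' measurement method: the `Phase` type, the `gpu_idle_fraction` helper, and every duration in the trace are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Phase:
    kind: str       # "llm" = GPU busy on a model call, "tool" = GPU idle, waiting on a tool
    seconds: float  # wall-clock duration of the phase


def gpu_idle_fraction(phases: list[Phase]) -> float:
    """Return the share of total execution time spent in tool phases."""
    total = sum(p.seconds for p in phases)
    idle = sum(p.seconds for p in phases if p.kind == "tool")
    return idle / total if total else 0.0


# Hypothetical trace of one agent query: three LLM calls interleaved with two tool calls.
trace = [
    Phase("llm", 2.0),
    Phase("tool", 3.0),
    Phase("llm", 1.5),
    Phase("tool", 1.5),
    Phase("llm", 2.0),
]

llm_calls = sum(1 for p in trace if p.kind == "llm")
idle = gpu_idle_fraction(trace)
print(f"LLM calls: {llm_calls}, GPU idle fraction: {idle:.0%}")
```

In this made-up trace the GPU is idle for 4.5 of 10 seconds. The same accounting applied to real traces would yield figures such as the 54.5% reported above.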

The research team also analyzed the power consumption of AI agents at data center scale. An AI agent using a 70-billion-parameter LLM, comparable in scale to today's commercial AI services, consumed an average of 348.41 watt-hours per query. This is 136.5 times the energy consumed by traditional generative AI systems performing simple question answering.
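Taken together, these two figures imply a per-query energy for conventional generative AI. The sketch below backs that number out, assuming the 136.5x ratio applies directly to the 348.41 Wh average. The variable and function names are illustrative, and the derived baseline is an inference from the quoted numbers, not a figure stated in the release.

```python
# Figures quoted in the release.
AGENT_WH_PER_QUERY = 348.41  # average energy per AI agent query, in watt-hours
ENERGY_RATIO = 136.5         # agent energy relative to conventional generative AI


def implied_baseline_wh(agent_wh: float, ratio: float) -> float:
    """Back out the conventional per-query energy implied by the ratio."""
    return agent_wh / ratio


baseline_wh = implied_baseline_wh(AGENT_WH_PER_QUERY, ENERGY_RATIO)
print(f"Implied baseline: {baseline_wh:.2f} Wh per query")
```

Under that assumption, a simple question-answering query would cost roughly 2.55 Wh.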

Additionally, the team predicted a future scenario in which 13.7 billion AI agent requests are generated per day. This is comparable to today’s Google search traffic. In this scenario, data center power demand would reach approximately 198.9 gigawatts, far exceeding the scale of AI data centers currently under development (in the multi-gigawatt range) and equivalent to about half of the average power consumption in the United States.
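The 198.9-gigawatt figure can be checked from the numbers in the release. The sketch below assumes requests are spread evenly over 24 hours and that every request costs the 348.41 Wh average, with no extra allowance for cooling or other facility overhead. These are simplifying assumptions for the check, not necessarily the paper's full methodology.

```python
# Figures quoted in the release.
AGENT_WH_PER_QUERY = 348.41  # average energy per AI agent query, in watt-hours
REQUESTS_PER_DAY = 13.7e9    # projected daily AI agent requests
HOURS_PER_DAY = 24


def average_power_gw(wh_per_query: float, requests_per_day: float) -> float:
    """Convert per-query energy and daily request volume into average power in GW."""
    daily_energy_wh = wh_per_query * requests_per_day  # total watt-hours per day
    average_power_w = daily_energy_wh / HOURS_PER_DAY  # watt-hours per hour = watts
    return average_power_w / 1e9                       # watts to gigawatts


power_gw = average_power_gw(AGENT_WH_PER_QUERY, REQUESTS_PER_DAY)
print(f"Average power demand: {power_gw:.1f} GW")
```

The result rounds to 198.9 GW, matching the figure quoted above.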

This study shows that the focus of competition in the AI era is shifting from "smarter AI" to "more efficient AI." Going forward, it will be important not only to make AI models more sophisticated but also to jointly optimize AI semiconductors, data centers, and power infrastructure through co-design. Such an approach is expected to become a key strategy for reducing the operating costs of AI services and building a sustainable AI infrastructure.

“This study is the first to quantitatively show not only how AI is becoming intelligent, but also how much power and cost are required to implement and maintain that intelligence,” Professor Lu said. “As AI agents become more prevalent, it will become increasingly important to take an integrated co-design approach that optimizes not only the AI data center infrastructure, but also the AI agent model and power infrastructure,” he added. “Research and investment in this direction are essential to significantly reduce the cost of accessing AI services for end users while building a sustainable AI infrastructure.”

This study was conducted with Jiin Kim, a doctoral student in the KAIST Department of Electrical Engineering, as the first author. The paper was presented in February at the 32nd IEEE International Symposium on High-Performance Computer Architecture (HPCA), one of the most prestigious international conferences in computer system design. The research team also open-sourced the AI agent implementation and benchmarks used in the paper to aid follow-up studies by researchers around the world.

Paper title: “The cost of dynamic inference: Unraveling AI agents and test time scaling from an AI infrastructure perspective.”

DOI: 10.1109/HPCA68181.2026.11408569

This research was supported by the Institute of Information and Communication Technology Planning and Evaluation (IITP) and the Samsung Electronics Future Technology Incubation Center through the SW Starlab program, the K-Cloud technology development program using AI semiconductors, and the Leading Technology Development Program for Promoting AI Semiconductor-based Data Centers.




