Improving the speed and energy efficiency of AI agents | Massachusetts Institute of Technology News

Agentic workflows are artificial intelligence-powered software systems that connect multiple models and external tools to tackle complex tasks, such as analyzing videos and answering questions about them.

However, the way these highly fragmented systems are designed and deployed often creates inefficiencies that can lead to wasted computation, energy, and cost.

To improve efficiency, researchers at MIT and Microsoft have developed an intelligent system that streamlines the process of designing agent workflows and automatically optimizes how those workflows are implemented.

This new approach allows developers to describe in easy-to-understand language what they want an agent workflow to do without having to specify all the application details upfront.

The system automatically determines the best models and tools to use, as well as the ideal hardware configuration and computational resource allocation when a workflow is executed by a cloud provider.

It adjusts these configurations on the fly based on each user’s priorities, such as minimizing cost or maximizing speed.

When tested on several agent workloads, this new system reduced the number of compute units required for deployment, significantly reducing energy requirements and costs compared to traditional approaches without hindering performance.

“Agent workflows are becoming very complex and are rapidly becoming the backbone of cloud providers’ activities. Energy usage is a big concern, so we need to pay close attention to how efficient these workflows are. It’s very easy to over-allocate resources and waste energy and money. Enabling cloud providers to intelligently redirect these workflows to better optimize resources is a win for everyone involved,” says Gohar Chaudhry, an electrical engineering and computer science (EECS) graduate student and lead author of a paper on this system.

Chaudhry is joined on the paper by Adam Belay, an associate professor in EECS and a member of the MIT Computer Science and Artificial Intelligence Laboratory; senior author Ricardo Bianchini, a technical fellow and corporate vice president at Microsoft Azure; and others at Microsoft Azure. The paper will be presented at the USENIX Symposium on Operating Systems Design and Implementation.

Configuration challenges

An agent workflow is a system of multiple autonomous AI agents that work together to use different models and tools, such as databases and Python programs, to dynamically complete multi-step tasks such as data processing and code generation.

These workflows serve as the behind-the-scenes processes that power user-facing applications.

Typically, developers have to hardcode all technical choices upfront. They need to define which AI agents, models, and tools to use and in what order to use them. They also need to specify the hardware on which the workflow will run and how to balance tradeoffs such as speed and cost.

This is especially challenging because agent workflows integrate multiple black box models and diverse tools, each with unique configuration options and potentially provided by different companies.

When a new AI model is released that improves the accuracy or efficiency of an application, developers must start from scratch to implement it.

“Even if you wanted to do all of this manually, the scope of possible configurations is so large that it’s highly unlikely that you’ll be able to optimally configure your workflow,” Chaudhry says.

Additionally, cloud data centers that deploy applications for their customers lack the visibility inside a workflow that they would need to allocate hardware resources in the most efficient manner at the time of a user’s request.

Using this new system, called Murakkab (Urdu for composition of things), the researchers aimed to optimize the entire agent workflow process.

Dynamic decision making

First, Murakkab allows developers to describe the intent of their application in high-level terms, rather than detailing how the many components of an agent workflow need to come together.

For example, a developer might want to describe a video Q&A application that extracts key frames, generates transcripts, and answers user questions about the video.

“There are many ways to do this, and these different models and tools all impact how quickly an application completes its tasks,” he says.

Murakkab takes a developer’s simple specifications and automatically identifies the best existing models and tools to incorporate into the workflow.

It also determines which components must run in sequence and which components can run in parallel to improve performance.
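As a rough illustration of the sequencing decision described above, the following sketch groups the steps of a hypothetical video Q&A workflow into stages that can run in parallel. The step names and the dependency graph are assumptions made for this example, and the grouping uses a standard topological sort from Python's standard library. It is not a description of how Murakkab itself schedules work.

```python
from graphlib import TopologicalSorter

# Hypothetical video Q&A workflow: each step maps to the steps it depends on.
# Step names are illustrative and are not taken from Murakkab.
DEPENDENCIES = {
    "extract_frames": set(),
    "transcribe_audio": set(),
    "answer_question": {"extract_frames", "transcribe_audio"},
}


def parallel_stages(dependencies):
    """Group steps into stages; steps within one stage can run concurrently."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # every step whose inputs are done
        stages.append(ready)
        sorter.done(*ready)
    return stages


stages = parallel_stages(DEPENDENCIES)
for number, steps in enumerate(stages, start=1):
    print(f"stage {number}: {', '.join(steps)}")
```

Here frame extraction and transcription land in the same stage because neither depends on the other, while question answering has to wait for both.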

“The platform dynamically determines its configuration over time, so if a new model or GPU accelerator comes out tomorrow, developers don’t have to worry about it,” he says.

When a cloud provider deploys the application for its customers, Murakkab optimizes the workflow by configuring components to meet user constraints, such as prioritizing accuracy while meeting latency requirements.
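One simple way to picture this kind of constrained choice is to filter candidate configurations by a latency limit and then keep the most accurate one that remains. The sketch below does exactly that. The configuration names and all of the accuracy and latency numbers are invented for illustration, and the selection rule is an assumption rather than the optimizer the researchers built.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    name: str
    accuracy: float   # fraction of questions answered correctly
    latency_s: float  # end-to-end seconds per request


# Hypothetical candidates with made-up numbers, for illustration only.
CANDIDATES = [
    Config("large-model-many-frames", accuracy=0.90, latency_s=12.0),
    Config("large-model-few-frames", accuracy=0.88, latency_s=6.0),
    Config("small-model-few-frames", accuracy=0.80, latency_s=2.5),
]


def pick_config(candidates, max_latency_s):
    """Return the most accurate candidate within the latency limit, or None."""
    feasible = [c for c in candidates if c.latency_s <= max_latency_s]
    if not feasible:
        return None
    return max(feasible, key=lambda c: c.accuracy)


best = pick_config(CANDIDATES, max_latency_s=8.0)
print(best.name if best else "no feasible configuration")
```

Tightening or loosening `max_latency_s` changes which candidate wins, which is the tradeoff a user-supplied constraint is meant to control.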

It adaptively identifies the ideal hardware allocation and deployment schedule to maximize efficiency in real time, and generates a workflow that the cloud provider can execute.

“Our system also provides visibility of multiple workloads to cloud providers, allowing them to share compute resources in the most efficient way while meeting user constraints,” he says.

When tested with different agent workflows for video Q&A and code generation, Murakkab met user requirements while using only about 35 percent of the computing resources required by other methods, at less than 25 percent of the cost and about 27 percent of the energy consumption.
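These percentages can also be read as reduction factors by taking the reciprocal of each fraction. The short sketch below does that arithmetic using only the figures quoted above. The function name is hypothetical, and because the cost figure is reported as "less than" 25 percent, its factor is a lower bound.

```python
# Fractions of the baseline reported in the article. The cost figure is an
# upper bound ("less than 25 percent"), so its factor is a lower bound.
REPORTED_FRACTIONS = {
    "compute": 0.35,
    "energy": 0.27,
    "cost": 0.25,
}


def reduction_factor(fraction_of_baseline):
    """Return how many times smaller the new usage is than the baseline."""
    if not 0 < fraction_of_baseline <= 1:
        raise ValueError("fraction must be in (0, 1]")
    return 1 / fraction_of_baseline


factors = {name: reduction_factor(f) for name, f in REPORTED_FRACTIONS.items()}
for name, factor in factors.items():
    print(f"{name}: about {factor:.1f}x less than the baseline")
```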

Murakkab’s dynamic nature also allows users to balance tradeoffs. In one example, the system reduced the energy consumption of an agent workflow by more than an order of magnitude while reducing accuracy by only about 2 percent.

The system was also able to identify an unexpected ideal configuration for the model that selects video frames, optimizing the performance of the video Q&A task. Chaudhry says it’s nearly impossible for developers to perform this type of optimization manually.

Next, the researchers plan to scale the system to more complex workflows and larger computing clusters, while exploring opportunities to optimize new agent applications.

“There is a lot of potential to make these workflows more resource-optimized and significantly reduce energy consumption, but we need to think about this at the scale of the major cloud platforms,” Chaudhry says.

This research was supported in part by the Semiconductor Research Corporation and the Defense Advanced Research Projects Agency.
