DigiRL: A New Autonomous Reinforcement Learning (RL) Method for Training Device Control Agents

Machine Learning


https://arxiv.org/abs/2406.11896

Advances in vision-language models (VLMs) have enabled impressive common-sense reasoning and generalization capabilities, opening the door to fully autonomous digital AI assistants that carry out everyday computer tasks specified in natural language. However, improved reasoning and common sense do not automatically produce intelligent assistant behavior. An AI assistant must do more than generate plausible responses from its pre-training data: it must complete tasks, behave rationally, and recover from mistakes. We therefore need a way to turn pre-trained capabilities into practical AI “agents.” Even the best VLMs, such as GPT-4V and Gemini 1.5 Pro, struggle to take appropriate actions when asked to complete tasks on a device.

The paper situates DigiRL against three lines of existing work. The first is training multimodal digital agents, which must cope with direct device control at the pixel level in a coordinate-based action space and with the stochastic, unpredictable nature of device ecosystems and the internet. The second is environments for device control agents; existing environments are designed for evaluation and offer only a limited range of tasks in fully deterministic, stationary settings. The third is reinforcement learning (RL) for LLMs/VLMs: most RL work on foundation models targets single-turn tasks such as preference optimization, and optimizing single-turn interactions from expert demonstrations can produce suboptimal policies for multi-step problems, as the sketch below illustrates.
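To make the single-turn versus multi-step distinction concrete, here is a minimal, purely illustrative Python sketch (not from the paper): under a sparse success reward, scoring each step in isolation assigns no credit to the early actions that made the final success possible, whereas a discounted multi-step return does.

```python
# Illustrative sketch (not from the paper): single-turn scoring vs.
# multi-step credit assignment under a sparse success reward.

def single_turn_scores(rewards):
    # Each step is judged in isolation: no credit flows between steps.
    return list(rewards)

def discounted_returns(rewards, gamma=0.99):
    # Multi-step credit assignment: G_t = r_t + gamma * G_{t+1}.
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    return list(reversed(returns))

# A four-step episode where only the last action yields the success reward.
rewards = [0.0, 0.0, 0.0, 1.0]
print(single_turn_scores(rewards))   # [0.0, 0.0, 0.0, 1.0]: early steps look worthless
print(discounted_returns(rewards))   # [0.97..., 0.98..., 0.99, 1.0]: early steps get credit
```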

Researchers from UC Berkeley, UIUC, and Google DeepMind present DigiRL (RL for Digital Agents), a novel autonomous RL approach for training device control agents; the resulting agents achieve state-of-the-art performance on several Android device control tasks. Training proceeds in two phases: an initial offline RL phase that initializes the agent on existing data, followed by an offline-to-online RL phase that fine-tunes the offline-trained model on data the agent collects itself. To support online RL, the researchers developed a scalable, parallelizable Android learning environment that includes a robust, general-purpose VLM-based evaluation function (2.8% average error rate against human judgment).
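The paper's training code isn't reproduced here, but the two-phase structure can be sketched as follows. This is an assumed skeleton for illustration only: `offline_phase`, `online_phase`, and the helpers passed into them (`rl_update`, `collect_rollouts`, `vlm_judge`) are hypothetical placeholders, not the authors' API.

```python
# Assumed skeleton of DigiRL's two-phase training, for illustration only.
# All helper names (rl_update, collect_rollouts, vlm_judge, ...) are
# hypothetical placeholders, not the authors' released code.

def offline_phase(policy, offline_dataset, rl_update, steps):
    """Phase 1: initialize the agent with RL on a fixed, pre-collected dataset."""
    for _ in range(steps):
        rl_update(policy, offline_dataset.sample())
    return policy

def online_phase(policy, envs, buffer, collect_rollouts, vlm_judge, rl_update, rounds):
    """Phase 2: fine-tune the offline-initialized agent on freshly collected data."""
    for _ in range(rounds):
        # Parallel Android emulators make rollout collection fast.
        for traj in collect_rollouts(policy, envs):
            # The VLM evaluator judges success from the task description and the
            # final screenshot, replacing a hand-written per-task reward function.
            traj.reward = vlm_judge(traj.task, traj.final_screenshot)
            buffer.add(traj)
        rl_update(policy, buffer.sample())
    return policy
```

The VLM evaluator is the piece that makes this loop practical: an automatic success judge means large-scale online data collection does not require a human to label whether each rollout completed its task.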

The researchers conducted experiments to evaluate DigiRL on the challenging problem of controlling an Android device, asking whether DigiRL could produce an agent that learns effectively through autonomous interaction while also leveraging offline data. They compared DigiRL against:

  • a state-of-the-art agent built around a proprietary VLM using several prompting- and search-style techniques;
  • imitation learning on static human demonstrations drawn from the same instruction distribution; and
  • a filtered behavior cloning approach (sketched below).
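For context, filtered behavior cloning, the strongest autonomous baseline in this comparison, can be summarized in a few lines. The sketch below is a generic illustration of the technique, not the paper's implementation; all names are placeholders.

```python
# Generic sketch of filtered behavior cloning, the autonomous baseline above.
# Illustrative placeholders only, not the paper's implementation.

def filtered_bc_update(policy, trajectories, sgd_step):
    # Keep only the rollouts the evaluator judged successful...
    successes = [t for t in trajectories if t.success]
    # ...then imitate them with plain supervised learning.
    for traj in successes:
        for obs, action in zip(traj.observations, traj.actions):
            sgd_step(policy, obs, action)  # maximize log pi(action | obs)
    return policy
```

Filtered BC is simple but treats every action in a successful rollout as equally good; RL updates like DigiRL's can instead weight actions by how much they contributed to success, which helps explain the gap reported in the results below.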

The agent trained with DigiRL was tested on a range of tasks from the Android in the Wild (AitW) dataset using a real Android device emulator. It achieved a 28.7% absolute improvement over the existing state-of-the-art agent, the 18B CogAgent (raising the success rate from 38.5% to 67.2%), and beat the previous best autonomous learning method, based on filtered behavior cloning, by over 9%. Notably, despite having only 1.3 billion parameters, the agent outperformed far larger models such as GPT-4V and Gemini 1.5 Pro (17.7% success rate). This makes it the first agent to achieve state-of-the-art performance in device control using an autonomous offline-to-online RL approach.

In summary, the researchers propose DigiRL, a novel autonomous RL approach for training device control agents that sets a new state of the art on several Android control tasks from AitW. To achieve this, they built a scalable, parallelizable Android environment with a robust VLM-based general-purpose evaluation function for rapid online data collection. The DigiRL-trained agent achieved a 28.7% absolute improvement over the prior state-of-the-art agent, the 18B CogAgent. Training, however, was limited to tasks in the AitW dataset rather than the full space of possible device tasks, so future work includes extending the algorithmic research and broadening the task space, with DigiRL serving as the base algorithm.


Check out the paper. All credit for this research goes to the researchers of this project.


Sajjad Ansari is a final-year undergraduate student at the Indian Institute of Technology Kharagpur. A technology enthusiast, he explores practical applications of AI with a focus on understanding its real-world impact, and he aims to explain complex AI concepts in a clear and accessible manner.
