Google DeepMind at ICML 2024

Exploring AGI, scaling challenges, and the future of multimodal generative AI

Next week, the artificial intelligence (AI) community will come together for the 2024 International Conference on Machine Learning (ICML). Taking place from July 21-27 in Vienna, Austria, the conference is an international platform to showcase the latest advancements, exchange ideas, and shape the future of AI research.

This year, Google DeepMind teams will present more than 80 research papers. Our booth will also feature demos of Gemini Nano, a multimodal on-device model; LearnLM, a new series of educational AI models; and TacticAI, an AI assistant that can help with soccer tactics.

Here we introduce some of the oral, spotlight, and poster presentations.

Defining the Path to AGI

What is artificial general intelligence (AGI)? The phrase describes an AI system that is at least as capable as a human at most tasks. As AI models continue to evolve, it becomes increasingly important to define what AGI would look like in practice.

We present a framework for classifying the capabilities and behaviors of AGI models. Depending on their performance, versatility, and autonomy, our paper classifies systems ranging from non-AI computers to emerging AI models and other new technologies.

We also show that open-endedness is key to building general-purpose AI that surpasses human capabilities: while many recent AI advances have been driven by existing internet-scale data, open-ended systems can generate new discoveries that extend human knowledge.

At ICML, we will be demonstrating Genie, a model that can generate a variety of playable environments based on text prompts, images, photos, and sketches.

Scaling AI systems efficiently and responsibly

Developing larger, more performant AI models requires more efficient training methods, closer alignment with human preferences, and better privacy protections.

We show that using classification instead of regression techniques makes it easier to scale deep reinforcement learning systems and achieve state-of-the-art performance across a variety of domains. Furthermore, we propose a new approach that predicts the distribution of outcomes of a reinforcement learning agent's actions, helping to rapidly evaluate new scenarios.

Our researchers present an approach to maintaining alignment that reduces the need for human supervision, as well as a novel game-theoretic approach to fine-tuning large language models (LLMs) that better aligns their output with human preferences.

We critique the approach of training models on public data and then only fine-tuning them with “differentially private” training, arguing that this approach may not provide the privacy or utility that is often claimed.

VideoPoet is a large language model for zero-shot video generation.

New approaches in generative AI and multimodality

Generative AI technologies and multimodal capabilities are expanding the creative possibilities of digital media.

We present VideoPoet, which uses an LLM to generate state-of-the-art video and audio from multimodal inputs including images, text, audio, and other video.

We also share Genie (Generative Interactive Environment), which can generate a variety of playable environments to train AI agents based on text prompts, images, photos, and sketches.

Finally, we present MagicLens, a novel image retrieval system that uses text instructions to retrieve images with richer relationships beyond visual similarity.

Supporting the AI Community

We are proud to sponsor ICML and support efforts led by Disability in AI, Queer in AI, LatinX in AI, and Women in Machine Learning to foster a diverse community in AI and machine learning.

If you're attending the conference, please stop by the Google DeepMind and Google Research booth to meet our team, see live demos, and learn more about our research.


