Google DeepMind Vice President on the Future of AI Intelligence

Raia Hadsell, VP of Research at Google DeepMind, recently took to the stage at AI Engineer Europe to discuss the evolving frontier of artificial intelligence and its impact on the future of intelligence itself. Hadsell, who has dedicated more than 13 years to bridging academia and industry in AI and is also a UK AI Ambassador, gave a compelling glimpse into DeepMind’s ambitious research agenda.

From philosophy to AI: a research journey

Hadsell’s path into AI began with a background in philosophy, which gave her a deep appreciation for the fundamental questions surrounding intelligence and consciousness. That academic foundation unexpectedly led her into applied computing, she explained, where she spent her early career working on convolutional neural networks for robotics and exploring the complexities of neural networks.

Her career then shifted to more complex AI challenges, including work with Yann LeCun on neural networks, followed by a move to Google DeepMind. There, she led a team of more than 1,200 scientists and engineers across 10 labs, focusing on fundamental AI research across a wide range of disciplines. These include “Agentic Worlds,” aimed at advanced world models and general embodied agents; “AI for humans,” focused on social science, medicine, and education; and “Sustainability,” which specializes in climate, energy, and earth modeling. Additionally, DeepMind is exploring “creative technologies” that push the boundaries of AI-powered creativity and “advanced models” for foundational learning and multimodal research.

Gemini: A unified vision for AI

The bulk of Hadsell’s talk focused on Google DeepMind’s Gemini models. She highlighted Gemini Embeddings 2, an omnimodal, Gemini-derived representation model designed for search that she described as “almost magical.” Launched in preview on Vertex AI and the Gemini API, the model aims to unify the semantic space by mapping text, images, video, audio, and PDFs into a single embedding space.

Hadsell highlighted the “intrinsic benefits” of Gemini Embeddings 2, chief among them a “native advantage”: the model eliminates “lossy” intermediate steps such as OCR and transcription. This approach not only simplifies complex pipelines but also enables a variety of high-value multimodal applications. She also pointed out that Gemini Embeddings 2 is built on the Gemini architecture, inheriting its industry-leading multimodal and contextual understanding, that it tops benchmarks across modalities, and that it captures complex relationships across more than 100 languages.
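
To make the idea of a single embedding space concrete, here is a minimal sketch of cross-modal search by cosine similarity. It does not call Gemini Embeddings 2 or any Google API: the file names and the three-dimensional vectors are hypothetical, hand-written stand-ins for what an embedding model would return, and the `search` helper is an illustrative assumption rather than part of any library. The point is only that once every item lives in the same vector space, one similarity function can rank a PDF, an audio file, and an image against a text query without OCR or transcription in between.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical vectors standing in for embeddings of items of different
# modalities. A real embedding model would produce far longer vectors.
INDEX = {
    "invoice.pdf": [0.9, 0.1, 0.0],
    "earnings_call.mp3": [0.7, 0.6, 0.1],
    "beach_photo.jpg": [0.0, 0.2, 0.95],
}


def search(query_vector, index, top_k=2):
    """Return the names of the top_k items most similar to the query."""
    scored = [
        (cosine_similarity(query_vector, vector), name)
        for name, vector in index.items()
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [name for _, name in scored[:top_k]]


# Hypothetical embedding of a text query about billing.
query = [0.85, 0.2, 0.05]
results = search(query, INDEX)
print(results)
```

With these made-up vectors, the PDF and the audio recording rank above the photo, because their vectors point in roughly the same direction as the query.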

Advances in AI through simulation and beyond

The conversation then turned to the important role of games and simulations in research toward artificial general intelligence (AGI). Hadsell highlighted DeepMind’s pioneering work in this space, starting with early success on Atari games in 2013 and the mastery of Go with AlphaGo in 2016, later extended to chess and shogi with AlphaZero. Subsequent advances include reaching grandmaster level in StarCraft II using multi-agent reinforcement learning in 2019 and, more recently, the DeepMind Control Suite and Catch & Carry for robotics.

Hadsell then detailed GenCast (2024), an AI model designed for probabilistic weather forecasting. She explained that the chaotic nature of weather calls for probabilistic forecasts, which give users meaningful information about uncertainty and the probability of extreme events. Unlike traditional physics-based solvers, which are slow and computationally intensive, GenCast offers a more efficient and accurate approach, outperforming gold-standard forecasts in 97% of evaluations. It does this by generating probabilistic predictions through sampling: it produces many possible forecasts rather than a single one. The method has been shown to be significantly faster and more accurate than existing models.
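
The general idea of forecasting by sampling can be sketched in a few lines. The example below is not GenCast and makes no claim about how GenCast works internally: it swaps in a toy random walk as the stochastic model, and every name and number in it (`sample_trajectory`, the 1.5-degree step noise, the 30-degree threshold) is a hypothetical choice for illustration. What it shows is how an ensemble of sampled outcomes yields a mean, a spread that expresses uncertainty, and an estimated probability of an extreme event.

```python
import random
import statistics


def sample_trajectory(start_temp, steps, rng):
    """Toy stochastic model: a Gaussian random walk (an assumption, not GenCast)."""
    temp = start_temp
    for _ in range(steps):
        temp += rng.gauss(0.0, 1.5)  # hypothetical per-step noise in degrees
    return temp


def ensemble_forecast(start_temp, steps, n_members, threshold, seed=0):
    """Draw n_members samples and summarize them as a probabilistic forecast."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    finals = [sample_trajectory(start_temp, steps, rng) for _ in range(n_members)]
    return {
        "mean": statistics.fmean(finals),
        "spread": statistics.stdev(finals),
        # Fraction of ensemble members that exceed the extreme-event threshold.
        "p_extreme": sum(t > threshold for t in finals) / n_members,
    }


forecast = ensemble_forecast(start_temp=25.0, steps=5, n_members=500, threshold=30.0)
print(forecast)
```

A single deterministic run would report only one final temperature; the ensemble also reports how widely the samples disagree and how often they cross the threshold, which is the kind of information a probabilistic forecast is meant to provide.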

The discussion also touched on the team’s efforts to generate diverse, interactive 3D environments using models such as Genie 2 and Genie 3. Genie 2 was described as the first model to create and simulate diverse 3D environments, allowing realistic control, though not in real time. A more advanced iteration, Genie 3, adds long-term memory that keeps generated worlds consistent over minutes and supports prompted world events, allowing users to interact with and shape these virtual worlds. Hadsell illustrated this with examples of playable worlds generated from text prompts, including futuristic cities, clay-like environments, and detailed 3D worlds with steerable creatures.

The presentation concluded with a forward-looking perspective, highlighting the potential of these advances to revolutionize fields ranging from entertainment and education to scientific research and environmental modeling. Google DeepMind’s continued efforts in these areas represent a commitment to pushing the boundaries of what AI can achieve and shaping a future where both artificial and human intelligence can thrive.
