Fei-Fei Li on Spatial Intelligence as the Next Frontier of AI

Last month, in front of an enthusiastic audience at Y Combinator, Fei-Fei Li, often referred to as the godmother of AI, spoke to Diana Hu about spatial intelligence, why she thinks it is the next important step for AI, and why it is essential to achieving artificial general intelligence.

https://www.youtube.com/watch?v=_pion-cpop0

In the early part of the video, Li discusses the creation of ImageNet in 2009, a pioneering project that provided crucial data for machine learning algorithms and earned her a share of this year's Queen Elizabeth Prize for Engineering. She explains that the lack of data was a key problem in computer vision at the time, and that ImageNet aimed to shift the paradigm toward data-driven methods. Open-sourcing ImageNet and establishing the ImageNet Challenge fostered collaboration across the research community, leading to breakthroughs such as AlexNet in 2012, which combined GPU computing with deep learning.

ImageNet initially focused on object recognition, but Li had a long-standing dream of enabling machines to "tell the story of a scene": not only identifying individual objects, but understanding the entire visual context. This dream was realized around 2015 with her student Andrej Karpathy, when he published his first papers on image captioning.

The main theme of the talk is spatial intelligence, which Li considers the next frontier of AI and key to achieving the goal of artificial general intelligence.

In approaching the problem of spatial intelligence, Li draws inspiration from evolution: while human language took less than half a million years to develop, the ability to understand and interact with the 3D world took 540 million years. In her talk, she argues that vision, which enables animals to understand and navigate the 3D world, was the catalyst for the evolutionary arms race that led to increasing animal intelligence.

Outlining the challenges of spatial intelligence, Li points to its complexity: the real world is fundamentally 3D, and once time is added it becomes 4D, a far more combinatorially complex and difficult problem than language. She also points out that sensing the visual world means collapsing 3D information into 2D (as the human eye or a camera does), which is mathematically ill-posed and requires multiple sensors to resolve, and that spatial intelligence data is not as readily available as language data.
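To make the "ill-posed" point concrete, here is a toy sketch assuming an idealized pinhole camera, which is not something described in the talk. The helper names `project` and `points_on_ray` are hypothetical. Because the projection divides out depth, every point along a ray through the camera centre lands on the same 2D pixel, so a single image cannot tell you which of those 3D points was actually there.

```python
from typing import Iterable, List, Tuple

Point3D = Tuple[float, float, float]
Point2D = Tuple[float, float]


def project(point: Point3D, focal_length: float = 1.0) -> Point2D:
    """Idealized pinhole projection (an assumption): (X, Y, Z) -> (f*X/Z, f*Y/Z).

    Depth Z is divided out, so it cannot be recovered from the result alone.
    """
    x, y, z = point
    if z <= 0:
        raise ValueError("point must lie in front of the camera (Z > 0)")
    return (focal_length * x / z, focal_length * y / z)


def points_on_ray(pixel: Point2D, depths: Iterable[float],
                  focal_length: float = 1.0) -> List[Point3D]:
    """Return 3D points at the given depths that all project onto `pixel`."""
    u, v = pixel
    return [(u * z / focal_length, v * z / focal_length, z) for z in depths]


# Two different 3D points, one twice as far away as the other...
near = (1.0, 2.0, 4.0)
far = (2.0, 4.0, 8.0)
print(project(near))  # (0.25, 0.5)
print(project(far))   # (0.25, 0.5), the same pixel

# ...and in fact infinitely many candidates share that pixel.
for p in points_on_ray((0.25, 0.5), [1.0, 4.0, 10.0]):
    print(p, "->", project(p))
```

Recovering depth therefore needs extra information, for example a second viewpoint or a depth sensor, which is one way to read Li's remark that multiple sensors are needed.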

In 2024, aiming to truly capture the 3D structure of the world and achieve spatial intelligence, Li co-founded World Labs to create a "world model" capable of understanding, generation, reasoning and interaction in the 3D world.

In the final part of the talk, before the Q&A session, Li argues that the applications of spatial intelligence models are vast, ranging from content creation (for creators, architects and game developers) to robotics and robot learning. She also expresses excitement about the possibilities of the metaverse, particularly with the convergence of hardware and software.

More information

World Labs

Demis Hassabis & Fei-Fei Li beats with AI

AlexNet source code now open source

To be informed about new articles on I Programmer, sign up for our weekly newsletter, subscribe to the RSS feed and follow us on Twitter, Facebook or LinkedIn.