When Fei-Fei Li arrived in Princeton as an assistant professor in January 2007, she was assigned an office on the second floor of the Computer Science Building. Her neighbor was Christian Ferbaum.
On the surface, Li and Fellbaum didn't seem to have much in common. Li is a computer vision expert. Fellbaum is a linguist. But they liked each other. “We spoke frequently and became very friendly,” Felbaum said.
After all, their friendship is a coincidence for the future of computing. Their meeting sparked intellectual connections that affected the computer vision database. This is a database that leads to a paradigm shift in machine learning.
Both Li and Fellbaum were fascinated by the human mind's ability to store and retrieve large amounts of data. According to Li, they shared “a special interest in understanding how the mind conceptualizes the world, and even a mapping.”
“As humans, we are naturally adept at recognizing things after a glimpse,” Lee wrote in her book. The world I see. Even very young children can recognize surface properties, objects, scenes, faces, materials immediately and without touching them.
Humans are similarly skilled at learning and using words. This is an insight that animates Ferbaum's work. “We know tens of thousands of words and concepts,” Ferbaum said. “How do you organize them all? Our brains are very good, but we are not computers.”

When I became a neighbor in 2007, Fellbaum, a senior researcher in the Computer Science department, helped build WordNet, a huge database of over 145,000 English words over the past 20 years.
After completing his paper just two years ago, Li is about to embark on an equally ambitious project, creating a comprehensive image database for training computer vision systems.
After learning about Fellbaum's WordNet, Li decides to call her Project Imagenet.
Create a meaning map
Wardnet was launched by George A. Miller, Professor Princeton, who is one of the founders of cognitive psychology. His work focused on questions about how the mind processes information.
He was particularly interested in the mental ability to learn and recognize language. Grammar has a finite rule, says Fellbaum, but our vocabulary is dynamic and constantly evolving. “We continue to use words in new ways, add new words, throw away old words,” she said.
Miller hypothesized that each word is connected to the associated word, supporting this dynamic system through a network of semantic meaning. For example, the word “home” leads to words like “home”, “home”, “home”, and “structure”. New words are added to a branch-like network on the tree.

Miller's hypothesis on networks in this sense was ultimately supported by psychological experiments, Ferbaum said. Subjects recall “home” more quickly if they had just seen the word “home.”
Miller began building WordNet in 1985. He took part in the project in 1987 at Fellbaum Princeton's Linguistics. Twenty years later, when Li and Fellbaum became neighbors, WordNet was almost finished. The 20,000 noun categories essentially contained all the objects that English speakers could describe in words. Many wordnets have been created for other languages as well.
Li, in awe of WordNet, decided to use this magical piece as a scaffolding to create a comprehensive map of visual meaning.
She aimed to pair each object category in WordNet with around 1,000 annotated images, and ultimately build a computer vision dataset that reflects the bloating and complexity of the world.
By 2010, Li, her graduate students, and a small army of crowdsourced workers will compile millions of images and build “the largest hand-curated dataset in AI history,” she writes.
Soon, it will revolutionize deep learning.
How can datasets build artificial intelligence?
Current deep learning systems are built on three key components: neural network algorithms, powerful hardware, and a huge amount of organized training data.
In 2007, these components were not present as they are today. The computer had about 2% of the processing power of the current machine, neural network algorithms were not widely used, and organized data sets were small.
Only a handful of researchers focused on building larger datasets. Some computer scientists have questioned whether a large amount of training data will also help build artificial intelligence.
“Prestige comes from building models,” said Olga Russakovsky, a graduate student at Stanford University, who worked at Imagenet, and is now an associate professor of computer science at Princeton. “Most people wondered why collecting datasets is intellectually interesting.”
Li was the exception. Her hypothesis was originally developed as a graduate student at Caltech, and like human children, computer vision systems must see many examples in categories (e.g., 1000 images of German shepherds) before classifying objects at a glance.

Li knew that in order to build a comprehensive database that accurately compiles thousands of object categories, it would require the inclusion of millions of images. She also knew that millions of images would take years. As an assistant professor, she had limited resources. But “Princeton is really good at encouraging junior faculty to think big,” Russakovsky said.
In this spirit, Li found an alliance among senior faculty members of Kai Li (No Related), an expert in distributed systems and computer architecture. He was intrigued by the Imagenet idea and offered material help by donating a series of workstations. He also recommended graduate student Jia Deng as a potential collaborator for the project.
Deng, a Princeton professor of computer science, was equally intrigued. He agreed to join Li in her work and was to play an integral role in the structure of Imagenet. He was the lead author of his first paper on Imagenet, published in 2009.
At the time, Deng said many computer scientists were interested in studying data for the purposes of indexing and retrieving information on the Internet. The idea that data could be used to build artificial intelligence was “a very bold hypothesis,” he said.
Every deep learning everywhere
In 2025, deep learning is ubiquitous. When using image recognition to unlock a mobile phone, you are using a deep learning model. When Spotify suggests songs, when Amazon Alexa understands voice commands, Instagram encourages someone to follow, and ChatGpt answers the question, all of these tasks rely on deep learning.
But the power of deep learning gives us the possibility of doing more than simply making our lives more convenient, Russakovsky said. Deep learning can be used to improve farming methods by analyzing weather patterns and automating water usage. Optimizing the energy grid and traffic flow can reduce climate change. It helps healthcare professionals develop new treatments for illness. It helps improve patient surveillance and assess cognitive function. Physical capabilities can be expanded and enhanced to make the world more accessible to people with disabilities.

So, what happened to bringing deep learning, from the idea of fringes incubating in the university corridors to becoming globally prominent as the centre of technical and economic transformation?
To encourage colleagues to use ImageNet as a benchmark tool, in 2010, Li and her team created a competition in which expert teams could build models to recognize imageNet images and compare results. By then, the dataset was nearly complete, with over 14 million annotated images, by far the largest in history.
At first, the challenges were nothing exciting. But in 2012, the University of Toronto team shocked the AI world in ways no one would expect, using techniques (neural networks) that were largely abandoned by the wider community. Called Alexnet, this model was created by Geoffrey Hinton and his graduate students Alex Krizhevky and Ilya Sutskever.
By that point, Li and Deng had moved to Stanford University and recruited Russakovsy for the project. As organizers of the contest, Deng and Russakovsy had front row seats in what is called the “Big Bang Moments” of AI.
“This was the moment when people began to believe that deep learning models actually worked,” says Russakovsky.
Neural nets were nothing new. They have been in use since the 1980s, but were only used with limited constraints, such as reading handwritten numbers on checks. Data is a lacking component, transforming neural networks into the most successful and widely used machine learning approach. “Without data, you don't get results,” says Russakovsky.
Hinton won the Nobel Prize in 2024 for his contribution to artificial intelligence, a prize he shared with Princeton neuroscientist John Hopfield, who invented the class of networks that laid the foundations for Hinton's model.
Li, Hinton, Fellbaum, Miller and Hopfield all share deep intellectual connections. They are fascinated by the human mind and the ability to absorb and remember vast amounts of information. The interest in understanding human intelligence has created the most successful artificial intelligence paradigm of all time.
But the work isn't finished. According to Russakovsky, the human mind remains essential to advances in AI to create useful new forms of technology, consistent with the public interest.
In that sense, Imagenet and its ancestor Wordnet were just the beginning, all of its boldness and foresight. They provided the structure and, together with Alexnet, demonstrated that a large amount of highly organized data is needed.
However, many of the questions and deeper questions of artificial intelligence remain, and the human mind can continue to provide inspiration. “The many questions researchers are asking now,” said Raskovsky, “We ask that we reconsider things and ask: how does the human mind actually do this?”
