Training AI like a baby improves performance

Machine Learning


Summary: Researchers have developed a human-inspired AI training method that improves object recognition by incorporating spatial data, boosting performance by up to 14.99%. The approach mimics the way infants learn from their environment to train AI systems more efficiently, and could advance AI for extreme environments and space exploration.

Key Facts:

  1. New AI training method uses spatial data to mimic infant learning.
  2. AI models trained in this way performed up to 14.99% better than the base model.
  3. The method was tested in a virtual environment to simulate real-world learning.

Source: Pennsylvania State University

A novel, human-inspired approach to training artificial intelligence (AI) systems to identify and navigate around objects could lay the foundation for developing more advanced AI systems for exploring extreme environments and distant worlds, according to research from an interdisciplinary team at Pennsylvania State University.

During the first two years of life, children encounter only a limited number of objects and faces, but they see them from many different perspectives and under varied lighting conditions.

This shows a baby and a robot.
The researchers developed a new contrastive learning algorithm, a type of self-supervised learning method that teaches an AI system to detect visual patterns and identify whether two images are derived from the same base image, forming a positive pair. Credit: Neuroscience News

Inspired by this developmental insight, the researchers introduced a new machine learning approach that uses information about spatial location to more efficiently train AI vision systems. They found that AI models trained with the new method outperformed baseline models by up to 14.99%.

They published their findings in the May issue of the journal Patterns.

“Current AI approaches use large sets of randomly shuffled photos from the internet for training. In contrast, our strategy is based on developmental psychology, which studies how children perceive the world,” said lead author Lizhen Zhu, a doctoral student in Penn State's College of Information Sciences and Technology.

The researchers developed a new contrastive learning algorithm, a type of self-supervised learning method in which an AI system learns to detect visual patterns and identify whether two images are derived from the same base image, resulting in a positive pair. However, these algorithms often treat images of the same object taken from different perspectives as separate entities rather than as a positive pair.
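The pairwise objective described above can be illustrated with a minimal InfoNCE-style loss. This is a generic sketch of the contrastive-learning idea (pull positive pairs together, push other pairs apart), not the authors' exact algorithm; the function name and temperature value are illustrative:

```python
import numpy as np

def info_nce_loss(anchors, positives, temperature=0.1):
    """InfoNCE-style contrastive loss: each anchor embedding should be
    most similar to its own positive (a view of the same base image)
    among all candidate positives in the batch."""
    # L2-normalize embeddings so dot products become cosine similarities
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature  # (N, N) similarity matrix
    # matching pairs sit on the diagonal; penalize rows where the
    # diagonal entry is not the dominant similarity
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))
```

The loss is near zero when each anchor's nearest neighbor is its own positive, and grows when positives are confused with other images in the batch.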

According to the researchers, by taking environmental data such as location into account, the AI system can overcome these challenges and detect positive pairs regardless of camera position or rotation, lighting angle and conditions, or changes in focal length or zoom.
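One simple way to operationalize this idea is to treat two photographs as a positive pair whenever their recorded camera positions are close together, even if the images themselves look different. The sketch below is an assumption-laden illustration: the function name, the distance threshold, and the use of raw Euclidean distance are choices made here for clarity, not details taken from the paper:

```python
import numpy as np

def spatial_positive_pairs(positions, max_dist=0.5):
    """Return index pairs (i, j) of views whose recorded camera
    positions are within max_dist of each other, to be treated as
    positive pairs during contrastive training.

    positions: (N, 3) array of camera coordinates, one row per image.
    max_dist is an illustrative threshold, not a value from the study.
    """
    pairs = []
    n = len(positions)
    for i in range(n):
        for j in range(i + 1, n):
            if np.linalg.norm(positions[i] - positions[j]) <= max_dist:
                pairs.append((i, j))
    return pairs
```

For example, two shots taken 0.1 units apart would be paired as positives, while a shot taken across the house would not, regardless of how the images themselves compare.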

“We hypothesize that infants' visual learning relies on position awareness. To generate an egocentric dataset with spatiotemporal information, we set up a virtual environment on the ThreeDWorld platform, a high-fidelity interactive 3D physically simulated environment. This allowed us to manipulate and measure the viewing camera's position as if the child were walking around the house,” Zhu added.

The scientists created three simulated environments, called House14K, House100K, and Apartment14K — “14K” and “100K” refer to the approximate number of example images taken in each environment. They then ran the baseline contrastive learning model and a model using the new algorithm three times in each simulation to see how well each could classify images.

The research team found that models trained with their algorithm outperformed the base model on a range of tasks: for example, on the task of recognizing rooms in a virtual apartment, the augmented model achieved an average accuracy of 99.35%, a 14.99% improvement over the base model.

These new datasets are available to other scientists for training via www.child-view.com.

“Training models in new environments with small amounts of data is always a challenge. Our work is one of the first attempts to use visual content to make AI training more energy-efficient and flexible,” said James Wang, distinguished professor of information science and technology and Zhu's advisor.

According to the scientists, the research has implications for the future development of advanced AI systems aimed at navigating and learning in new environments.

“This approach is particularly beneficial in situations where a team of autonomous robots with limited resources needs to learn how to navigate in a completely unfamiliar environment,” Wang said.

“To pave the way for future applications, we plan to refine the model to make better use of spatial information and incorporate more diverse environments.”

Collaborators in Penn State's Department of Psychology and Department of Computer Science and Engineering also contributed to the work.

Funding: This research was supported by the National Science Foundation and Penn State's Institute for Computational and Data Sciences.

About this AI research news

Author: Francisco Tutera
Source: Pennsylvania State University
Contact: Francisco Tutera – Pennsylvania State University
Image: Image courtesy of Neuroscience News

Original Research: Open access.
“Incorporating simulated spatial context information improves the effectiveness of contrastive learning models” by Lizhen Zhu et al. Patterns


Abstract

Incorporating simulated spatial context information improves the effectiveness of contrastive learning models.

Highlights

  • We developed an approach that uses spatial context as a similarity signal.
  • We show how to build image datasets using agents that sample their environment.
  • Training with contextual information improves the state of the art in contrastive learning.
  • Simulation data offers new forms of physically realistic augmentation.

The bigger picture

Despite being trained on massive datasets, current computer vision systems lag behind human children in learning about the visual world.

One possible reason for this discrepancy is the fact that humans, as embodied agents, actively explore their environment and sample data from a stable, contextualized visual world.

Contrastive learning is a machine learning technique that, much like a child's early visual experience, can learn general features without labeled data.

It does this by grouping similar things or objects and separating dissimilar ones. Contrastive learning methods can be applied to multiple tasks, including training visual learning agents.

Improving these machine learning strategies is critical for the development of efficient intelligent agents, such as robots and vehicles, with the ability to explore and learn from their surroundings.

