• Unlike traditional systems, inverse reinforcement learning (IRL) allows AI to infer our values and priorities by directly analyzing our behavior.
• From self-driving cars to social robotics, this method helps machines learn cultural nuances and complex gestures.
• Although this approach promises more human-like interactions, it is still costly in terms of computing power and requires highly accurate data to predict true intent.
Because users' cultural norms differ, an AI system that works well in France may not be as successful in other countries. Based on this observation, researchers at the University of Washington hypothesized that AI could learn cultural values by observing human behavior. In other words, AI could absorb values in the same way that children do as they learn. To conduct the experiment, the researchers used an agent equipped with inverse reinforcement learning (IRL), a learning method that is very different from traditional learning systems. “Our results provide proof of concept that AI agents have the ability to directly learn culturally typical behaviors and values by observing human behavior,” they explain. This is possible because the method relies on behavioral observation and a continuous reward system.
How IRL lets AI create its own rules
Many traditional AI systems rely on reinforcement learning (RL). For the AI to learn a task, it is given a reward function, such as a point system. “AI is trying to maximize future rewards,” explains Lina Rojas-Barahona, a researcher at Orange who specializes in conversational systems. “With IRL, we don’t know the reward function. In some cases, we can’t define the reward function. For example, if we want to teach a self-driving car to drive, we know that there is no single way to drive.” In this case, the idea behind IRL is to collect data from people who know how to drive and to learn by observing them. “The AI then creates its own reward function based on the observations and uses it to train itself.”
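To make the "point system" concrete, here is a minimal sketch of the classical RL setting, in which a designer hands the agent a reward function and the agent works out how to maximize future rewards. The five-state corridor, the reward values, and the discount factor are hypothetical choices made for illustration only; they do not come from the researchers quoted above.

```python
# Toy corridor: states 0..4, actions -1 (left) and +1 (right).
# States, reward values and discount are hypothetical illustration choices.
N_STATES = 5
ACTIONS = (-1, +1)
GAMMA = 0.9  # discount applied to future rewards


def step(state, action):
    """Deterministic move left or right, clamped to the corridor."""
    return min(max(state + action, 0), N_STATES - 1)


def q_value(reward, values, state, action):
    """Reward collected on arrival plus discounted value of the next state."""
    nxt = step(state, action)
    return reward[nxt] + GAMMA * values[nxt]


def greedy_policy(reward, sweeps=100):
    """Value iteration: return the best action in each state for a given reward."""
    values = [0.0] * N_STATES
    for _ in range(sweeps):
        values = [
            max(q_value(reward, values, s, a) for a in ACTIONS)
            for s in range(N_STATES)
        ]
    return [
        max(ACTIONS, key=lambda a: q_value(reward, values, s, a))
        for s in range(N_STATES)
    ]


# Classical RL: the designer supplies the reward (one point for reaching state 4).
designer_reward = [0.0, 0.0, 0.0, 0.0, 1.0]
policy = greedy_policy(designer_reward)
print(policy)  # best action per state
```

In IRL, `designer_reward` is exactly the piece that is missing: the agent only sees what experts do and has to infer a reward that would explain it.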
IRL is useful in robotics for teaching robots to imitate the movements of experts and for subtle tasks such as driving, but it can also help in understanding and modeling behavior, for example to help psychologists and economists understand the motivations of individuals. In healthcare, the idea is to derive preferred treatment protocols by observing the decisions of the best medical professionals. In finance, IRL can be used to model the strategies of professional investors, detect market anomalies, and automate portfolio management.
A costly method that demands accurate data
However, this approach has limitations and is costly. Human behavior must be recorded for hours to ensure that the data is representative and of high quality, and processing this data requires significant computing power. Like RL, an IRL agent operates in an iterative loop, but unlike RL it must also learn the reward: it estimates a reward, compares its results to human results, adjusts its behavior, redefines the reward, and so on. “We need to predict this reward, but learning and inference are more expensive, which can create problems in terms of computing power and latency,” explains Lina Rojas-Barahona.
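The iterative loop described above can be sketched on a toy problem. The example below is a simplified feature-matching loop, not a specific published IRL algorithm: the agent guesses a reward, plans with it, compares the states it visits with those the expert visits, and nudges the reward toward the expert's behavior. The corridor world, the simulated "expert" (who always moves right), the horizon, and the learning rate are all hypothetical assumptions; in practice the expert data would come from recorded human behavior.

```python
# Toy corridor: states 0..4, actions -1 (left) and +1 (right).
# All constants below are hypothetical illustration choices.
N_STATES = 5
ACTIONS = (-1, +1)
GAMMA = 0.9
HORIZON = 6          # steps per demonstration
LEARNING_RATE = 1.0  # size of each reward correction


def step(state, action):
    """Deterministic move left or right, clamped to the corridor."""
    return min(max(state + action, 0), N_STATES - 1)


def q_value(reward, values, state, action):
    nxt = step(state, action)
    return reward[nxt] + GAMMA * values[nxt]


def greedy_policy(reward, sweeps=100):
    """Inner RL step: plan with the current reward estimate (value iteration)."""
    values = [0.0] * N_STATES
    for _ in range(sweeps):
        values = [
            max(q_value(reward, values, s, a) for a in ACTIONS)
            for s in range(N_STATES)
        ]
    return [
        max(ACTIONS, key=lambda a: q_value(reward, values, s, a))
        for s in range(N_STATES)
    ]


def visit_frequencies(policy):
    """Roll the policy out from every start state; return how often each state is visited."""
    counts = [0] * N_STATES
    for start in range(N_STATES):
        s = start
        for _ in range(HORIZON):
            s = step(s, policy[s])
            counts[s] += 1
    total = sum(counts)
    return [c / total for c in counts]


# Stand-in for recorded human demonstrations: a simulated expert who always moves right.
expert_policy = [+1] * N_STATES
expert_freq = visit_frequencies(expert_policy)

reward = [0.0] * N_STATES  # the reward is unknown, so start from a blank guess
for iteration in range(20):
    learner_policy = greedy_policy(reward)            # adjust behavior
    learner_freq = visit_frequencies(learner_policy)  # observe own results
    gap = [e - l for e, l in zip(expert_freq, learner_freq)]  # compare to expert
    if max(abs(g) for g in gap) < 1e-9:
        break  # the learner now visits states exactly as the expert does
    reward = [r + LEARNING_RATE * g for r, g in zip(reward, gap)]  # redefine reward

print(learner_policy)
print(reward)
```

Note that every pass through the loop runs a full planning step (`greedy_policy`) before the reward can be corrected. Even in this tiny world, that nesting of an RL solve inside the reward-learning loop hints at why IRL is more expensive than RL with a known reward.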
Beyond material costs, IRL also faces methodological limitations. One is the inability of certain systems to predict long-term intentions. In an article published in the journal Nature, a team of Chinese researchers reported that they had developed a way to allow social robots to move smoothly and safely among humans. Current research on robot navigation uses IRL to mimic human trajectories, but it relies on humans' immediate goals, i.e., their directions rather than their destinations. Over long distances, the robot’s decisions therefore become less accurate. The researchers developed an approach that explicitly incorporates navigation goals into the robot’s learning process. The robots first train in random virtual environments and then refine their movements by observing humans and understanding social norms. This method allows the robot to make decisions step by step without having to predict complex future trajectories.

