Children who are just learning to walk can go a little too fast and fall or bump into furniture. But those stumbles give them valuable information about how their bodies move through space, which helps them avoid future falls.
Machines learn in similar ways to humans, including learning from their mistakes. But for many machines, such as self-driving cars and power systems, learning on the job while human safety is at risk poses a problem. As machine learning matures and becomes more prevalent, there is growing interest in applying it to highly complex and safety-critical autonomous systems. However, the promise of these technologies is hampered by the safety risks inherent in the training process and beyond.
A new research paper challenges the idea that learning safe behavior in an unfamiliar environment requires unlimited trials. The paper, recently published in the journal IEEE Transactions on Automatic Control, presents a new approach that learns safe actions with full certainty while managing the tradeoffs between optimality, exposure to dangerous situations, and how quickly unsafe actions are detected.
“In general, machine learning looks for the most optimized solution, which can introduce more errors along the way. That is a problem if an error means running into a wall,” explained Juan Andres Bazerque, assistant professor of electrical and computer engineering at the Swanson School of Engineering, who led the research with Associate Professor Enrique Mallada of Johns Hopkins University. “This study shows that learning a safe policy is fundamentally different from learning an optimal policy, which means the two can effectively be done independently.”
The research team studied two different scenarios to illustrate the concept. First, by making reasonable assumptions about exploration, they created an algorithm that detects all unsafe actions within a limited number of rounds. The team also tackled the challenge of finding optimal policies for a Markov decision process (MDP) with almost sure constraints, meaning constraints that must hold with probability one. Their analysis highlighted a tradeoff between the time required to detect unsafe actions in the underlying MDP and the level of exposure to unsafe events. An MDP is useful because it provides a mathematical framework for modeling decisions in situations where the outcome is partly random and partly under the control of the decision maker.
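To make the idea of detecting unsafe actions in a limited number of rounds more concrete, here is a minimal Python sketch. It is an illustration only, not the algorithm from the paper: the toy MDP, the state and action names, and the function `detect_unsafe_actions` are all hypothetical. The sketch assumes the learner receives a binary damage signal, flags any action that produces damage, and also flags any action that leads to a state where every available action has already been flagged.

```python
import random

# Hypothetical toy MDP (an assumption for illustration, not from the paper).
# TRANSITIONS[state][action] lists the possible (next_state, damage) outcomes,
# each of which occurs with positive probability.
TRANSITIONS = {
    "start": {
        "left": [("edge", False)],
        "right": [("safe", False)],
    },
    "edge": {
        "step": [("start", False), ("start", True)],  # sometimes damaging
        "jump": [("start", True)],                    # always damaging
    },
    "safe": {
        "stay": [("safe", False)],
        "back": [("start", False)],
    },
}


def detect_unsafe_actions(transitions, start, rounds, seed=0):
    """Explore for a fixed number of rounds and flag unsafe (state, action) pairs."""
    rng = random.Random(seed)
    unsafe = set()
    damage_count = 0
    state, prev = start, None  # prev is the (state, action) that led here

    for _ in range(rounds):
        allowed = [a for a in transitions[state] if (state, a) not in unsafe]
        if not allowed:
            # Every action here is flagged, so the action that led to this
            # state cannot be safe with probability one either.
            if prev is None:
                break  # the start state itself has no safe action left
            unsafe.add(prev)
            state, prev = start, None
            continue

        action = rng.choice(allowed)
        next_state, damage = rng.choice(transitions[state][action])
        if damage:
            # Flag the action on the first observed damage, then reset.
            unsafe.add((state, action))
            damage_count += 1
            state, prev = start, None
        else:
            state, prev = next_state, (state, action)

    return unsafe, damage_count


unsafe, damage_count = detect_unsafe_actions(TRANSITIONS, "start", rounds=2000)
print(sorted(unsafe), damage_count)
```

Because a flagged action is never tried again, the number of damaging events in this sketch can never exceed the number of directly damaging actions, which loosely mirrors the tradeoff between exposure and detection time described above.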
To validate their theoretical findings, the researchers performed simulations confirming the identified trade-offs. These findings also suggested that incorporating safety constraints could facilitate the learning process.
“This study challenges the popular belief that an unlimited number of trials is required to learn safe behaviors,” said Bazerque. “Our results show that by effectively managing the tradeoffs between optimality, exposure to hazardous events, and detection time, we can achieve guaranteed safety without endless exploration. This has profound implications for robotics, autonomous systems, artificial intelligence, and more.”
The paper was published in the journal IEEE Transactions on Automatic Control (DOI: 10.1109/TAC.2023.3240925). Its coauthors are Agustin Castellano, Hancheng Min, Juan Andres Bazerque, and Enrique Mallada.
Journal
IEEE Transactions on Automatic Control
Article Title
Learning to Act Safely With Limited Exposure and Almost Sure Certainty
