Reinforcement learning enables autonomous free flyer control on the International Space Station

Machine Learning


Researchers are increasingly exploring reinforcement learning as a way to control robots in the challenging environment of space, and a team led by Kenneth Stewart, Samantha Chapin, and Roxana Leonti of the U.S. Naval Research Laboratory has announced the first successful demonstration of the technology in orbit. They trained a deep neural network to autonomously control NASA's Astrobee robot aboard the International Space Station, replacing its standard control system and enabling complex navigation in microgravity. This work validates a new training pipeline that utilizes advanced simulation to accelerate learning and effectively bridge the gap between the simulated environment and the reality of spaceflight. Successful deployment of this ground-based training directly to space-based applications represents an important step toward enabling rapid adaptation and readiness for future missions focused on servicing, assembly, and manufacturing in space.

The overarching goal is to develop a robust autonomous control system for operations in space, including inspection, assembly, and maintenance. A key challenge is bridging the gap between simulated training environments and the complexities of real-world spaceflight. This study demonstrated the successful implementation of reinforcement learning algorithms, specifically proximal policy optimization (PPO), to control the Astrobee robot both in realistic simulations and on a physical robot in a laboratory setting.
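The clipped surrogate objective at the heart of PPO can be sketched in a few lines. This is a minimal illustration of the standard algorithm, not the team's actual implementation; the 0.2 clipping range is a common default, not a value reported in the paper.

```python
import numpy as np

def ppo_clip_objective(ratio, advantage, epsilon=0.2):
    """Clipped surrogate objective from PPO (Schulman et al., 2017).

    ratio:     pi_new(a|s) / pi_old(a|s), probability ratio per sample
    advantage: estimated advantage per sample
    epsilon:   clipping range limiting how far the policy can move per update
    """
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1 - epsilon, 1 + epsilon) * advantage
    # Take the pessimistic (lower) bound, then average over samples
    return np.mean(np.minimum(unclipped, clipped))

# A ratio far above 1 + epsilon gets clipped when the advantage is positive:
obj = ppo_clip_objective(np.array([1.5]), np.array([2.0]))
# min(1.5 * 2.0, 1.2 * 2.0) = 2.4
```

The clipping is what makes PPO attractive for hardware deployment: it bounds how aggressively each gradient step can change the behavior of the controller.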

A key technique adopted is curriculum learning, in which the robot starts with simple tasks and gradually progresses to more complex ones to improve learning efficiency. Domain randomization, which varies the simulation parameters between training episodes, played an important role in improving robustness. The research focuses on achieving precise six-degree-of-freedom control to enable complex maneuvers. The scientists also studied adaptive control techniques to handle uncertainties and disturbances in spacecraft dynamics, and developed algorithms for autonomous collision avoidance, which is essential for safe operation in cluttered environments.
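Domain randomization of this kind is often implemented by resampling physical parameters at the start of each episode, with the sampling ranges widened as the curriculum advances. The sketch below assumes that structure; the parameter names and ranges are illustrative, not values from the paper or the Astrobee codebase.

```python
import random

def randomize_dynamics(rng, difficulty):
    """Sample randomized simulation parameters for one training episode.

    `difficulty` in [0, 1] widens the randomization ranges as the
    curriculum progresses (parameter names are illustrative only).
    """
    spread = 0.05 + 0.25 * difficulty  # fractional spread grows with difficulty
    return {
        "mass_scale": 1.0 + rng.uniform(-spread, spread),
        "thrust_scale": 1.0 + rng.uniform(-spread, spread),
        "sensor_noise_std": 0.001 + 0.01 * difficulty * rng.random(),
    }

rng = random.Random(0)
easy = randomize_dynamics(rng, difficulty=0.0)  # narrow ranges early in training
hard = randomize_dynamics(rng, difficulty=1.0)  # wide ranges late in training
```

A policy trained across thousands of such perturbed dynamics tends to treat the real robot as just one more sample from the randomized distribution, which is the basic mechanism behind sim-to-real transfer.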

The research also touches on the possibility of using RL to coordinate formations of multiple spacecraft and on visual servoing to guide precise maneuvers and assembly tasks. The main methodology is reinforcement learning with deep neural networks that approximate the policy and value functions. Generalized advantage estimation (GAE) was used to reduce the variance of the policy gradient, and visual servoing used camera feedback to guide the robot's movements. Safe reinforcement learning techniques were also implemented to keep the robot operating within safe boundaries. Ongoing challenges include transferring algorithms from simulation to the real world, ensuring robustness to disturbances and sensor noise, and preventing collisions. Future work will focus on extending the approach to more complex tasks and integrating it with other spacecraft systems. The core of the work involves training RL policies in NVIDIA's Omniverse Isaac Lab, a high-fidelity physics simulator that can run thousands of randomized simulations in parallel to maximize the robot's experience. To address the challenge of moving policies from simulation to the real world, the team implemented a curriculum learning approach.
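Generalized advantage estimation computes each advantage as an exponentially weighted sum of temporal-difference errors, trading a little bias for much lower gradient variance. The version below is the standard formulation, not the team's code; the γ and λ values are common defaults.

```python
import numpy as np

def gae(rewards, values, gamma=0.99, lam=0.95):
    """Generalized advantage estimation (Schulman et al., 2016).

    rewards: r_0 .. r_{T-1}
    values:  V(s_0) .. V(s_T), length T + 1, including the bootstrap
             value of the final state.
    """
    T = len(rewards)
    advantages = np.zeros(T)
    running = 0.0
    for t in reversed(range(T)):
        delta = rewards[t] + gamma * values[t + 1] - values[t]  # TD error
        running = delta + gamma * lam * running  # discounted sum of TD errors
        advantages[t] = running
    return advantages

adv = gae(np.array([1.0, 1.0, 1.0]), np.array([0.5, 0.5, 0.5, 0.0]))
```

With λ = 0 this collapses to the one-step TD error; with λ = 1 it becomes the full Monte Carlo return minus the value baseline, so λ interpolates between the two.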

This method gradually introduced variations into the simulated environment, training the RL policy to cope with unexpected changes and with discrepancies between the simulation and the real ISS environment. Following simulation training, the policy was validated within NASA Ames' Astrobee simulator using Gazebo and ROS Noetic, demonstrating that it can seamlessly replace the robot's existing control system. Preliminary ground tests were then conducted at NASA Ames' Granite Laboratory, a facility that uses air bearings to mimic weightlessness. Here, the RL policy's commands controlled Astrobee's fan-based propulsion system, allowing direct performance comparisons with a baseline controller. Finally, the trained policy was successfully deployed and tested in a real microgravity environment on the ISS, marking the first demonstration of RL-based control of a free-flying space robot.
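A curriculum that gradually introduces variations is commonly driven by a success-gated schedule: difficulty increases only once the policy masters the current stage. The sketch below assumes that mechanism; the paper's exact promotion criterion is not specified here.

```python
def update_difficulty(difficulty, success_rate, threshold=0.8, step=0.1):
    """Raise the curriculum difficulty once the policy's success rate on
    the current stage clears a threshold; otherwise hold it steady.
    (Illustrative success-gated schedule, not the authors' exact rule.)
    """
    if success_rate >= threshold:
        return min(1.0, difficulty + step)
    return difficulty

# Simulated training progression: difficulty ramps only as performance allows
d = 0.0
for success in [0.9, 0.5, 0.85, 0.95]:
    d = update_difficulty(d, success)
```

Gating progression on performance prevents the policy from being thrown into wide randomization ranges before it can handle the narrow ones, which is what keeps learning stable across the sim-to-real gap.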

In summary, the scientists developed a training pipeline that leverages high-fidelity physics simulation and curriculum learning to let Astrobee navigate in microgravity without relying on its standard control system. This achievement represents an important step toward making complex space operations such as servicing, assembly, and manufacturing more autonomous and responsive to changing mission needs. The team also demonstrated robustness to mass variations during ground tests, suggesting the approach may have broader applicability. Although the study focuses on validating the training pipeline, the authors suggest that the methodology could be used to generate more complex policies for a variety of tasks in space.

👉 More information
🗞 Crossing the Sim2Real gap between simulation and ground testing to space deployment of autonomous free flyer controls
🧠ArXiv: https://arxiv.org/abs/2512.03736
