Researchers are tackling the challenge of controlling high-speed robots using biologically plausible neural networks. Irene Ambrosini, Ingo Blakowski and Dmitrii Zendrikov from UZH and the Institute of Neuroinformatics at ETH Zurich, along with Cristiano Capone and colleagues, are demonstrating a new approach to training a network of slow silicon neurons to play air hockey. Their work is important because it leverages co-designed hardware and learning algorithms to achieve real-time learning and robot control success with an astonishingly low number of training trials. This research bridges the gap between neuroscience-inspired computing and practical robotic systems, suggesting that brain-inspired techniques can effectively manage fast-paced interactions and enable continuous learning in intelligent machines.
Spiking network learns air hockey in real time
This breakthrough establishes real-time learning within a setup consisting of a computer and a neuromorphic chip-in-the-loop, enabling practical training of spiking neural networks for autonomous robotic systems. The study bridges neuroscience-inspired hardware and real-world robot control, showing that brain-inspired approaches can effectively address fast-paced interaction tasks. It also supports continual learning for intelligent machines, which could change how robots adapt and operate in dynamic environments. The system operates in a 6D continuous state space that includes puck position, velocity, and striker coordinates over a 1.038 × 1.948 m workspace, a significant step beyond simplified benchmarks.
This work addresses key scalability challenges by moving beyond toy reinforcement learning problems to a physical robot platform with adaptive-precision continuous state encoding. The researchers demonstrated neuromorphic reinforcement learning for continuous motion primitives that execute ballistic trajectories at 50 Hz, requiring predictive decisions rather than frame-level responses. By randomizing the puck's initial position and velocity over a 1.0–1.5 m/s range, the system achieved 96–98% success over 2000 episodes, demonstrating robust learning and adaptation. Experiments confirm the system's ability to handle the high dimensionality, physical constraints, and temporal dynamics inherent in real-world robot control. The platform uses an anthropomorphic arm over a standard air hockey table, introducing a larger workspace and greater kinematic complexity. This work complements existing neuromorphic robotics efforts focused on event-based vision and spiking convolutional neural networks, paving the way for efficient event-driven perception combined with adaptive decision-making in autonomous systems.
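The article does not spell out the exact encoder, but a common way to map a continuous 6D state onto spiking hardware is Gaussian population coding, where each input dimension drives a small group of neurons with overlapping tuning curves. The sketch below is a minimal illustration under that assumption; the dimension names, workspace bounds, and speed range come from the article, while the tuning-curve form and the `neurons_per_dim` value are hypothetical.

```python
import numpy as np

# 6D continuous state: puck (x, y, vx, vy) plus striker (x, y).
# Workspace bounds follow the 1.038 x 1.948 m table quoted in the article;
# velocity bounds cover the 1.0-1.5 m/s training range. The Gaussian
# population code itself is an illustrative assumption, not necessarily
# the encoder used on the DYNAP-SE chip.
STATE_BOUNDS = np.array([
    [0.0, 1.038],   # puck x (m)
    [0.0, 1.948],   # puck y (m)
    [-1.5, 1.5],    # puck vx (m/s)
    [-1.5, 1.5],    # puck vy (m/s)
    [0.0, 1.038],   # striker x (m)
    [0.0, 1.948],   # striker y (m)
])

def encode_state(state, neurons_per_dim=16, width=0.08):
    """Map a 6D state to firing rates via overlapping Gaussian tuning curves."""
    rates = []
    for value, (lo, hi) in zip(state, STATE_BOUNDS):
        centers = np.linspace(lo, hi, neurons_per_dim)
        sigma = width * (hi - lo)                      # tuning width scales with the range
        rates.append(np.exp(-0.5 * ((value - centers) / sigma) ** 2))
    return np.concatenate(rates)                       # one rate per input neuron

if __name__ == "__main__":
    state = np.array([0.5, 1.0, 0.0, -1.2, 0.52, 0.15])
    print(encode_state(state).shape)                   # (96,) input channels
```

Widening the bounds of any dimension spreads the same number of tuning curves over a larger range, which is one intuition for why broader speed ranges take longer to learn with a fixed-size network, as reported later in the article.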
Air hockey spiking network and reinforcement learning
This setup consists of a chip-in-the-loop that enables practical training of spiking neural networks for autonomous robotic systems, bridging neuroscience-inspired hardware and real-world robot control. This line of work pioneered biologically plausible “waking” and “dreaming” reinforcement learning phases, first demonstrated on Atari Pong and now extended to real-time hardware on the DYNAP-SE chip. At the heart of this advance are spiking neural networks, which model neurons as leaky integrate-and-fire units that communicate via discrete spikes, a key mechanism for energy efficiency and temporal coding. Deep reinforcement learning algorithms such as DQN and TD3 have been implemented within spiking neural networks, but the researchers moved beyond the non-local learning rules those approaches rely on by employing recent advances in local plasticity.
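For reference, a leaky integrate-and-fire unit sums weighted input current into a decaying membrane potential and emits a spike when that potential crosses a threshold. The discrete-time sketch below illustrates this dynamic; the time constant, threshold, and input drive are illustrative values, not the DYNAP-SE circuit parameters.

```python
import numpy as np

def lif_step(v, input_current, dt=1e-3, tau=20e-3, v_thresh=1.0, v_reset=0.0):
    """One Euler step of a leaky integrate-and-fire neuron.

    The potential v decays toward rest while integrating input; crossing
    v_thresh emits a spike and resets it. Parameter values are illustrative.
    """
    v = v + dt / tau * (-v + input_current)
    spiked = v >= v_thresh
    v = np.where(spiked, v_reset, v)
    return v, spiked

if __name__ == "__main__":
    v = np.zeros(3)                                   # three neurons
    rng = np.random.default_rng(0)
    for t in range(100):
        current = rng.uniform(0.0, 2.5, size=3)       # noisy input drive
        v, spikes = lif_step(v, current)
        if spikes.any():
            print(f"t={t} ms, spikes from neurons {np.flatnonzero(spikes)}")
```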
This enables online learning with recurrent SNNs suitable for neuromorphic hardware and is an important step towards scaling to more complex tasks. The experiment uses a MuJoCo implementation of an air hockey environment featuring a flat table and an anthropomorphic manipulator that controls a mallet-shaped end effector. The agent must observe the 2D position and velocity of the puck together with the position of the end effector and intercept the puck as it slides across the 1.038 × 1.948 m workspace. The control loop runs at 50 Hz, and the agent selects one of two discrete actions, each corresponding to a predefined motion primitive that is executed as an open-loop trajectory with a spline velocity profile.
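To make that decision loop concrete, the sketch below shows its general shape: at each 50 Hz tick the agent reads the 6D observation, picks one of two motion primitives, and the chosen primitive is rolled out open loop as a time-parameterized velocity profile. Only the 50 Hz rate, the 6D observation, and the discrete primitive selection come from the article; the primitive parameters, the bell-shaped profile, the decision rule, and the `env` interface are placeholders.

```python
import numpy as np

DT = 1.0 / 50.0  # 50 Hz decision rate, as reported in the article

# Two hypothetical primitives; the paper's actual strike parameters are unknown.
PRIMITIVES = {
    0: {"duration": 0.4, "peak_speed": 0.8},   # softer, longer stroke
    1: {"duration": 0.3, "peak_speed": 1.2},   # faster, shorter stroke
}

def spline_velocity_profile(duration, peak_speed):
    """Bell-shaped velocity profile for an open-loop (ballistic) stroke.

    A smooth rise-and-fall curve stands in for the paper's 'spline velocity
    profile'; the exact parameterization used on the robot may differ.
    """
    steps = max(2, int(round(duration / DT)))
    t = np.linspace(0.0, 1.0, steps)
    bell = t**2 * (1.0 - t) ** 2
    return peak_speed * bell / bell.max()             # peaks at peak_speed mid-stroke

def select_primitive(observation):
    """Stand-in for the spiking policy: chooses one of the two primitives."""
    puck_vx = observation[2]                          # 6D obs: puck x, y, vx, vy, striker x, y
    return 0 if puck_vx < 0.0 else 1                  # hypothetical decision rule

def control_loop(env, episode_seconds=3.0):
    """Run one episode; `env` stands in for the MuJoCo air hockey environment."""
    obs = env.reset()                                 # 6D continuous state
    for _ in range(int(episode_seconds / DT)):
        action = select_primitive(obs)
        profile = spline_velocity_profile(**PRIMITIVES[action])
        obs, done = env.step(action, profile)         # primitive executes open loop
        if done:
            break
```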
This methodology enabled rigorous testing of neuromorphic hardware in a dynamic closed-loop robot control application, achieving success rates of 96–98% over 2000 episodes with randomized puck positions and velocities ranging from 1.0 to 1.5 m/s. The team addressed scalability by extending the neuromorphic RL framework first demonstrated on Atari Pong to physical robot manipulation, moving beyond toy problems to real-world air hockey with adaptive-precision continuous state encoding. They demonstrated neuromorphic RL for continuous motion primitives executing ballistic trajectories at 50 Hz, requiring predictive decisions rather than frame-level responses, and achieved generalization under uncertainty through randomization of the puck parameters.
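The generalization results hinge on resetting each episode with randomized puck conditions. A minimal sketch of such a reset, assuming uniform sampling over the reported 0.10 m position window and 1.0–1.5 m/s speed range, is shown below; the sampling distribution, launch geometry, and angle spread are assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)

def sample_initial_puck(position_window=0.10, speed_range=(1.0, 1.5)):
    """Draw a randomized puck start for one episode.

    Uniform sampling over the reported 0.10 m position window and the
    1.0-1.5 m/s speed range; the launch angle spread is an assumption.
    """
    x0 = 0.5 + rng.uniform(-position_window / 2, position_window / 2)  # lateral offset (m)
    speed = rng.uniform(*speed_range)                                  # m/s
    angle = rng.uniform(-np.pi / 8, np.pi / 8)                         # hypothetical spread
    vx, vy = speed * np.sin(angle), -speed * np.cos(angle)             # heading toward the robot
    return np.array([x0, 1.9, vx, vy])                                 # puck x, y, vx, vy

if __name__ == "__main__":
    for _ in range(3):
        print(sample_initial_puck())
```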
Robotic air hockey learned via spiking neurons
In this study, the researchers scaled a neuromorphic reinforcement learning framework from 2D pixel games to a physical 3D robot task, increasing the dimensionality of the network inputs from 4 to 6 and adapting control from single-step actions to synthetic motion primitives. When a stationary puck was placed 1.0 m from the robot, a 100% success rate was reached within 200 trials, establishing a performance baseline. The team then measured task-level generalization under varied initial puck conditions and achieved 100% success after 1000 episodes of constant-velocity lateral launches from the edge of the table. Introducing velocity variation in the range 1.0–1.5 m/s extended the learning time, but the success rate stabilized above 96% after 1500 episodes.
Randomizing both the initial position (within a 0.10 m window) and the velocity yielded the best asymptotic performance, exceeding 98% after 1300 episodes, which suggests that a broader state distribution prevented overfitting. These results demonstrate the feasibility of event-driven e-prop and reservoir architectures for low-power predictive control in real-world high-speed robotics. Encoding-range scalability tests with 1020 silicon neurons showed that a narrow speed range of [0.7, 0.9] m/s reached more than 97% success within about 150 episodes, the medium range [0.7, 1.2] m/s required approximately 700 episodes to achieve similar performance, and for the widest range, [0.7, 1.5] m/s, the asymptotic success rate dropped by only 4%, from 97% to 93%.
The data show that, consistent with the finite resolution of fixed-size networks, larger input ranges increase convergence time and slightly degrade performance. Before training, the agent exhibited highly variable, nearly random action selection, but learning transformed this stochastic search into a deterministic and temporally precise strategy. This transformation minimized timing variability, enabled robust generalization across different initial conditions, and demonstrated the network's ability to extract consistent, reliable policies from noisy observations. The work achieved stable performance within 1500–2000 episodes using 1020 DYNAP-SE neurons and outperformed simulation-based bio-inspired RL that used 10,000 neurons.
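The article credits local plasticity, in the spirit of e-prop-style eligibility traces modulated by a reward signal, for making learning feasible on chip. The sketch below shows a highly simplified reward-modulated eligibility-trace update on readout weights; it captures the "local trace times global reward" structure only and is not the actual DYNAP-SE learning rule or the waking/dreaming schedule.

```python
import numpy as np

class EligibilityReadout:
    """Readout weights updated by a reward-modulated eligibility trace.

    Local rule: each weight keeps a decaying trace of recent pre-spike /
    post-activity coincidences, and a scalar reward gates the weight update.
    This mirrors the e-prop idea in spirit only; it is not the paper's rule.
    """

    def __init__(self, n_in, n_out, lr=1e-3, trace_decay=0.9, seed=0):
        rng = np.random.default_rng(seed)
        self.w = 0.01 * rng.standard_normal((n_out, n_in))
        self.trace = np.zeros_like(self.w)
        self.lr, self.trace_decay = lr, trace_decay

    def forward(self, spikes):
        out = self.w @ spikes
        # Accumulate a local eligibility trace from pre/post coincidences.
        self.trace = self.trace_decay * self.trace + np.outer(out, spikes)
        return out

    def apply_reward(self, reward):
        # A global scalar reward modulates the locally stored traces.
        self.w += self.lr * reward * self.trace
        self.trace[:] = 0.0

if __name__ == "__main__":
    readout = EligibilityReadout(n_in=96, n_out=2)
    spikes = (np.random.default_rng(1).random(96) < 0.1).astype(float)
    q = readout.forward(spikes)
    readout.apply_reward(reward=1.0 if q.argmax() == 0 else -0.1)
    print(q)
```

Because the trace and the update are functions only of quantities available at each synapse plus one broadcast reward, rules of this family can run directly in neuromorphic hardware, which is the property the article highlights.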
Demo of neuromorphic air hockey control using 1020 neurons
Scientists have demonstrated successful robot manipulation using a small spiking neural network of just 1020 neurons. This study bridges neuroscience-inspired hardware and real-world robot control, demonstrating the potential of brain-inspired approaches for fast-paced interaction tasks. Notably, the network achieves better performance than simulations with ten times as many neurons, highlighting the efficiency of neuromorphic hardware. However, the authors acknowledge limitations tied to the fixed size of the silicon network, which currently restricts the range of puck speeds that can be handled reliably, with success rates decreasing from 97% to 86% as the speed range widens. Future work could address this by utilizing all available processor cores, implementing offline learning mechanisms, integrating event-camera inputs to improve latency and robustness, and potentially validating the system on platforms such as iCub under real-world conditions.
👉 More information
🗞 Spiking reinforcement learning trains slow silicon neurons to control ultra-fast robots
🧠ArXiv: https://arxiv.org/abs/2601.21548
