Researchers are tackling the challenge of controlling high-speed robots using biologically plausible neural networks. Irene Ambrosini, Ingo Blakowski and Dmitrii Zendrikov from UZH and the Institute of Neuroinformatics at ETH Zurich, along with Cristiano Capone and colleagues, are demonstrating a new approach to training a network of slow silicon neurons to play air hockey. Their work is important because it leverages co-designed hardware and learning algorithms to achieve real-time learning and robot control success with an astonishingly low number of training trials. This research bridges the gap between neuroscience-inspired computing and practical robotic systems, suggesting that brain-inspired techniques can effectively manage fast-paced interactions and enable continuous learning in intelligent machines.
Spiking network learns air hockey in real time
This breakthrough establishes real-time learning within a setup consisting of a computer and a neuromorphic chip-in-the-loop, enabling practical training of spiking neural networks for autonomous robotic systems. The study bridges neuroscience-inspired hardware and real-world robot control, showing that brain-inspired approaches can effectively address fast-paced interaction tasks. It also supports continual learning for intelligent machines, which could change how robots adapt and operate in dynamic environments. The system operates in a 6D continuous state space that includes puck position, velocity, and striker coordinates over a 1.038 × 1.948 m workspace, a significant step beyond simplified benchmarks.
This work addresses key scalability challenges by moving beyond toy reinforcement learning problems to a physical robot platform with adaptive-precision continuous state encoding. The researchers demonstrated neuromorphic reinforcement learning for continuous motion primitives that execute ballistic trajectories at 50 Hz, requiring predictive decisions rather than frame-level responses. By randomizing the puck's initial position and velocity over a 1.0–1.5 m/s range, the system achieved 96–98% success over 2000 episodes, demonstrating robust learning and adaptation. Experiments confirm the system's ability to handle the high dimensionality, physical constraints, and temporal dynamics inherent in real-world robot control. The platform uses an anthropomorphic arm over a standard air hockey table, introducing a larger workspace and greater kinematic complexity. This work complements existing neuromorphic robotics efforts focused on event-based vision and spiking convolutional neural networks, paving the way for efficient event-driven perception combined with adaptive decision-making in autonomous systems.
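The article does not spell out the exact encoder, but a common way to map a continuous 6D state onto spiking hardware is Gaussian population coding, where each input dimension drives a small group of neurons with overlapping tuning curves. The sketch below is a minimal illustration under that assumption; the dimension names, workspace bounds, and speed range come from the article, while the tuning-curve form and the `neurons_per_dim` value are hypothetical.

```python
import numpy as np

# 6D continuous state: puck (x, y, vx, vy) plus striker (x, y).
# Workspace bounds follow the 1.038 x 1.948 m table quoted in the article;
# velocity bounds cover the 1.0-1.5 m/s training range. The Gaussian
# population code itself is an illustrative assumption, not necessarily
# the encoder used on the DYNAP-SE chip.
STATE_BOUNDS = np.array([
    [0.0, 1.038],   # puck x (m)
    [0.0, 1.948],   # puck y (m)
    [-1.5, 1.5],    # puck vx (m/s)
    [-1.5, 1.5],    # puck vy (m/s)
    [0.0, 1.038],   # striker x (m)
    [0.0, 1.948],   # striker y (m)
])

def encode_state(state, neurons_per_dim=16, width=0.08):
    """Map a 6D state to firing rates via overlapping Gaussian tuning curves."""
    rates = []
    for value, (lo, hi) in zip(state, STATE_BOUNDS):
        centers = np.linspace(lo, hi, neurons_per_dim)
        sigma = width * (hi - lo)                      # tuning width scales with the range
        rates.append(np.exp(-0.5 * ((value - centers) / sigma) ** 2))
    return np.concatenate(rates)                       # one rate per input neuron

if __name__ == "__main__":
    state = np.array([0.5, 1.0, 0.0, -1.2, 0.52, 0.15])
    print(encode_state(state).shape)                   # (96,) input channels
```

Widening the bounds of any dimension spreads the same number of tuning curves over a larger range, which is one intuition for why broader speed ranges take longer to learn with a fixed-size network, as reported later in the article.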
Air hockey spiking network and reinforcement learning
This setup consists of a chip-in-the-loop that enables practical training of spiking neural networks for autonomous robotic systems, bridging neuroscience-inspired hardware and real-world robot control. This line of work pioneered biologically plausible “waking” and “dreaming” reinforcement learning phases, first demonstrated on Atari Pong and now extended to real-time hardware on the DYNAP-SE chip. At the heart of this advance are spiking neural networks, which model neurons as leaky integrate-and-fire units that communicate via discrete spikes, a key mechanism for energy efficiency and temporal coding. Deep reinforcement learning algorithms such as DQN and TD3 have been implemented within spiking neural networks, but the researchers moved beyond the non-local learning rules those approaches rely on by employing recent advances in local plasticity.
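For reference, a leaky integrate-and-fire unit sums weighted input current into a decaying membrane potential and emits a spike when that potential crosses a threshold. The discrete-time sketch below illustrates this dynamic; the time constant, threshold, and input drive are illustrative values, not the DYNAP-SE circuit parameters.

```python
import numpy as np

def lif_step(v, input_current, dt=1e-3, tau=20e-3, v_thresh=1.0, v_reset=0.0):
    """One Euler step of a leaky integrate-and-fire neuron.

    The potential v decays toward rest while integrating input; crossing
    v_thresh emits a spike and resets it. Parameter values are illustrative.
    """
    v = v + dt / tau * (-v + input_current)
    spiked = v >= v_thresh
    v = np.where(spiked, v_reset, v)
    return v, spiked

if __name__ == "__main__":
    v = np.zeros(3)                                   # three neurons
    rng = np.random.default_rng(0)
    for t in range(100):
        current = rng.uniform(0.0, 2.5, size=3)       # noisy input drive
        v, spikes = lif_step(v, current)
        if spikes.any():
            print(f"t={t} ms, spikes from neurons {np.flatnonzero(spikes)}")
```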
This enables online learning with recurrent SNNs suitable for neuromorphic hardware and is an important step towards scaling to more complex tasks. The experiment uses a MuJoCo implementation of an air hockey environment featuring a flat table and an anthropomorphic manipulator that controls a mallet-shaped end effector. The agent must observe the 2D position and velocity of the puck together with the position of the end effector and intercept the puck as it slides across the 1.038 × 1.948 m workspace. The control loop runs at 50 Hz, and the agent selects one of two discrete actions, each corresponding to a predefined motion primitive that is executed as an open-loop trajectory with a spline velocity profile.
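To make that decision loop concrete, the sketch below shows its general shape: at each 50 Hz tick the agent reads the 6D observation, picks one of two motion primitives, and the chosen primitive is rolled out open loop as a time-parameterized velocity profile. Only the 50 Hz rate, the 6D observation, and the discrete primitive selection come from the article; the primitive parameters, the bell-shaped profile, the decision rule, and the `env` interface are placeholders.

```python
import numpy as np

DT = 1.0 / 50.0  # 50 Hz decision rate, as reported in the article

# Two hypothetical primitives; the paper's actual strike parameters are unknown.
PRIMITIVES = {
    0: {"duration": 0.4, "peak_speed": 0.8},   # softer, longer stroke
    1: {"duration": 0.3, "peak_speed": 1.2},   # faster, shorter stroke
}

def spline_velocity_profile(duration, peak_speed):
    """Bell-shaped velocity profile for an open-loop (ballistic) stroke.

    A smooth rise-and-fall curve stands in for the paper's 'spline velocity
    profile'; the exact parameterization used on the robot may differ.
    """
    steps = max(2, int(round(duration / DT)))
    t = np.linspace(0.0, 1.0, steps)
    bell = t**2 * (1.0 - t) ** 2
    return peak_speed * bell / bell.max()             # peaks at peak_speed mid-stroke

def select_primitive(observation):
    """Stand-in for the spiking policy: chooses one of the two primitives."""
    puck_vx = observation[2]                          # 6D obs: puck x, y, vx, vy, striker x, y
    return 0 if puck_vx < 0.0 else 1                  # hypothetical decision rule

def control_loop(env, episode_seconds=3.0):
    """Run one episode; `env` stands in for the MuJoCo air hockey environment."""
    obs = env.reset()                                 # 6D continuous state
    for _ in range(int(episode_seconds / DT)):
        action = select_primitive(obs)
        profile = spline_velocity_profile(**PRIMITIVES[action])
        obs, done = env.step(action, profile)         # primitive executes open loop
        if done:
            break
```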
This methodology enabled rigorous testing of neuromorphic hardware in a dynamic closed-loop robot control application, achieving success rates of 96–98% over 2000 episodes with randomized puck positions and velocities ranging from 1.0 to 1.5 m/s. The team addressed scalability by extending the neuromorphic RL framework first demonstrated on Atari Pong to physical robot manipulation, moving beyond toy problems to real-world air hockey with adaptive-precision continuous state encoding. They demonstrated neuromorphic RL for continuous motion primitives executing ballistic trajectories at 50 Hz, requiring predictive decisions rather than frame-level responses, and achieved generalization under uncertainty through randomization of the puck parameters.
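The generalization results hinge on resetting each episode with randomized puck conditions. A minimal sketch of such a reset, assuming uniform sampling over the reported 0.10 m position window and 1.0–1.5 m/s speed range, is shown below; the sampling distribution, launch geometry, and angle spread are assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)

def sample_initial_puck(position_window=0.10, speed_range=(1.0, 1.5)):
    """Draw a randomized puck start for one episode.

    Uniform sampling over the reported 0.10 m position window and the
    1.0-1.5 m/s speed range; the launch angle spread is an assumption.
    """
    x0 = 0.5 + rng.uniform(-position_window / 2, position_window / 2)  # lateral offset (m)
    speed = rng.uniform(*speed_range)                                  # m/s
    angle = rng.uniform(-np.pi / 8, np.pi / 8)                         # hypothetical spread
    vx, vy = speed * np.sin(angle), -speed * np.cos(angle)             # heading toward the robot
    return np.array([x0, 1.9, vx, vy])                                 # puck x, y, vx, vy

if __name__ == "__main__":
    for _ in range(3):
        print(sample_initial_puck())
```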
Robotic air hockey learned via spiking neurons
In this study, the researchers scaled a neuromorphic reinforcement learning framework from 2D pixel games to a physical 3D robot task, increasing the dimensionality of the network inputs from 4 to 6 and adapting control from single-step actions to synthetic motion primitives. When a stationary puck was placed 1.0 m from the robot, a 100% success rate was reached within 200 trials, establishing a performance baseline. The team then measured task-level generalization under varied initial puck conditions and achieved 100% success after 1000 episodes of constant-velocity lateral launches from the edge of the table. Introducing velocity variation in the range 1.0–1.5 m/s extended the learning time, but the success rate stabilized above 96% after 1500 episodes.
Randomizing both the initial position (within a 0.10 m window) and the velocity yielded the best asymptotic performance, exceeding 98% after 1300 episodes, which suggests that a broader state distribution prevented overfitting. These results demonstrate the feasibility of event-driven e-prop and reservoir architectures for low-power predictive control in real-world high-speed robotics. Encoding-range scalability tests with 1020 silicon neurons showed that a narrow speed range of [0.7, 0.9] m/s reached more than 97% success within about 150 episodes, the medium range [0.7, 1.2] m/s required approximately 700 episodes to achieve similar performance, and for the widest range, [0.7, 1.5] m/s, the asymptotic success rate dropped by only 4%, from 97% to 93%.
The data show that, consistent with the finite resolution of fixed-size networks, larger input ranges increase convergence time and slightly degrade performance. Before training, the agent exhibited highly variable, nearly random action selection, but learning transformed this stochastic search into a deterministic and temporally precise strategy. This transformation minimized timing variability, enabled robust generalization across different initial conditions, and demonstrated the network's ability to extract consistent, reliable policies from noisy observations. The work achieved stable performance within 1500–2000 episodes using 1020 DYNAP-SE neurons and outperformed simulation-based bio-inspired RL that used 10,000 neurons.
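The article credits local plasticity, in the spirit of e-prop-style eligibility traces modulated by a reward signal, for making learning feasible on chip. The sketch below shows a highly simplified reward-modulated eligibility-trace update on readout weights; it captures the "local trace times global reward" structure only and is not the actual DYNAP-SE learning rule or the waking/dreaming schedule.

```python
import numpy as np

class EligibilityReadout:
    """Readout weights updated by a reward-modulated eligibility trace.

    Local rule: each weight keeps a decaying trace of recent pre-spike /
    post-activity coincidences, and a scalar reward gates the weight update.
    This mirrors the e-prop idea in spirit only; it is not the paper's rule.
    """

    def __init__(self, n_in, n_out, lr=1e-3, trace_decay=0.9, seed=0):
        rng = np.random.default_rng(seed)
        self.w = 0.01 * rng.standard_normal((n_out, n_in))
        self.trace = np.zeros_like(self.w)
        self.lr, self.trace_decay = lr, trace_decay

    def forward(self, spikes):
        out = self.w @ spikes
        # Accumulate a local eligibility trace from pre/post coincidences.
        self.trace = self.trace_decay * self.trace + np.outer(out, spikes)
        return out

    def apply_reward(self, reward):
        # A global scalar reward modulates the locally stored traces.
        self.w += self.lr * reward * self.trace
        self.trace[:] = 0.0

if __name__ == "__main__":
    readout = EligibilityReadout(n_in=96, n_out=2)
    spikes = (np.random.default_rng(1).random(96) < 0.1).astype(float)
    q = readout.forward(spikes)
    readout.apply_reward(reward=1.0 if q.argmax() == 0 else -0.1)
    print(q)
```

Because the trace and the update are functions only of quantities available at each synapse plus one broadcast reward, rules of this family can run directly in neuromorphic hardware, which is the property the article highlights.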
Demo of neuromorphic air hockey control using 1020 neurons
Scientists have demonstrated successful robot manipulation using a small spiking neural network of just 1020 neurons. This study bridges neuroscience-inspired hardware and real-world robot control, demonstrating the potential of brain-inspired approaches for fast-paced interaction tasks. Notably, the network achieves better performance than simulations with ten times as many neurons, highlighting the efficiency of neuromorphic hardware. However, the authors acknowledge limitations tied to the fixed size of the silicon network, which currently restricts the range of puck speeds that can be handled reliably, with success rates decreasing from 97% to 86% as the speed range widens. Future work could address this by utilizing all available processor cores, implementing offline learning mechanisms, integrating event-camera inputs to improve latency and robustness, and potentially validating the system on platforms such as iCub under real-world conditions.
👉 More information
🗞 Spiking reinforcement learning trains slow silicon neurons to control ultra-fast robots
🧠ArXiv: https://arxiv.org/abs/2601.21548
