Robots learn to sculpt sand using reinforcement learning

The study, published on arXiv, details how researchers from the University of Bonn developed a reinforcement learning framework that allows robots to shape granular media such as sand into target forms. The system trains a robotic arm equipped with a cubic end effector and a stereo camera to reshape loose material into rectangles, L-shapes, polygons, and negatives of archaeological frescoes. Experiments showed millimeter-level accuracy, with trained agents outperforming two baseline approaches and transferring from simulation to a physical robot without additional training.

Granular materials are difficult for robots to handle because of their high-dimensional configuration space and unstable dynamics. Rule-based approaches often fail, while particle simulations are computationally expensive. The researchers addressed these challenges by designing compact observation spaces and reward terms that guide learning. The visual policy was trained using Truncated Quantile Critics (TQC), a distributional reinforcement learning algorithm. Depth images from a ZED 2i stereo camera are converted to a height map, allowing the robot to compare the current and target structures in a format suitable for efficient training.
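The height map conversion described above can be illustrated with a minimal sketch. It is not the authors' code: the function names, the grid size, the workspace bounds, and the choice to keep the highest point per cell are all assumptions made for illustration, and the input is taken to be a list of 3D points already derived from the depth image.

```python
def points_to_height_map(points, x_range, y_range, rows, cols, floor=0.0):
    """Bin (x, y, z) points into a rows x cols grid, keeping the highest z per cell.

    Hypothetical helper: cells that receive no point keep the `floor` height.
    """
    x_min, x_max = x_range
    y_min, y_max = y_range
    grid = [[floor] * cols for _ in range(rows)]
    seen = [[False] * cols for _ in range(rows)]
    for x, y, z in points:
        # Ignore points outside the workspace.
        if not (x_min <= x < x_max and y_min <= y < y_max):
            continue
        r = min(rows - 1, int((y - y_min) / (y_max - y_min) * rows))
        c = min(cols - 1, int((x - x_min) / (x_max - x_min) * cols))
        if not seen[r][c] or z > grid[r][c]:
            grid[r][c] = z
            seen[r][c] = True
    return grid


def height_difference(current, target):
    """Per-cell signed difference: positive where material still has to be added."""
    return [[t - c for c, t in zip(cur_row, tgt_row)]
            for cur_row, tgt_row in zip(current, target)]


points = [(0.05, 0.05, 0.01), (0.05, 0.05, 0.03), (0.15, 0.05, 0.02), (0.5, 0.5, 1.0)]
current = points_to_height_map(points, (0.0, 0.2), (0.0, 0.2), rows=2, cols=2)
target = [[0.03, 0.0], [0.0, 0.0]]
difference = height_difference(current, target)
```

A grid like `difference` is the kind of compact observation the article refers to: it has a fixed size regardless of how many sand particles are in the scene.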

The robot's task is to use a cubic end effector to manipulate granular media into a shape as close as possible to the desired target configuration. Images via Bonn University.

The system was evaluated against a random policy and a boustrophedon coverage path planning baseline. Across 400 target shapes, the learned agents consistently outperformed both. With the delta reward formulation, the robot achieved a mean height difference of 3.4 mm, compared with 4.8 mm for the planning baseline and 7.2 mm for random motion. Execution was also shorter, averaging 23.5 steps versus 44 for the path planning baseline. The agent altered 97% of the relevant cells in the target region, compared with 54% for random movement. Execution steps were counted as the number of actions taken until the end effector had stayed out of the granular medium for three consecutive steps. Statistical tests confirmed that the delta policy significantly outperformed all alternatives.
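The quantities in this evaluation can be sketched in a few lines. This is an illustrative reading rather than the paper's definitions: it assumes the height difference is the mean absolute per-cell difference over the whole grid, that the delta reward is the improvement in that difference between consecutive steps, and that the boustrophedon baseline sweeps the grid in alternating row directions. All function names are hypothetical.

```python
def mean_height_difference(current, target):
    """Mean absolute per-cell difference between two equally sized height maps."""
    cells = [abs(c - t)
             for cur_row, tgt_row in zip(current, target)
             for c, t in zip(cur_row, tgt_row)]
    return sum(cells) / len(cells)


def delta_reward(previous, current, target):
    """Assumed delta reward: positive when the last action reduced the error."""
    return (mean_height_difference(previous, target)
            - mean_height_difference(current, target))


def boustrophedon_path(rows, cols):
    """Visit every cell row by row, reversing direction on alternate rows."""
    path = []
    for r in range(rows):
        columns = range(cols) if r % 2 == 0 else range(cols - 1, -1, -1)
        path.extend((r, c) for c in columns)
    return path


def episode_finished(in_contact_history, patience=3):
    """True once the end effector was outside the medium for `patience` steps in a row."""
    recent = in_contact_history[-patience:]
    return len(recent) == patience and not any(recent)


# Heights in millimeters on a 2 x 2 grid.
target = [[4, 0], [0, 0]]
before = [[0, 0], [0, 0]]
after = [[2, 0], [0, 0]]
reward = delta_reward(before, after, target)
sweep = boustrophedon_path(2, 3)
```

Rewarding the change in error rather than the error itself gives the agent immediate feedback on each push, which is one plausible reason such a formulation would train efficiently.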

The project involved the Humanoid Robots Lab, the Autonomous Intelligent Systems Lab, and the Center for Robotics at the University of Bonn, in collaboration with the Lamarr Institute for Machine Learning and Artificial Intelligence. Funding came from the European Commission's RePAIR project under Horizon 2020 and from the German Federal Ministry of Education and Research through the Robotics Institute Germany initiative.

A training process enables agents to manipulate granular media using sensory input. Visual policies are trained through reinforcement learning to reach the target shape based on the difference between the current height map and the desired height map. Images via Bonn University.

Further experiments examined design choices. When the reward for movement inside the target area was removed, the agent avoided manipulation entirely and performed better than the random baseline. An ablation of the feature extractor showed that the proposed gating-based encoder achieved the best performance, with an average error of 3.4 mm compared with 4.6 mm when relying directly on depth images. A comparison of algorithms confirmed that TQC converged stably, whereas Soft Actor-Critic (SAC) lagged behind and Twin Delayed Deep Deterministic Policy Gradient (TD3) failed to converge. Additional details, videos, and code are provided on a supplementary site linked from the paper.
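For readers unfamiliar with TQC, the truncation step that gives the algorithm its name can be sketched as follows. This shows only the general idea from the TQC method (pool the return quantiles predicted by several critics, sort them, and discard the largest ones to curb overestimation), not the Bonn team's training setup. The function name and the example numbers are hypothetical, and the reward, discount, and entropy terms of the full target are omitted.

```python
def truncated_target_atoms(critic_quantiles, drop_per_critic):
    """Pool quantile estimates from all critics and drop the largest ones.

    critic_quantiles: one list of predicted return quantiles per critic.
    drop_per_critic: how many of the largest atoms to discard per critic.
    """
    pooled = sorted(q for critic in critic_quantiles for q in critic)
    n_drop = drop_per_critic * len(critic_quantiles)
    return pooled[:len(pooled) - n_drop]


# Two critics with three quantile estimates each (made-up values).
critics = [[1.0, 2.0, 9.0], [0.5, 3.0, 12.0]]
kept = truncated_target_atoms(critics, drop_per_critic=1)
value_estimate = sum(kept) / len(kept)
```

Here the two optimistic outliers (9.0 and 12.0) are discarded before the value estimate is formed, which illustrates why the method tends to be less prone to overestimation than taking a plain average of the critics.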

Deployment on a UR5e robot arm tested the approach outside simulation. Despite sensor noise and uneven starting surfaces, the robot formed target shapes such as rectangles, with results similar to those seen in simulation. This direct transfer from synthetic training environments to real-world execution demonstrated the robustness of the framework.

From left, a reconstructed 3D scene of the simulation. Images via Bonn University.

Research on granular media manipulation spans excavation, grading, and extraterrestrial soil handling. Many approaches rely on computationally demanding finite or discrete element simulations, or on imitation learning pipelines tailored to a particular task. By combining an efficient height map representation with carefully designed reward formulations, the Bonn team demonstrated that reinforcement learning can shape granular media adaptively without hand-crafted rules.

The authors conclude that the method is consistently superior to traditional baselines, establishing a viable route for adaptive robotic manipulation of deformable materials.
