Robot vision simulator runs at 2700 frames/second

Scientists are increasingly focused on building robust perception systems for robots operating in complex real-world scenarios. Richeek Das and Pratik Chaudhari of the University of Pennsylvania introduce Neurosim, a high-performance library that simulates the agile dynamics of multirotor vehicles along with a variety of sensors, including dynamic vision, RGB, and depth sensors. Their study details the design of Neurosim and its integration with Cortex, a ZeroMQ-based communication library, which together reach frame rates of up to 2700 FPS on standard desktop GPUs. The work matters because time-synchronized multimodal data can accelerate the training and closed-loop testing of neuromorphic perception and control algorithms, advancing robotics and autonomous systems.

Scientists have created a powerful new tool for rapidly prototyping and testing robot vision systems. The simulator lets engineers train and validate algorithms far faster than real time, helping to bridge the gap between laboratory research and real-world deployment. The advance is expected to accelerate the development of more agile and perceptive robots for a variety of applications.

Scientists have announced Neurosim, a new simulation library that can render complex robotic environments and sensor data at around 2700 frames per second on standard desktop hardware. This work addresses a critical bottleneck in the development and testing of neuromorphic computing and robotics algorithms and provides a path to more robust and efficient systems.

Neurosim does more than simply replicate sensor input. It simulates dynamic vision sensors, RGB cameras, depth sensors, and inertial measurement units together with the agile movement of multirotor vehicles in complex and changing environments. The core of the innovation lies in the library's ability to generate high-fidelity event data, the output of event-based cameras that mimic the human retina, without the prohibitive storage demands of traditional simulation methods.

Existing simulators often rely on generating high frame rate RGB data and converting it into event data, consuming terabytes of storage for even short simulation runs. Neurosim bypasses this step and creates event data on the fly, streamlining the development of algorithms designed to process these unique data streams. Central to Neurosim's functionality is its integration with Cortex, a communications library built on ZeroMQ. This facilitates seamless data transfer between the simulation and machine learning code written in Python and C++.
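To put the storage cost of the frame-based approach in perspective, here is a rough back-of-the-envelope estimate. The resolution, frame rate, duration, and uncompressed 3-byte RGB format are illustrative assumptions, not figures reported for Neurosim or any specific simulator.

```python
# Rough storage estimate for saving high-frame-rate RGB video to disk
# before converting it into events. Every parameter here is an
# illustrative assumption, not a measurement from Neurosim.

def raw_video_bytes(width, height, fps, seconds, bytes_per_pixel=3):
    """Bytes needed to store uncompressed video of the given size and length."""
    return width * height * bytes_per_pixel * fps * seconds

# Hypothetical run: VGA frames at 1000 FPS for one hour of simulated flight.
total = raw_video_bytes(width=640, height=480, fps=1000, seconds=3600)
terabytes = total / 1e12
print(f"{terabytes:.2f} TB")  # prints "3.32 TB"
```

Under these assumptions a single hour of intermediate frames already exceeds three terabytes, which is why generating events directly, without writing the frames out, removes so much of the storage burden.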

Cortex handles high-throughput, low-latency messaging and natively supports NumPy arrays and PyTorch tensors, essential components of modern deep learning pipelines. This combination allows researchers to not only train algorithms using synthetic, time-synchronized multimodal data, but also to rigorously test real-time implementations in closed-loop scenarios.
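The kind of message framing that array-based messaging relies on can be sketched with the Python standard library alone. This is not Cortex's actual wire format or API, which the article does not describe. The function names `pack_frames` and `unpack_frames` and the header fields are hypothetical. The idea is a small metadata frame followed by the raw buffer, so that array data travels as bytes rather than text.

```python
import json
from array import array

def pack_frames(name, timestamp_ns, shape, values):
    """Split one sensor sample into [header, payload] byte frames (hypothetical format)."""
    payload = array("f", values)  # flat float buffer standing in for an array or tensor
    header = {
        "name": name,
        "timestamp_ns": timestamp_ns,
        "shape": list(shape),
        "typecode": payload.typecode,
    }
    return [json.dumps(header).encode("utf-8"), payload.tobytes()]

def unpack_frames(frames):
    """Rebuild the header dict and the flat buffer from the two frames."""
    header = json.loads(frames[0].decode("utf-8"))
    data = array(header["typecode"])
    data.frombytes(frames[1])
    expected = 1
    for dim in header["shape"]:
        expected *= dim
    if len(data) != expected:
        raise ValueError("payload size does not match the declared shape")
    return header, data

# Round trip of a hypothetical 2x3 depth patch stamped at 1 ms of simulated time.
frames = pack_frames("depth", 1_000_000, (2, 3), [0.5, 1.0, 1.5, 2.0, 2.5, 3.0])
header, data = unpack_frames(frames)
```

With pyzmq, a pair of frames like this would typically be sent as one multipart message, for example with `socket.send_multipart(frames)`. Carrying the timestamp in the header is one simple way to keep samples from different sensors aligned on the receiving side.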

This research provides a comprehensive solution for simulating the complex sensor inputs encountered by robots operating in dynamic environments. Neurosim enables the creation of realistic training datasets and validation of perception and control algorithms by accurately modeling data from multiple sensors such as LiDAR, RGB cameras, and inertial measurement units at rates comparable to real-world devices.

The simulator’s design prioritizes efficiency, allowing researchers to push the limits of robot agility and explore difficult scenarios such as high-speed maneuvers and obstacle avoidance without risking damage to physical hardware. Neurosim and Cortex are freely available, facilitating broader adoption and accelerating progress in the field of embodied artificial intelligence.

Real-time performance benchmark for high-throughput sensor data processing

Neurosim achieves sustained frame rates of approximately 2700 FPS on a desktop Nvidia 4090 GPU, demonstrating high-performance capabilities for real-time simulation. This rendering speed is facilitated by the integration of Habitat-Sim, which allows a typical indoor scene with multiple RGB-D sensors to be rendered on a single GPU at VGA (640 × 480) resolution at approximately 3000 FPS.

The system simulates an event camera at a rate of several kilohertz by tracking, for each pixel, the intensity state at its last triggered event, which minimizes the chance of missing events during rapid intensity changes. The study demonstrates a high-throughput data pipeline capable of processing approximately 50 million events per second from a monocular high-resolution event camera, in parallel with frames of roughly 1 million pixels each from an HD RGB camera operating at 100 Hz.
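The per-pixel bookkeeping described here can be illustrated with the standard contrast-threshold model of an event camera. This is a minimal sketch under stated assumptions, not Neurosim's implementation: the function name `generate_events`, the threshold value, and the use of log intensity are all assumptions chosen for illustration.

```python
import math

def generate_events(ref_log, frame, t_us, threshold=0.2, eps=1e-3):
    """Emit (t_us, pixel_index, polarity) events and update ref_log in place.

    ref_log[i] holds the log intensity of pixel i at its last triggered event.
    A pixel fires once per full threshold step between that reference and the
    current log intensity, so a large jump produces several events.
    """
    events = []
    for i, intensity in enumerate(frame):
        diff = math.log(intensity + eps) - ref_log[i]
        steps = int(abs(diff) / threshold)  # whole threshold crossings
        if steps == 0:
            continue
        polarity = 1 if diff > 0 else -1
        events.extend((t_us, i, polarity) for _ in range(steps))
        # Move the reference only by the amount accounted for by emitted events.
        ref_log[i] += polarity * steps * threshold
    return events

# Three pixels start at intensity 0.5; one brightens, one darkens, one is unchanged.
eps = 1e-3
ref = [math.log(0.5 + eps)] * 3
events = generate_events(ref, [0.5, 1.0, 0.25], t_us=1000)
```

Because the reference advances only by whole threshold steps, any leftover change carries over to the next update instead of being lost. Evaluating this at kilohertz rates keeps the change between updates small, which is what limits missed events.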

In addition, the system processes LiDAR data at approximately 10 Hz, producing approximately 500 million points per second, and records inertial data (angular velocity, linear acceleration, and magnetometer orientation) at over 500 Hz. This multimodal data stream exceeds 1 GB/s and is designed to feed directly into deep learning training pipelines without the need for large disk storage or buffering.

Neurosim's design prioritizes low-latency communication, provided by the Cortex library, allowing seamless integration with robotic workflows. The system can reproduce practical realities of real-world robots, including sensor synchronization and sensors running on separate clocks, which matter for downstream algorithms. Because the simulator runs significantly faster than real time, it is a viable option for large-scale data curation and supports closed-loop perception and control experiments even at the extremes of hardware agility, such as quadrotor flips with angular velocities of approximately 700°/s.
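How sensors running at different rates interleave on a shared timeline can be sketched as follows. The function `sensor_schedule` is hypothetical and is not part of Neurosim or Cortex. The rates are taken from the figures quoted above (IMU at 500 Hz, RGB at 100 Hz, LiDAR at 10 Hz), and the integer nanosecond timestamps are an assumption made to avoid floating-point drift.

```python
def sensor_schedule(rates_hz, duration_ns):
    """Return a time-ordered list of (timestamp_ns, sensor_name) sample times."""
    samples = []
    for name, rate in rates_hz.items():
        period_ns = 1_000_000_000 // rate  # integer period avoids float drift
        t = 0
        while t < duration_ns:
            samples.append((t, name))
            t += period_ns
    samples.sort()  # merge all sensors onto one timeline
    return samples

rates = {"imu": 500, "rgb": 100, "lidar": 10}
schedule = sensor_schedule(rates, duration_ns=100_000_000)  # 0.1 s of simulated time
counts = {name: sum(1 for _, s in schedule if s == name) for name in rates}
print(counts)  # prints {'imu': 50, 'rgb': 10, 'lidar': 1}
```

In a closed-loop experiment, a controller would consume these samples in timestamp order. Adding a per-sensor offset to the starting value of `t` would be one simple way to mimic sensors whose clocks are not aligned.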

Fast simulation enables advanced training for event-based sensors and neuromorphic computing

The lack of fast, realistic simulation environments has long been a bottleneck for robotics and embodied artificial intelligence. Visually impressive platforms like Unreal Engine and CARLA deliver photorealism, but often cannot provide the high-fidelity, high-speed data streams needed to train and validate next-generation event-based sensors and neuromorphic algorithms.

Neurosim, along with the Cortex communications library, represents an important step towards closing that gap, prioritizing speed and data synchronization over purely aesthetic fidelity. This doesn't just mean faster rendering. The ability to generate thousands of frames per second of multimodal sensor data (dynamic vision, RGB, depth, and inertial measurements) opens new avenues for self-supervised learning.

Algorithms can now be trained on realistically complex, temporally rich data without the delays and limitations of real-world data collection. Importantly, the integration with Cortex facilitates seamless data transfer to machine learning frameworks, streamlining the entire development pipeline. However, focusing on speed inevitably involves trade-offs.

Simulated environments, however complex, may not fully capture the nuances of real-world physics and lighting conditions. Additionally, validating whether algorithms trained in Neurosim transfer to real robots remains an important challenge. Tools like Rerun, which are designed to visualize multimodal data, are essential for debugging and for understanding the differences between simulation and reality. In the future, we can expect a proliferation of specialized simulators, each optimized for a specific sensor modality or robot platform, and an increased emphasis on domain adaptation and simulation-to-real transfer techniques.
