Deep reinforcement learning achieves 9.5% improvement on the Electric Dial-a-Ride Problem

Machine Learning


Researchers are tackling the complex logistics of on-demand electric transportation with a new approach to the Electric Dial-a-Ride Problem (E-DARP). Sten Elling Tingstad Jacobsen, Attila Lischka, and Balázs Kulcsár from Chalmers University of Technology, together with Anders Lindman from Volvo Cars, introduce deep reinforcement learning techniques that go beyond traditional route optimization. Their work matters because it directly addresses the limitations of existing solvers in real-world settings with limited battery capacity and fluctuating energy demands. By learning directly from road network attributes, the team’s policies jointly optimize routing, charging schedules, and quality of service, achieving significant improvements in both solution quality and computational speed over established metaheuristics such as adaptive large neighborhood search.

This research addresses the rising demand for electric on-demand mobility services and the operational challenge of managing fleets under energy and quality-of-service constraints. The team introduces a method based on a graph neural network encoder and an attention-driven route construction policy that jointly optimizes routing, charging, and quality of service. By directly processing edge attributes such as travel time and energy consumption, the approach accurately captures the non-Euclidean, asymmetric, and energy-dependent path costs found in real-world road networks.
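As a rough illustration of this idea, the sketch below shows how an edge-aware message-passing encoder can fold per-edge travel time and energy consumption into node embeddings. It is a minimal PyTorch example; the layer structure, dimensions, and names are illustrative assumptions, not the authors’ exact architecture.

```python
# Minimal sketch of an edge-aware message-passing encoder (assumed design,
# not the paper's exact model). Edge features modulate the messages so that
# asymmetric, energy-dependent costs shape the node embeddings.
import torch
import torch.nn as nn

class EdgeAwareEncoder(nn.Module):
    """Embeds graph nodes while folding in edge attributes
    such as travel time and energy consumption."""

    def __init__(self, node_dim, edge_dim, hidden_dim, n_layers=3):
        super().__init__()
        self.node_proj = nn.Linear(node_dim, hidden_dim)
        self.edge_proj = nn.Linear(edge_dim, hidden_dim)
        self.layers = nn.ModuleList(
            [nn.Linear(2 * hidden_dim, hidden_dim) for _ in range(n_layers)]
        )

    def forward(self, x, edge_index, edge_attr):
        # x: (N, node_dim) node features, e.g. pickup/drop-off descriptors
        # edge_index: (2, E) directed source/target indices (asymmetry allowed)
        # edge_attr: (E, edge_dim) per-edge travel time and energy consumption
        h = self.node_proj(x)
        e = self.edge_proj(edge_attr)
        src, dst = edge_index
        for layer in self.layers:
            msg = torch.zeros_like(h)
            # aggregate messages: source embedding modulated by edge embedding
            msg.index_add_(0, dst, torch.relu(h[src] + e))
            h = torch.relu(layer(torch.cat([h, msg], dim=-1))) + h  # residual
        return h  # (N, hidden_dim) node context for the routing policy

# Tiny usage example: a 4-node cycle with asymmetric directed edge costs
enc = EdgeAwareEncoder(node_dim=5, edge_dim=2, hidden_dim=32)
x = torch.randn(4, 5)
edge_index = torch.tensor([[0, 1, 2, 3], [1, 2, 3, 0]])
edge_attr = torch.randn(4, 2)  # [travel_time, energy] per directed edge
print(enc(x, edge_index, edge_attr).shape)  # torch.Size([4, 32])
```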

The core innovation lies in the learned policies’ ability to solve the E-DARP without relying on traditional Euclidean assumptions or manually designed heuristics. The researchers evaluated the method on San Francisco rideshare data, demonstrating its effectiveness on both benchmark instances and large-scale scenarios. On a standard benchmark problem, the deep reinforcement learning approach reached a solution within 0.4% of the best known result while reducing computation time by several orders of magnitude, a speedup that is critical for real-time applications where rapid decision-making is essential.

Further investigations covered large-scale instances with up to 250 request pairs, incorporating realistic energy models and nonlinear charging dynamics. The learned policies solved these instances in a fraction of the time required by metaheuristics, highlighting the potential for substantial efficiency gains. The team also conducted sensitivity analyses to quantify the impact of key parameters such as battery capacity, fleet size, and reward weights, alongside robustness experiments to check the generalizability of the policy under stochastic conditions.

This breakthrough establishes a new paradigm for solving the E-DARP and provides a scalable, efficient alternative to traditional optimization techniques, paving the way for intelligent on-demand electric vehicle fleets that effectively balance passenger needs, energy consumption, and operating costs. The system pairs a graph neural network encoder with an attention-driven route construction policy to simultaneously optimize routing, charging, and quality of service; by handling edge attributes such as travel time and energy consumption directly, it captures the non-Euclidean, asymmetric, and energy-dependent routing costs of real road networks. The researchers evaluated the performance of the learned policies on San Francisco rideshare data.

This work pioneers the use of graph neural networks to encode the problem space, allowing the model to learn representations of road networks and demand patterns. The researchers then trained an attention-driven policy to build routes by selectively focusing on relevant parts of the graph, effectively prioritizing efficient and energy-aware paths. The approach achieves solutions within 0.4% of the best known results on benchmark instances while significantly reducing computation time. A second case study investigated large-scale instances with up to 250 request pairs, incorporating a realistic energy model and nonlinear charging dynamics.
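The sketch below illustrates, under the same caveat, how an attention-driven decoder can construct a route step by step while masking infeasible moves such as already-visited nodes or trips the remaining battery cannot cover. The `AttentionDecoder` and `build_route` names are hypothetical, and the greedy rollout stands in for the trained policy’s full decision loop.

```python
# Minimal sketch of attention-based route construction with feasibility
# masking (illustrative, not the paper's exact decoder).
import torch
import torch.nn as nn

class AttentionDecoder(nn.Module):
    """Scores candidate next nodes by attending from the current state."""

    def __init__(self, hidden_dim):
        super().__init__()
        self.q = nn.Linear(hidden_dim, hidden_dim)
        self.k = nn.Linear(hidden_dim, hidden_dim)

    def forward(self, context, node_emb, mask):
        # context: (hidden,) embedding of the current node / vehicle state
        # node_emb: (N, hidden) encoder output; mask: (N,) True = infeasible
        scores = self.k(node_emb) @ self.q(context) / node_emb.size(-1) ** 0.5
        scores = scores.masked_fill(mask, float("-inf"))
        return torch.softmax(scores, dim=-1)  # visit probability per node

def build_route(decoder, node_emb, battery, energy_cost, start=0):
    """Greedy rollout: pick the most probable feasible node, masking
    visited nodes and moves the remaining battery cannot afford."""
    route, visited, cur = [start], {start}, start
    n = node_emb.size(0)
    while len(visited) < n:
        mask = torch.tensor(
            [i in visited or energy_cost[cur, i].item() > battery
             for i in range(n)]
        )
        if mask.all():
            break  # no feasible move; a full policy would detour to a charger
        probs = decoder(node_emb[cur], node_emb, mask)
        nxt = int(probs.argmax())
        battery -= energy_cost[cur, nxt].item()
        route.append(nxt)
        visited.add(nxt)
        cur = nxt
    return route

torch.manual_seed(0)
dec = AttentionDecoder(hidden_dim=32)
emb = torch.randn(5, 32)
cost = torch.rand(5, 5)  # asymmetric per-move energy consumption
print(build_route(dec, emb, battery=2.0, energy_cost=cost))
```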

The team implemented a detailed energy model that accounts for battery depletion during trips and for charging speeds that vary across charging stations, allowing a more faithful simulation of real electric vehicle behavior. Sensitivity analyses quantified the impact of key parameters such as battery capacity, fleet size, vehicle capacity, and reward weights on overall performance, and robustness experiments demonstrated that deterministically trained policies generalize effectively under stochastic conditions, highlighting the adaptability of the approach. Measured on San Francisco rideshare data, the learned policies reached solutions within 0.4% of the best known results on benchmark instances while cutting computation time by orders of magnitude, jointly optimizing routing, charging schedules, and quality of service without traditional Euclidean assumptions or manually designed heuristics. This represents a significant advance in real-time fleet management capabilities.
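Nonlinear charging is often approximated with a piecewise-linear charging curve in the electric vehicle routing literature, and the minimal sketch below takes that route. The breakpoints are made-up numbers for illustration, not values from the paper.

```python
# Minimal sketch of a nonlinear (piecewise-linear) charging model: the
# charging rate tapers as the battery fills. Breakpoints are illustrative.
from bisect import bisect_right

# (state-of-charge fraction, cumulative minutes to reach it) breakpoints
CURVE = [(0.0, 0.0), (0.8, 30.0), (0.9, 40.0), (1.0, 60.0)]

def charging_time(soc_from: float, soc_to: float) -> float:
    """Minutes needed to charge from soc_from to soc_to, interpolating
    linearly between breakpoints of the charging curve."""
    def time_at(soc):
        socs = [s for s, _ in CURVE]
        i = min(bisect_right(socs, soc), len(CURVE) - 1)
        (s0, t0), (s1, t1) = CURVE[i - 1], CURVE[i]
        return t0 + (t1 - t0) * (soc - s0) / (s1 - s0)
    return time_at(soc_to) - time_at(soc_from)

# The same 20% of capacity is far cheaper to add at low state of charge
print(charging_time(0.6, 0.8))  # 7.5 minutes
print(charging_time(0.8, 1.0))  # 30.0 minutes
```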

The study focuses on directly processing edge attributes such as travel time and energy consumption to accurately capture the non-Euclidean, asymmetric, and energy-dependent routing costs inherent in real road networks. The data show that the learned policies successfully navigate these complexities and jointly optimize the multiple factors essential for efficient service delivery. The second case study demonstrated even larger improvements on large-scale instances with up to 250 request pairs, realistic energy models, and nonlinear charging dynamics. Sensitivity analyses quantified the influence of key parameters such as battery capacity, fleet size, vehicle capacity, and reward weights, providing valuable insight into the system’s behavior, and robustness experiments confirmed that the deterministically trained policy generalizes effectively under stochastic conditions, ensuring reliable performance in dynamic environments. This provides a scalable solution for managing electric vehicle fleets in complex urban environments.
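A simple way to express this joint optimization is a scalar reward that weights the competing objectives, which is presumably close in spirit to the reward weights the sensitivity analyses vary. The terms and weights below are illustrative assumptions, not the paper’s exact reward definition.

```python
# Minimal sketch of a weighted reward over the objectives the article names;
# terms and default weights are illustrative placeholders.
def route_reward(travel_time: float, energy_used: float, ride_time_excess: float,
                 w_time: float = 1.0, w_energy: float = 0.5,
                 w_qos: float = 2.0) -> float:
    """Negative weighted cost: the policy is trained to maximize this,
    trading operating cost against passenger quality of service."""
    return -(w_time * travel_time          # routing cost
             + w_energy * energy_used      # consumption / charging cost
             + w_qos * ride_time_excess)   # QoS penalty: detour beyond direct ride time

# Raising w_qos shifts the trade-off the learned policy makes toward passengers
print(route_reward(travel_time=45.0, energy_used=12.0, ride_time_excess=3.0))
```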

Measurements confirm the approach’s ability to handle realistic constraints and optimize performance across different operational scenarios. The efficiency of the trained policies comes from their ability to learn directly from data, bypassing the limitations of traditional optimization techniques: by directly analyzing edge attributes such as travel time and energy consumption, the graph encoder and attention-driven route construction policy capture the non-Euclidean, energy-dependent costs inherent in real-world road networks while jointly optimizing routing, charging, and quality of service. Further analysis quantified the impact of different battery capacities and showed how adjustments to fleet size and vehicle capacity affect overall system performance; sensitivity analyses underscored the importance of battery capacity and fleet size, while robustness experiments showed effective generalization under stochastic conditions even though the policy was trained on deterministic data.

The authors acknowledge a limitation in the deterministic demand-arrival assumption and note that it can significantly affect completion rates in sequential replanning scenarios; future research could address this uncertainty more directly. Even so, the findings establish a valuable framework for fleet planning, demonstrate the potential for efficient and reliable operation of electric vehicle fleets in urban environments, and provide insight into important operational trade-offs regarding resource allocation and quality of service.
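A sensitivity analysis of this kind can be as simple as re-evaluating a fixed trained policy over a grid of operational parameters. The sketch below assumes a hypothetical `evaluate_policy` hook standing in for a full E-DARP simulator; the numbers inside it are dummy placeholders, not reported results.

```python
# Minimal sketch of a sensitivity sweep over battery capacity and fleet size.
# `evaluate_policy` is a hypothetical stand-in for running the learned policy
# on held-out instances and collecting aggregate metrics.
import itertools

def evaluate_policy(battery_kwh: float, fleet_size: int) -> dict:
    # Placeholder metric: a real hook would roll out the policy in a simulator
    # and report served requests, total cost, charging time, etc.
    served = min(1.0, 0.3 + 0.01 * battery_kwh + 0.08 * fleet_size)
    return {"service_rate": round(served, 3)}

batteries = [20.0, 40.0, 60.0]   # kWh
fleets = [4, 6, 8]               # vehicles

for battery, fleet in itertools.product(batteries, fleets):
    metrics = evaluate_policy(battery, fleet)
    print(f"battery={battery:>5} kWh  fleet={fleet}: {metrics}")
```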

👉 More information
🗞 Learning Dial-a-Ride: A deep graph reinforcement learning approach to the Electric Dial-a-Ride problem
🧠 arXiv: https://arxiv.org/abs/2601.22052


