Global rewards in multi-agent deep reinforcement learning for autonomous mobility on demand systems

Heiko Hoppe, Tobias Enders, Quentin Cappart, Maximilian Schiffer
Proceedings of the 6th Annual Learning for Dynamics & Control Conference, PMLR 242:260-272, 2024.

Abstract

We study vehicle dispatching in autonomous mobility on demand (AMoD) systems, where a central operator assigns vehicles to customer requests or rejects these with the aim of maximizing its total profit. Recent approaches use multi-agent deep reinforcement learning (MADRL) to realize scalable yet performant algorithms, but train agents based on local rewards, which distorts the reward signal with respect to the system-wide profit, leading to lower performance. We therefore propose a novel global-rewards-based MADRL algorithm for vehicle dispatching in AMoD systems, which resolves so far existing goal conflicts between the trained agents and the operator by assigning rewards to agents leveraging a counterfactual baseline. Our algorithm shows statistically significant improvements across various settings on real-world data compared to state-of-the-art MADRL algorithms with local rewards. We further provide a structural analysis which shows that the utilization of global rewards can improve implicit vehicle balancing and demand forecasting abilities. An extended version of our paper, including an appendix, can be found at https://arxiv.org/abs/2312.08884. Our code is available at https://github.com/tumBAIS/GR-MADRL-AMoD.

Cite this Paper


BibTeX
@InProceedings{pmlr-v242-hoppe24a, title = {Global rewards in multi-agent deep reinforcement learning for autonomous mobility on demand systems}, author = {Hoppe, Heiko and Enders, Tobias and Cappart, Quentin and Schiffer, Maximilian}, booktitle = {Proceedings of the 6th Annual Learning for Dynamics & Control Conference}, pages = {260--272}, year = {2024}, editor = {Abate, Alessandro and Cannon, Mark and Margellos, Kostas and Papachristodoulou, Antonis}, volume = {242}, series = {Proceedings of Machine Learning Research}, month = {15--17 Jul}, publisher = {PMLR}, pdf = {https://proceedings.mlr.press/v242/hoppe24a/hoppe24a.pdf}, url = {https://proceedings.mlr.press/v242/hoppe24a.html}, abstract = {We study vehicle dispatching in autonomous mobility on demand (AMoD) systems, where a central operator assigns vehicles to customer requests or rejects these with the aim of maximizing its total profit. Recent approaches use multi-agent deep reinforcement learning (MADRL) to realize scalable yet performant algorithms, but train agents based on local rewards, which distorts the reward signal with respect to the system-wide profit, leading to lower performance. We therefore propose a novel global-rewards-based MADRL algorithm for vehicle dispatching in AMoD systems, which resolves so far existing goal conflicts between the trained agents and the operator by assigning rewards to agents leveraging a counterfactual baseline. Our algorithm shows statistically significant improvements across various settings on real-world data compared to state-of-the-art MADRL algorithms with local rewards. We further provide a structural analysis which shows that the utilization of global rewards can improve implicit vehicle balancing and demand forecasting abilities. An extended version of our paper, including an appendix, can be found at https://arxiv.org/abs/2312.08884. Our code is available at https://github.com/tumBAIS/GR-MADRL-AMoD.} }
Endnote
%0 Conference Paper %T Global rewards in multi-agent deep reinforcement learning for autonomous mobility on demand systems %A Heiko Hoppe %A Tobias Enders %A Quentin Cappart %A Maximilian Schiffer %B Proceedings of the 6th Annual Learning for Dynamics & Control Conference %C Proceedings of Machine Learning Research %D 2024 %E Alessandro Abate %E Mark Cannon %E Kostas Margellos %E Antonis Papachristodoulou %F pmlr-v242-hoppe24a %I PMLR %P 260--272 %U https://proceedings.mlr.press/v242/hoppe24a.html %V 242 %X We study vehicle dispatching in autonomous mobility on demand (AMoD) systems, where a central operator assigns vehicles to customer requests or rejects these with the aim of maximizing its total profit. Recent approaches use multi-agent deep reinforcement learning (MADRL) to realize scalable yet performant algorithms, but train agents based on local rewards, which distorts the reward signal with respect to the system-wide profit, leading to lower performance. We therefore propose a novel global-rewards-based MADRL algorithm for vehicle dispatching in AMoD systems, which resolves so far existing goal conflicts between the trained agents and the operator by assigning rewards to agents leveraging a counterfactual baseline. Our algorithm shows statistically significant improvements across various settings on real-world data compared to state-of-the-art MADRL algorithms with local rewards. We further provide a structural analysis which shows that the utilization of global rewards can improve implicit vehicle balancing and demand forecasting abilities. An extended version of our paper, including an appendix, can be found at https://arxiv.org/abs/2312.08884. Our code is available at https://github.com/tumBAIS/GR-MADRL-AMoD.
APA
Hoppe, H., Enders, T., Cappart, Q. & Schiffer, M.. (2024). Global rewards in multi-agent deep reinforcement learning for autonomous mobility on demand systems. Proceedings of the 6th Annual Learning for Dynamics & Control Conference, in Proceedings of Machine Learning Research 242:260-272 Available from https://proceedings.mlr.press/v242/hoppe24a.html.

Related Material