Macro-Action-Based Deep Multi-Agent Reinforcement Learning

Yuchen Xiao, Joshua Hoffman, Christopher Amato
Proceedings of the Conference on Robot Learning, PMLR 100:1146-1161, 2020.

Abstract

In real-world multi-robot systems, performing high-quality collaborative behaviors requires robots to asynchronously reason about high-level action selection over varying time durations. Macro-Action Decentralized Partially Observable Markov Decision Processes (MacDec-POMDPs) provide a general framework for asynchronous decision making under uncertainty in fully cooperative multi-agent tasks. However, multi-agent deep reinforcement learning methods have only been developed for (synchronous) primitive-action problems. This paper proposes two Deep Q-Network (DQN)-based methods for learning decentralized and centralized macro-action-value functions, with novel macro-action trajectory replay buffers introduced for each case. Evaluations on benchmark problems and a larger domain demonstrate the advantage of learning with macro-actions over primitive actions and the scalability of our approaches.

Cite this Paper


BibTeX
@InProceedings{pmlr-v100-xiao20a,
  title     = {Macro-Action-Based Deep Multi-Agent Reinforcement Learning},
  author    = {Xiao, Yuchen and Hoffman, Joshua and Amato, Christopher},
  booktitle = {Proceedings of the Conference on Robot Learning},
  pages     = {1146--1161},
  year      = {2020},
  editor    = {Kaelbling, Leslie Pack and Kragic, Danica and Sugiura, Komei},
  volume    = {100},
  series    = {Proceedings of Machine Learning Research},
  month     = {30 Oct--01 Nov},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v100/xiao20a/xiao20a.pdf},
  url       = {https://proceedings.mlr.press/v100/xiao20a.html}
}
Endnote
%0 Conference Paper
%T Macro-Action-Based Deep Multi-Agent Reinforcement Learning
%A Yuchen Xiao
%A Joshua Hoffman
%A Christopher Amato
%B Proceedings of the Conference on Robot Learning
%C Proceedings of Machine Learning Research
%D 2020
%E Leslie Pack Kaelbling
%E Danica Kragic
%E Komei Sugiura
%F pmlr-v100-xiao20a
%I PMLR
%P 1146--1161
%U https://proceedings.mlr.press/v100/xiao20a.html
%V 100
APA
Xiao, Y., Hoffman, J., & Amato, C. (2020). Macro-Action-Based Deep Multi-Agent Reinforcement Learning. Proceedings of the Conference on Robot Learning, in Proceedings of Machine Learning Research 100:1146-1161. Available from https://proceedings.mlr.press/v100/xiao20a.html.
