Macro-Action-Based Deep Multi-Agent Reinforcement Learning

Yuchen Xiao, Joshua Hoffman, Christopher Amato
Proceedings of the Conference on Robot Learning, PMLR 100:1146-1161, 2020.

Abstract

In real-world multi-robot systems, performing high-quality collaborative behaviors requires robots to asynchronously reason about high-level action selection over varying time durations. Macro-Action Decentralized Partially Observable Markov Decision Processes (MacDec-POMDPs) provide a general framework for asynchronous decision making under uncertainty in fully cooperative multi-agent tasks. However, multi-agent deep reinforcement learning methods have only been developed for (synchronous) primitive-action problems. This paper proposes two Deep Q-Network (DQN)-based methods for learning decentralized and centralized macro-action-value functions, with novel macro-action trajectory replay buffers introduced for each case. Evaluations on benchmark problems and a larger domain demonstrate the advantage of learning with macro-actions over primitive actions and the scalability of our approaches.

Cite this Paper


BibTeX
@InProceedings{pmlr-v100-xiao20a,
  title     = {Macro-Action-Based Deep Multi-Agent Reinforcement Learning},
  author    = {Xiao, Yuchen and Hoffman, Joshua and Amato, Christopher},
  booktitle = {Proceedings of the Conference on Robot Learning},
  pages     = {1146--1161},
  year      = {2020},
  editor    = {Kaelbling, Leslie Pack and Kragic, Danica and Sugiura, Komei},
  volume    = {100},
  series    = {Proceedings of Machine Learning Research},
  month     = {30 Oct--01 Nov},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v100/xiao20a/xiao20a.pdf},
  url       = {https://proceedings.mlr.press/v100/xiao20a.html}
}
Endnote
%0 Conference Paper
%T Macro-Action-Based Deep Multi-Agent Reinforcement Learning
%A Yuchen Xiao
%A Joshua Hoffman
%A Christopher Amato
%B Proceedings of the Conference on Robot Learning
%C Proceedings of Machine Learning Research
%D 2020
%E Leslie Pack Kaelbling
%E Danica Kragic
%E Komei Sugiura
%F pmlr-v100-xiao20a
%I PMLR
%P 1146--1161
%U https://proceedings.mlr.press/v100/xiao20a.html
%V 100
APA
Xiao, Y., Hoffman, J., & Amato, C. (2020). Macro-Action-Based Deep Multi-Agent Reinforcement Learning. Proceedings of the Conference on Robot Learning, in Proceedings of Machine Learning Research 100:1146-1161. Available from https://proceedings.mlr.press/v100/xiao20a.html.
