Evolutionary Reinforcement Learning for Sample-Efficient Multiagent Coordination

Somdeb Majumdar; Shauharda Khadka; Santiago Miret; Stephen Mcaleer; Kagan Tumer

Proceedings of the 37th International Conference on Machine Learning, PMLR 119:6651-6660, 2020.

Abstract

Many cooperative multiagent reinforcement learning environments provide agents with a sparse team-based reward, as well as a dense agent-specific reward that incentivizes learning basic skills. Training policies solely on the team-based reward is often difficult due to its sparsity. Also, relying solely on the agent-specific reward is sub-optimal because it usually does not capture the team coordination objective. A common approach is to use reward shaping to construct a proxy reward by combining the individual rewards. However, this requires manual tuning for each environment. We introduce Multiagent Evolutionary Reinforcement Learning (MERL), a split-level training platform that handles the two objectives separately through two optimization processes. An evolutionary algorithm maximizes the sparse team-based objective through neuroevolution on a population of teams. Concurrently, a gradient-based optimizer trains policies to only maximize the dense agent-specific rewards. The gradient-based policies are periodically added to the evolutionary population as a way of information transfer between the two optimization processes. This enables the evolutionary algorithm to use skills learned via the agent-specific rewards toward optimizing the global objective. Results demonstrate that MERL significantly outperforms state-of-the-art methods, such as MADDPG, on a number of difficult coordination benchmarks.
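The split-level idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the targets, the tolerance, the mutation scheme, and every function name below are hypothetical, and the "policies" are reduced to plain parameter vectors with one number per agent. It only shows the three ingredients the abstract names: an evolutionary search on a sparse team reward, a gradient learner on a dense per-agent reward, and periodic migration of the learner into the population.

```python
import random

# All constants and names here are illustrative assumptions, not from the paper.
TARGETS = [0.5, -0.3, 0.8]  # hypothetical per-agent goal values
TOL = 0.05                  # every agent must be this close for team success


def team_reward(team):
    """Sparse team objective: 1 only if every agent is near its target."""
    return 1.0 if all(abs(x - t) < TOL for x, t in zip(team, TARGETS)) else 0.0


def dense_gradient(team):
    """Gradient of the dense per-agent reward -(x - t)^2 with respect to x."""
    return [-2.0 * (x - t) for x, t in zip(team, TARGETS)]


def evolve(population, rng, sigma=0.1):
    """One generation: keep the top half by team reward, refill with mutants."""
    ranked = sorted(population, key=team_reward, reverse=True)
    elites = ranked[: len(ranked) // 2]
    children = [
        [x + rng.gauss(0.0, sigma) for x in rng.choice(elites)]
        for _ in range(len(population) - len(elites))
    ]
    return elites + children


def merl_sketch(generations=50, pop_size=10, migrate_every=5, lr=0.1, seed=0):
    rng = random.Random(seed)
    population = [[rng.uniform(-1, 1) for _ in TARGETS] for _ in range(pop_size)]
    learner = [rng.uniform(-1, 1) for _ in TARGETS]
    for gen in range(1, generations + 1):
        # Evolutionary process: sees only the sparse team reward.
        population = evolve(population, rng)
        # Gradient process: ascends only the dense agent-specific reward.
        learner = [x + lr * g for x, g in zip(learner, dense_gradient(learner))]
        # Periodic migration: the learner replaces the weakest team.
        if gen % migrate_every == 0:
            population.sort(key=team_reward, reverse=True)
            population[-1] = list(learner)
    return max(population, key=team_reward), learner


if __name__ == "__main__":
    best, learner = merl_sketch()
    print("best team reward:", team_reward(best))
```

In this toy setting the dense reward happens to point at the team objective, so the migrated learner eventually scores on the sparse reward and is retained as an elite. The abstract notes that in real environments the agent-specific reward usually does not capture the coordination objective, which is why the evolutionary process, not the gradient learner, is the one that optimizes the team reward.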

Cite this Paper

BibTeX


@InProceedings{pmlr-v119-majumdar20a,
  title = 	 {Evolutionary Reinforcement Learning for Sample-Efficient Multiagent Coordination},
  author =       {Majumdar, Somdeb and Khadka, Shauharda and Miret, Santiago and Mcaleer, Stephen and Tumer, Kagan},
  booktitle = 	 {Proceedings of the 37th International Conference on Machine Learning},
  pages = 	 {6651--6660},
  year = 	 {2020},
  editor = 	 {Daumé III, Hal and Singh, Aarti},
  volume = 	 {119},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {13--18 Jul},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v119/majumdar20a/majumdar20a.pdf},
  url = 	 {https://proceedings.mlr.press/v119/majumdar20a.html},
  abstract = 	 {Many cooperative multiagent reinforcement learning environments provide agents with a sparse team-based reward, as well as a dense agent-specific reward that incentivizes learning basic skills. Training policies solely on the team-based reward is often difficult due to its sparsity. Also, relying solely on the agent-specific reward is sub-optimal because it usually does not capture the team coordination objective. A common approach is to use reward shaping to construct a proxy reward by combining the individual rewards. However, this requires manual tuning for each environment. We introduce Multiagent Evolutionary Reinforcement Learning (MERL), a split-level training platform that handles the two objectives separately through two optimization processes. An evolutionary algorithm maximizes the sparse team-based objective through neuroevolution on a population of teams. Concurrently, a gradient-based optimizer trains policies to only maximize the dense agent-specific rewards. The gradient-based policies are periodically added to the evolutionary population as a way of information transfer between the two optimization processes. This enables the evolutionary algorithm to use skills learned via the agent-specific rewards toward optimizing the global objective. Results demonstrate that MERL significantly outperforms state-of-the-art methods, such as MADDPG, on a number of difficult coordination benchmarks.}
}

Endnote

%0 Conference Paper
%T Evolutionary Reinforcement Learning for Sample-Efficient Multiagent Coordination
%A Somdeb Majumdar
%A Shauharda Khadka
%A Santiago Miret
%A Stephen Mcaleer
%A Kagan Tumer
%B Proceedings of the 37th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2020
%E Hal Daumé III
%E Aarti Singh
%F pmlr-v119-majumdar20a
%I PMLR
%P 6651--6660
%U https://proceedings.mlr.press/v119/majumdar20a.html
%V 119
%X Many cooperative multiagent reinforcement learning environments provide agents with a sparse team-based reward, as well as a dense agent-specific reward that incentivizes learning basic skills. Training policies solely on the team-based reward is often difficult due to its sparsity. Also, relying solely on the agent-specific reward is sub-optimal because it usually does not capture the team coordination objective. A common approach is to use reward shaping to construct a proxy reward by combining the individual rewards. However, this requires manual tuning for each environment. We introduce Multiagent Evolutionary Reinforcement Learning (MERL), a split-level training platform that handles the two objectives separately through two optimization processes. An evolutionary algorithm maximizes the sparse team-based objective through neuroevolution on a population of teams. Concurrently, a gradient-based optimizer trains policies to only maximize the dense agent-specific rewards. The gradient-based policies are periodically added to the evolutionary population as a way of information transfer between the two optimization processes. This enables the evolutionary algorithm to use skills learned via the agent-specific rewards toward optimizing the global objective. Results demonstrate that MERL significantly outperforms state-of-the-art methods, such as MADDPG, on a number of difficult coordination benchmarks.

APA


Majumdar, S., Khadka, S., Miret, S., Mcaleer, S., & Tumer, K. (2020). Evolutionary Reinforcement Learning for Sample-Efficient Multiagent Coordination. Proceedings of the 37th International Conference on Machine Learning, in Proceedings of Machine Learning Research 119:6651-6660. Available from https://proceedings.mlr.press/v119/majumdar20a.html.
