Evolutionary Reinforcement Learning for Sample-Efficient Multiagent Coordination

Somdeb Majumdar, Shauharda Khadka, Santiago Miret, Stephen Mcaleer, Kagan Tumer
Proceedings of the 37th International Conference on Machine Learning, PMLR 119:6651-6660, 2020.

Abstract

Many cooperative multiagent reinforcement learning environments provide agents with a sparse team-based reward, as well as a dense agent-specific reward that incentivizes learning basic skills. Training policies solely on the team-based reward is often difficult due to its sparsity. Also, relying solely on the agent-specific reward is sub-optimal because it usually does not capture the team coordination objective. A common approach is to use reward shaping to construct a proxy reward by combining the individual rewards. However, this requires manual tuning for each environment. We introduce Multiagent Evolutionary Reinforcement Learning (MERL), a split-level training platform that handles the two objectives separately through two optimization processes. An evolutionary algorithm maximizes the sparse team-based objective through neuroevolution on a population of teams. Concurrently, a gradient-based optimizer trains policies to only maximize the dense agent-specific rewards. The gradient-based policies are periodically added to the evolutionary population as a way of information transfer between the two optimization processes. This enables the evolutionary algorithm to use skills learned via the agent-specific rewards toward optimizing the global objective. Results demonstrate that MERL significantly outperforms state-of-the-art methods, such as MADDPG, on a number of difficult coordination benchmarks.
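The following is a minimal, self-contained sketch of the split-level loop the abstract describes, written to make the two optimization processes and the periodic migration concrete. Everything specific in it is an illustrative assumption rather than the paper's implementation: the toy target-reaching environment, the linear policies, the population size, the migration period, and the random-search hill-climb that stands in for the paper's gradient-based policy optimizer (the paper evolves neural-network teams and trains agent-specific policies with gradient methods).

# Minimal sketch of MERL's split-level training loop (illustrative assumptions only).
# Toy environment, linear policies, and a random-search hill-climb stand-in for the
# gradient-based learner; the paper uses neural policies and a policy-gradient optimizer.
import numpy as np

rng = np.random.default_rng(0)
N_AGENTS, OBS, ACT = 3, 4, 2          # team size, observation dim, action dim
POP, HORIZON, GENS = 10, 25, 200      # number of teams, episode length, generations

def new_policy():
    """A policy is a linear map from observations to actions (toy stand-in for a network)."""
    return rng.normal(scale=0.1, size=(ACT, OBS))

def rollout(team):
    """Run one episode; return (sparse team reward, dense per-agent rewards).
    Agents must move their 2-D positions close to a shared target."""
    target = np.array([1.0, 1.0])
    pos = rng.normal(size=(N_AGENTS, 2))
    dense = np.zeros(N_AGENTS)
    for _ in range(HORIZON):
        for i, w in enumerate(team):
            obs = np.concatenate([pos[i], target - pos[i]])   # local observation
            pos[i] += 0.1 * np.tanh(w @ obs)                  # bounded action
            dense[i] -= np.linalg.norm(target - pos[i])       # dense agent-specific reward
    # Sparse team reward: 1 only if every agent ends near the target.
    team_reward = float(all(np.linalg.norm(target - p) < 0.2 for p in pos))
    return team_reward, dense

def mutate(team, sigma=0.05):
    return [w + sigma * rng.normal(size=w.shape) for w in team]

# Population of teams, evolved only on the sparse team objective.
population = [[new_policy() for _ in range(N_AGENTS)] for _ in range(POP)]
# One "gradient" policy per agent, trained only on its dense agent-specific reward.
grad_team = [new_policy() for _ in range(N_AGENTS)]

for gen in range(GENS):
    # Gradient side (crude stand-in): perturb each agent's policy and keep the
    # perturbation if its own dense return improves.
    for i in range(N_AGENTS):
        cand = grad_team.copy()
        cand[i] = grad_team[i] + 0.05 * rng.normal(size=grad_team[i].shape)
        if rollout(cand)[1][i] > rollout(grad_team)[1][i]:
            grad_team = cand

    # Evolutionary side: rank whole teams by the sparse team reward only.
    fitness = [rollout(team)[0] for team in population]
    order = np.argsort(fitness)[::-1]
    elites = [population[i] for i in order[: POP // 2]]
    population = elites + [mutate(elites[rng.integers(len(elites))])
                           for _ in range(POP - len(elites))]

    # Periodic migration: inject the gradient-trained policies as one team in the population.
    if gen % 10 == 0:
        population[-1] = [w.copy() for w in grad_team]

best = max(population, key=lambda t: rollout(t)[0])
print("best team's sparse reward:", rollout(best)[0])

The design point the sketch illustrates is that the migrated policies are treated like any other team member: they persist in the population only if the skills learned from the dense rewards actually improve the sparse team fitness, which is how information transfers between the two processes without any hand-tuned reward shaping.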

Cite this Paper


BibTeX
@InProceedings{pmlr-v119-majumdar20a,
  title     = {Evolutionary Reinforcement Learning for Sample-Efficient Multiagent Coordination},
  author    = {Majumdar, Somdeb and Khadka, Shauharda and Miret, Santiago and Mcaleer, Stephen and Tumer, Kagan},
  booktitle = {Proceedings of the 37th International Conference on Machine Learning},
  pages     = {6651--6660},
  year      = {2020},
  editor    = {III, Hal Daumé and Singh, Aarti},
  volume    = {119},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--18 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v119/majumdar20a/majumdar20a.pdf},
  url       = {https://proceedings.mlr.press/v119/majumdar20a.html},
  abstract  = {Many cooperative multiagent reinforcement learning environments provide agents with a sparse team-based reward, as well as a dense agent-specific reward that incentivizes learning basic skills. Training policies solely on the team-based reward is often difficult due to its sparsity. Also, relying solely on the agent-specific reward is sub-optimal because it usually does not capture the team coordination objective. A common approach is to use reward shaping to construct a proxy reward by combining the individual rewards. However, this requires manual tuning for each environment. We introduce Multiagent Evolutionary Reinforcement Learning (MERL), a split-level training platform that handles the two objectives separately through two optimization processes. An evolutionary algorithm maximizes the sparse team-based objective through neuroevolution on a population of teams. Concurrently, a gradient-based optimizer trains policies to only maximize the dense agent-specific rewards. The gradient-based policies are periodically added to the evolutionary population as a way of information transfer between the two optimization processes. This enables the evolutionary algorithm to use skills learned via the agent-specific rewards toward optimizing the global objective. Results demonstrate that MERL significantly outperforms state-of-the-art methods, such as MADDPG, on a number of difficult coordination benchmarks.}
}
Endnote
%0 Conference Paper
%T Evolutionary Reinforcement Learning for Sample-Efficient Multiagent Coordination
%A Somdeb Majumdar
%A Shauharda Khadka
%A Santiago Miret
%A Stephen Mcaleer
%A Kagan Tumer
%B Proceedings of the 37th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2020
%E Hal Daumé III
%E Aarti Singh
%F pmlr-v119-majumdar20a
%I PMLR
%P 6651--6660
%U https://proceedings.mlr.press/v119/majumdar20a.html
%V 119
%X Many cooperative multiagent reinforcement learning environments provide agents with a sparse team-based reward, as well as a dense agent-specific reward that incentivizes learning basic skills. Training policies solely on the team-based reward is often difficult due to its sparsity. Also, relying solely on the agent-specific reward is sub-optimal because it usually does not capture the team coordination objective. A common approach is to use reward shaping to construct a proxy reward by combining the individual rewards. However, this requires manual tuning for each environment. We introduce Multiagent Evolutionary Reinforcement Learning (MERL), a split-level training platform that handles the two objectives separately through two optimization processes. An evolutionary algorithm maximizes the sparse team-based objective through neuroevolution on a population of teams. Concurrently, a gradient-based optimizer trains policies to only maximize the dense agent-specific rewards. The gradient-based policies are periodically added to the evolutionary population as a way of information transfer between the two optimization processes. This enables the evolutionary algorithm to use skills learned via the agent-specific rewards toward optimizing the global objective. Results demonstrate that MERL significantly outperforms state-of-the-art methods, such as MADDPG, on a number of difficult coordination benchmarks.
APA
Majumdar, S., Khadka, S., Miret, S., Mcaleer, S. & Tumer, K. (2020). Evolutionary Reinforcement Learning for Sample-Efficient Multiagent Coordination. Proceedings of the 37th International Conference on Machine Learning, in Proceedings of Machine Learning Research 119:6651-6660. Available from https://proceedings.mlr.press/v119/majumdar20a.html.