Coordinated Exploration in Concurrent Reinforcement Learning

Maria Dimakopoulou, Benjamin Van Roy
Proceedings of the 35th International Conference on Machine Learning, PMLR 80:1271-1279, 2018.

Abstract

We consider a team of reinforcement learning agents that concurrently learn to operate in a common environment. We identify three properties - adaptivity, commitment, and diversity - which are necessary for efficient coordinated exploration and demonstrate that straightforward extensions to single-agent optimistic and posterior sampling approaches fail to satisfy them. As an alternative, we propose seed sampling, which extends posterior sampling in a manner that meets these requirements. Simulation results investigate how per-agent regret decreases as the number of agents grows, establishing substantial advantages of seed sampling over alternative exploration schemes.
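To make the idea concrete, here is a minimal illustrative sketch (not code from the paper) of seed sampling on a Gaussian bandit. Every agent reads the same shared posterior, but each holds a fixed private random seed that deterministically maps that posterior to a sampled model: distinct seeds give diversity, a fixed seed gives commitment, and re-mapping as shared data grows gives adaptivity. The class and simulation below are hypothetical names chosen for illustration.

```python
import numpy as np

class SeedSamplingAgent:
    """One agent in a concurrent team. All agents share posterior
    statistics; each holds a fixed private seed."""

    def __init__(self, seed):
        self.seed = seed  # fixed for the agent's lifetime (commitment)

    def act(self, counts, reward_sums):
        # Deterministically map (seed, shared posterior) -> sampled model.
        # Re-seeding makes the draw a fixed function of the seed, so the
        # agent commits to one hypothesis yet adapts as shared data grows.
        rng = np.random.default_rng(self.seed)
        post_mean = reward_sums / (counts + 1.0)      # N(0,1) prior, unit noise
        post_std = 1.0 / np.sqrt(counts + 1.0)
        sample = rng.normal(post_mean, post_std)       # sampled arm means
        return int(np.argmax(sample))                  # greedy w.r.t. sample

# Shared state for a 3-armed Gaussian bandit explored by 5 agents.
n_arms = 3
counts = np.zeros(n_arms)
reward_sums = np.zeros(n_arms)
agents = [SeedSamplingAgent(seed=k) for k in range(5)]
true_means = np.array([0.1, 0.5, 0.9])
env_rng = np.random.default_rng(0)

for t in range(50):
    for agent in agents:
        a = agent.act(counts, reward_sums)
        r = env_rng.normal(true_means[a], 1.0)
        counts[a] += 1           # observations are pooled across agents
        reward_sums[a] += r
```

Note that, unlike naive concurrent Thompson sampling, two calls to `act` with the same posterior return the same action for a given agent, while distinct seeds keep the agents' hypotheses diverse.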

Cite this Paper


BibTeX
@InProceedings{pmlr-v80-dimakopoulou18a,
  title     = {Coordinated Exploration in Concurrent Reinforcement Learning},
  author    = {Dimakopoulou, Maria and Van Roy, Benjamin},
  booktitle = {Proceedings of the 35th International Conference on Machine Learning},
  pages     = {1271--1279},
  year      = {2018},
  editor    = {Jennifer Dy and Andreas Krause},
  volume    = {80},
  series    = {Proceedings of Machine Learning Research},
  address   = {Stockholmsmässan, Stockholm Sweden},
  month     = {10--15 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v80/dimakopoulou18a/dimakopoulou18a.pdf},
  url       = {http://proceedings.mlr.press/v80/dimakopoulou18a.html},
  abstract  = {We consider a team of reinforcement learning agents that concurrently learn to operate in a common environment. We identify three properties - adaptivity, commitment, and diversity - which are necessary for efficient coordinated exploration and demonstrate that straightforward extensions to single-agent optimistic and posterior sampling approaches fail to satisfy them. As an alternative, we propose seed sampling, which extends posterior sampling in a manner that meets these requirements. Simulation results investigate how per-agent regret decreases as the number of agents grows, establishing substantial advantages of seed sampling over alternative exploration schemes.}
}
Endnote
%0 Conference Paper
%T Coordinated Exploration in Concurrent Reinforcement Learning
%A Maria Dimakopoulou
%A Benjamin Van Roy
%B Proceedings of the 35th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2018
%E Jennifer Dy
%E Andreas Krause
%F pmlr-v80-dimakopoulou18a
%I PMLR
%J Proceedings of Machine Learning Research
%P 1271--1279
%U http://proceedings.mlr.press
%V 80
%W PMLR
%X We consider a team of reinforcement learning agents that concurrently learn to operate in a common environment. We identify three properties - adaptivity, commitment, and diversity - which are necessary for efficient coordinated exploration and demonstrate that straightforward extensions to single-agent optimistic and posterior sampling approaches fail to satisfy them. As an alternative, we propose seed sampling, which extends posterior sampling in a manner that meets these requirements. Simulation results investigate how per-agent regret decreases as the number of agents grows, establishing substantial advantages of seed sampling over alternative exploration schemes.
APA
Dimakopoulou, M. & Van Roy, B. (2018). Coordinated Exploration in Concurrent Reinforcement Learning. Proceedings of the 35th International Conference on Machine Learning, in PMLR 80:1271-1279.