Coordinated Exploration in Concurrent Reinforcement Learning

Maria Dimakopoulou; Benjamin Van Roy

Proceedings of the 35th International Conference on Machine Learning, PMLR 80:1271-1279, 2018.

Abstract

We consider a team of reinforcement learning agents that concurrently learn to operate in a common environment. We identify three properties - adaptivity, commitment, and diversity - which are necessary for efficient coordinated exploration and demonstrate that straightforward extensions to single-agent optimistic and posterior sampling approaches fail to satisfy them. As an alternative, we propose seed sampling, which extends posterior sampling in a manner that meets these requirements. Simulation results investigate how per-agent regret decreases as the number of agents grows, establishing substantial advantages of seed sampling over alternative exploration schemes.

Cite this Paper

BibTeX

@InProceedings{pmlr-v80-dimakopoulou18a,
  title = 	 {Coordinated Exploration in Concurrent Reinforcement Learning},
  author =       {Dimakopoulou, Maria and Van Roy, Benjamin},
  booktitle = 	 {Proceedings of the 35th International Conference on Machine Learning},
  pages = 	 {1271--1279},
  year = 	 {2018},
  editor = 	 {Dy, Jennifer and Krause, Andreas},
  volume = 	 {80},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {10--15 Jul},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v80/dimakopoulou18a/dimakopoulou18a.pdf},
  url = 	 {https://proceedings.mlr.press/v80/dimakopoulou18a.html},
  abstract = 	 {We consider a team of reinforcement learning agents that concurrently learn to operate in a common environment. We identify three properties - adaptivity, commitment, and diversity - which are necessary for efficient coordinated exploration and demonstrate that straightforward extensions to single-agent optimistic and posterior sampling approaches fail to satisfy them. As an alternative, we propose seed sampling, which extends posterior sampling in a manner that meets these requirements. Simulation results investigate how per-agent regret decreases as the number of agents grows, establishing substantial advantages of seed sampling over alternative exploration schemes.}
}

Endnote

%0 Conference Paper
%T Coordinated Exploration in Concurrent Reinforcement Learning
%A Maria Dimakopoulou
%A Benjamin Van Roy
%B Proceedings of the 35th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2018
%E Jennifer Dy
%E Andreas Krause
%F pmlr-v80-dimakopoulou18a
%I PMLR
%P 1271--1279
%U https://proceedings.mlr.press/v80/dimakopoulou18a.html
%V 80
%X We consider a team of reinforcement learning agents that concurrently learn to operate in a common environment. We identify three properties - adaptivity, commitment, and diversity - which are necessary for efficient coordinated exploration and demonstrate that straightforward extensions to single-agent optimistic and posterior sampling approaches fail to satisfy them. As an alternative, we propose seed sampling, which extends posterior sampling in a manner that meets these requirements. Simulation results investigate how per-agent regret decreases as the number of agents grows, establishing substantial advantages of seed sampling over alternative exploration schemes.

APA

Dimakopoulou, M. & Van Roy, B. (2018). Coordinated Exploration in Concurrent Reinforcement Learning. Proceedings of the 35th International Conference on Machine Learning, in Proceedings of Machine Learning Research 80:1271-1279. Available from https://proceedings.mlr.press/v80/dimakopoulou18a.html.
