An Instance-Dependent Analysis for the Cooperative Multi-Player Multi-Armed Bandit

Aldo Pacchiano; Peter Bartlett; Michael Jordan

An Instance-Dependent Analysis for the Cooperative Multi-Player Multi-Armed Bandit

Aldo Pacchiano, Peter Bartlett, Michael Jordan

Proceedings of The 34th International Conference on Algorithmic Learning Theory, PMLR 201:1166-1215, 2023.

Abstract

We study the problem of information sharing and cooperation in Multi-Player Multi-Armed bandits. We propose the first algorithm that achieves logarithmic regret for this problem when the collision reward is unknown. Our results are based on two innovations. First, we show that a simple modification to a successive elimination strategy can be used to allow the players to estimate their suboptimality gaps, up to constant factors, in the absence of collisions. Second, we leverage the first result to design a communication protocol that successfully uses the small reward of collisions to coordinate among players, while preserving meaningful instance-dependent logarithmic regret guarantees.

Cite this Paper

BibTeX


@InProceedings{pmlr-v201-pacchiano23a,
  title = 	 {An Instance-Dependent Analysis for the Cooperative Multi-Player Multi-Armed Bandit},
  author =       {Pacchiano, Aldo and Bartlett, Peter and Jordan, Michael},
  booktitle = 	 {Proceedings of The 34th International Conference on Algorithmic Learning Theory},
  pages = 	 {1166--1215},
  year = 	 {2023},
  editor = 	 {Agrawal, Shipra and Orabona, Francesco},
  volume = 	 {201},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {20 Feb--23 Feb},
  publisher =    {PMLR},
  pdf = 	 {https://proceedings.mlr.press/v201/pacchiano23a/pacchiano23a.pdf},
  url = 	 {https://proceedings.mlr.press/v201/pacchiano23a.html},
  abstract = 	 {We study the problem of information sharing and cooperation in Multi-Player Multi-Armed bandits. We propose the first algorithm that achieves logarithmic regret for this problem when the collision reward is unknown. Our results are based on two innovations. First, we show that a simple modification to a successive elimination strategy can be used to allow the players to estimate their suboptimality gaps, up to constant factors, in the absence of collisions. Second, we leverage the first result to design a communication protocol that successfully uses the small reward of collisions to coordinate among players, while  preserving meaningful instance-dependent logarithmic regret guarantees. }
}

Endnote

%0 Conference Paper
%T An Instance-Dependent Analysis for the Cooperative Multi-Player Multi-Armed Bandit
%A Aldo Pacchiano
%A Peter Bartlett
%A Michael Jordan
%B Proceedings of The 34th International Conference on Algorithmic Learning Theory
%C Proceedings of Machine Learning Research
%D 2023
%E Shipra Agrawal
%E Francesco Orabona	
%F pmlr-v201-pacchiano23a
%I PMLR
%P 1166--1215
%U https://proceedings.mlr.press/v201/pacchiano23a.html
%V 201
%X We study the problem of information sharing and cooperation in Multi-Player Multi-Armed bandits. We propose the first algorithm that achieves logarithmic regret for this problem when the collision reward is unknown. Our results are based on two innovations. First, we show that a simple modification to a successive elimination strategy can be used to allow the players to estimate their suboptimality gaps, up to constant factors, in the absence of collisions. Second, we leverage the first result to design a communication protocol that successfully uses the small reward of collisions to coordinate among players, while  preserving meaningful instance-dependent logarithmic regret guarantees.

APA


Pacchiano, A., Bartlett, P. & Jordan, M.. (2023). An Instance-Dependent Analysis for the Cooperative Multi-Player Multi-Armed Bandit. Proceedings of The 34th International Conference on Algorithmic Learning Theory, in Proceedings of Machine Learning Research 201:1166-1215 Available from https://proceedings.mlr.press/v201/pacchiano23a.html.

Related Material

Download PDF