Towards Optimal Algorithms for Multi-Player Bandits without Collision Sensing Information

Wei Huang; Richard Combes; Cindy Trinh

Proceedings of Thirty Fifth Conference on Learning Theory, PMLR 178:1990-2012, 2022.

Abstract

We propose a novel algorithm for multi-player multi-armed bandits without collision sensing information. Our algorithm circumvents two problems shared by all state-of-the-art algorithms: it does not require as input a lower bound on the minimal expected reward of an arm, and its performance does not scale inversely with the minimal expected reward. We prove a regret upper bound that supports these claims. We complement our theoretical results with numerical experiments showing that the proposed algorithm outperforms state-of-the-art algorithms in practice.
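A small illustration of why the minimal expected reward matters in this setting (our own sketch, not the authors' algorithm). It assumes the standard no-sensing model: K arms with Bernoulli rewards of means mu_k, players who pick the same arm in a round all receive reward 0, and each player observes only its own reward, never a collision flag. A zero reward is therefore ambiguous. A player alone on an arm with mean mu sees n zeros in a row with probability (1 - mu)^n, so rejecting "I am alone" at confidence level delta takes about log(1/delta)/mu pulls, a count that grows like 1/mu as mu shrinks. The helper names `pull` and `pulls_to_rule_out_collision` are hypothetical.

```python
import math
import random


def pull(arm_means, choices, rng):
    """One round under an ASSUMED no-sensing model.

    choices[i] is the arm chosen by player i. Players sharing an arm get
    reward 0; a lone player gets a Bernoulli(arm_means[arm]) draw. Only the
    rewards are returned: no collision indicator is revealed.
    """
    counts = {}
    for arm in choices:
        counts[arm] = counts.get(arm, 0) + 1
    rewards = []
    for arm in choices:
        if counts[arm] > 1:
            rewards.append(0)  # collision: indistinguishable from a bad draw
        else:
            rewards.append(1 if rng.random() < arm_means[arm] else 0)
    return rewards


def pulls_to_rule_out_collision(mu, delta):
    """Smallest n with (1 - mu)**n <= delta, for 0 < mu < 1.

    If a player were alone on an arm of mean mu, n consecutive zero
    rewards would occur with probability at most delta, so such a run is
    evidence of a collision at level delta. n is roughly log(1/delta)/mu.
    """
    return math.ceil(math.log(delta) / math.log(1.0 - mu))


if __name__ == "__main__":
    rng = random.Random(0)
    # Players 0 and 1 collide on arm 0; player 2 is alone on arm 2.
    print(pull([0.9, 0.2, 0.5], [0, 0, 2], rng))
    print(pulls_to_rule_out_collision(0.5, 0.01))
    print(pulls_to_rule_out_collision(0.05, 0.01))
```

For example, with delta = 0.01 this gives 7 pulls when mu = 0.5 but 90 pulls when mu = 0.05, which is one way to see how a 1/mu_min dependence can arise when collisions are not sensed.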

Cite this Paper

BibTeX


@InProceedings{pmlr-v178-huang22a,
  title     = {Towards Optimal Algorithms for Multi-Player Bandits without Collision Sensing Information},
  author    = {Huang, Wei and Combes, Richard and Trinh, Cindy},
  booktitle = {Proceedings of Thirty Fifth Conference on Learning Theory},
  pages     = {1990--2012},
  year      = {2022},
  editor    = {Loh, Po-Ling and Raginsky, Maxim},
  volume    = {178},
  series    = {Proceedings of Machine Learning Research},
  month     = {02--05 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v178/huang22a/huang22a.pdf},
  url       = {https://proceedings.mlr.press/v178/huang22a.html},
  abstract  = {We propose a novel algorithm for multi-player multi-armed bandits without collision sensing information. Our algorithm circumvents two problems shared by all state-of-the-art algorithms: it does not need as an input a lower bound on the minimal expected reward of an arm, and its performance does not scale inversely proportionally to the minimal expected reward. We prove a theoretical regret upper bound to justify these claims. We complement our theoretical results with numerical experiments, showing that the proposed algorithm outperforms state-of-the-art in practice.}
}

Endnote

%0 Conference Paper
%T Towards Optimal Algorithms for Multi-Player Bandits without Collision Sensing Information
%A Wei Huang
%A Richard Combes
%A Cindy Trinh
%B Proceedings of Thirty Fifth Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2022
%E Po-Ling Loh
%E Maxim Raginsky
%F pmlr-v178-huang22a
%I PMLR
%P 1990--2012
%U https://proceedings.mlr.press/v178/huang22a.html
%V 178
%X We propose a novel algorithm for multi-player multi-armed bandits without collision sensing information. Our algorithm circumvents two problems shared by all state-of-the-art algorithms: it does not need as an input a lower bound on the minimal expected reward of an arm, and its performance does not scale inversely proportionally to the minimal expected reward. We prove a theoretical regret upper bound to justify these claims. We complement our theoretical results with numerical experiments, showing that the proposed algorithm outperforms state-of-the-art in practice.

APA


Huang, W., Combes, R., & Trinh, C. (2022). Towards Optimal Algorithms for Multi-Player Bandits without Collision Sensing Information. Proceedings of Thirty Fifth Conference on Learning Theory, in Proceedings of Machine Learning Research 178:1990-2012. Available from https://proceedings.mlr.press/v178/huang22a.html.
