Towards Optimal Algorithms for Multi-Player Bandits without Collision Sensing Information

Wei Huang, Richard Combes, Cindy Trinh
Proceedings of Thirty Fifth Conference on Learning Theory, PMLR 178:1990-2012, 2022.

Abstract

We propose a novel algorithm for multi-player multi-armed bandits without collision sensing information. Our algorithm circumvents two problems shared by all state-of-the-art algorithms: it does not need as an input a lower bound on the minimal expected reward of an arm, and its performance does not scale inversely proportionally to the minimal expected reward. We prove a theoretical regret upper bound to justify these claims. We complement our theoretical results with numerical experiments, showing that the proposed algorithm outperforms state-of-the-art in practice.

Cite this Paper


BibTeX
@InProceedings{pmlr-v178-huang22a,
  title     = {Towards Optimal Algorithms for Multi-Player Bandits without Collision Sensing Information},
  author    = {Huang, Wei and Combes, Richard and Trinh, Cindy},
  booktitle = {Proceedings of Thirty Fifth Conference on Learning Theory},
  pages     = {1990--2012},
  year      = {2022},
  editor    = {Loh, Po-Ling and Raginsky, Maxim},
  volume    = {178},
  series    = {Proceedings of Machine Learning Research},
  month     = {02--05 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v178/huang22a/huang22a.pdf},
  url       = {https://proceedings.mlr.press/v178/huang22a.html},
  abstract  = {We propose a novel algorithm for multi-player multi-armed bandits without collision sensing information. Our algorithm circumvents two problems shared by all state-of-the-art algorithms: it does not need as an input a lower bound on the minimal expected reward of an arm, and its performance does not scale inversely proportionally to the minimal expected reward. We prove a theoretical regret upper bound to justify these claims. We complement our theoretical results with numerical experiments, showing that the proposed algorithm outperforms state-of-the-art in practice.}
}
Endnote
%0 Conference Paper
%T Towards Optimal Algorithms for Multi-Player Bandits without Collision Sensing Information
%A Wei Huang
%A Richard Combes
%A Cindy Trinh
%B Proceedings of Thirty Fifth Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2022
%E Po-Ling Loh
%E Maxim Raginsky
%F pmlr-v178-huang22a
%I PMLR
%P 1990--2012
%U https://proceedings.mlr.press/v178/huang22a.html
%V 178
%X We propose a novel algorithm for multi-player multi-armed bandits without collision sensing information. Our algorithm circumvents two problems shared by all state-of-the-art algorithms: it does not need as an input a lower bound on the minimal expected reward of an arm, and its performance does not scale inversely proportionally to the minimal expected reward. We prove a theoretical regret upper bound to justify these claims. We complement our theoretical results with numerical experiments, showing that the proposed algorithm outperforms state-of-the-art in practice.
APA
Huang, W., Combes, R. & Trinh, C. (2022). Towards Optimal Algorithms for Multi-Player Bandits without Collision Sensing Information. Proceedings of Thirty Fifth Conference on Learning Theory, in Proceedings of Machine Learning Research 178:1990-2012. Available from https://proceedings.mlr.press/v178/huang22a.html.