A Practical Algorithm for Multiplayer Bandits when Arm Means Vary Among Players

Abbas Mehrabian, Etienne Boursier, Emilie Kaufmann, Vianney Perchet
Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, PMLR 108:1211-1221, 2020.

Abstract

We study a multiplayer stochastic multi-armed bandit problem in which players cannot communicate, and if two or more players pull the same arm, a collision occurs and the involved players receive zero reward. We consider the challenging heterogeneous setting, in which different arms may have different means for different players, and propose a new and efficient algorithm that combines the idea of leveraging forced collisions for implicit communication and that of performing matching eliminations. We present a finite-time analysis of our algorithm, giving the first sublinear minimax regret bound for this problem, and prove that if the optimal assignment of players to arms is unique, our algorithm attains the optimal O(ln(T)) regret, solving an open question raised at NeurIPS 2018 by Bistritz and Leshem (2018).

Cite this Paper


BibTeX
@InProceedings{pmlr-v108-mehrabian20a, title = {A Practical Algorithm for Multiplayer Bandits when Arm Means Vary Among Players}, author = {Mehrabian, Abbas and Boursier, Etienne and Kaufmann, Emilie and Perchet, Vianney}, booktitle = {Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics}, pages = {1211--1221}, year = {2020}, editor = {Chiappa, Silvia and Calandra, Roberto}, volume = {108}, series = {Proceedings of Machine Learning Research}, month = {26--28 Aug}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/v108/mehrabian20a/mehrabian20a.pdf}, url = {https://proceedings.mlr.press/v108/mehrabian20a.html}, abstract = {We study a multiplayer stochastic multi-armed bandit problem in which players cannot communicate, and if two or more players pull the same arm, a collision occurs and the involved players receive zero reward. We consider the challenging heterogeneous setting, in which different arms may have different means for different players, and propose a new and efficient algorithm that combines the idea of leveraging forced collisions for implicit communication and that of performing matching eliminations. We present a finite-time analysis of our algorithm, giving the first sublinear minimax regret bound for this problem, and prove that if the optimal assignment of players to arms is unique, our algorithm attains the optimal O(ln(T)) regret, solving an open question raised at NeurIPS 2018 by Bistritz and Leshem (2018).} }
Endnote
%0 Conference Paper %T A Practical Algorithm for Multiplayer Bandits when Arm Means Vary Among Players %A Abbas Mehrabian %A Etienne Boursier %A Emilie Kaufmann %A Vianney Perchet %B Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics %C Proceedings of Machine Learning Research %D 2020 %E Silvia Chiappa %E Roberto Calandra %F pmlr-v108-mehrabian20a %I PMLR %P 1211--1221 %U https://proceedings.mlr.press/v108/mehrabian20a.html %V 108 %X We study a multiplayer stochastic multi-armed bandit problem in which players cannot communicate, and if two or more players pull the same arm, a collision occurs and the involved players receive zero reward. We consider the challenging heterogeneous setting, in which different arms may have different means for different players, and propose a new and efficient algorithm that combines the idea of leveraging forced collisions for implicit communication and that of performing matching eliminations. We present a finite-time analysis of our algorithm, giving the first sublinear minimax regret bound for this problem, and prove that if the optimal assignment of players to arms is unique, our algorithm attains the optimal O(ln(T)) regret, solving an open question raised at NeurIPS 2018 by Bistritz and Leshem (2018).
APA
Mehrabian, A., Boursier, E., Kaufmann, E. & Perchet, V.. (2020). A Practical Algorithm for Multiplayer Bandits when Arm Means Vary Among Players. Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 108:1211-1221 Available from https://proceedings.mlr.press/v108/mehrabian20a.html.

Related Material