Multi-Player Bandits – a Musical Chairs Approach

Jonathan Rosenski, Ohad Shamir, Liran Szlak
; Proceedings of The 33rd International Conference on Machine Learning, PMLR 48:155-163, 2016.

Abstract

We consider a variant of the stochastic multi-armed bandit problem, where multiple players simultaneously choose from the same set of arms and may collide, receiving no reward. This setting has been motivated by problems arising in cognitive radio networks, and is especially challenging under the realistic assumption that communication between players is limited. We provide a communication-free algorithm (Musical Chairs) which attains constant regret with high probability, as well as a sublinear-regret, communication-free algorithm (Dynamic Musical Chairs) for the more difficult setting of players dynamically entering and leaving throughout the game. Moreover, both algorithms do not require prior knowledge of the number of players. To the best of our knowledge, these are the first communication-free algorithms with these types of formal guarantees.

Cite this Paper


BibTeX
@InProceedings{pmlr-v48-rosenski16, title = {Multi-Player Bandits -- a Musical Chairs Approach}, author = {Jonathan Rosenski and Ohad Shamir and Liran Szlak}, booktitle = {Proceedings of The 33rd International Conference on Machine Learning}, pages = {155--163}, year = {2016}, editor = {Maria Florina Balcan and Kilian Q. Weinberger}, volume = {48}, series = {Proceedings of Machine Learning Research}, address = {New York, New York, USA}, month = {20--22 Jun}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/v48/rosenski16.pdf}, url = {http://proceedings.mlr.press/v48/rosenski16.html}, abstract = {We consider a variant of the stochastic multi-armed bandit problem, where multiple players simultaneously choose from the same set of arms and may collide, receiving no reward. This setting has been motivated by problems arising in cognitive radio networks, and is especially challenging under the realistic assumption that communication between players is limited. We provide a communication-free algorithm (Musical Chairs) which attains constant regret with high probability, as well as a sublinear-regret, communication-free algorithm (Dynamic Musical Chairs) for the more difficult setting of players dynamically entering and leaving throughout the game. Moreover, both algorithms do not require prior knowledge of the number of players. To the best of our knowledge, these are the first communication-free algorithms with these types of formal guarantees.} }
Endnote
%0 Conference Paper %T Multi-Player Bandits – a Musical Chairs Approach %A Jonathan Rosenski %A Ohad Shamir %A Liran Szlak %B Proceedings of The 33rd International Conference on Machine Learning %C Proceedings of Machine Learning Research %D 2016 %E Maria Florina Balcan %E Kilian Q. Weinberger %F pmlr-v48-rosenski16 %I PMLR %J Proceedings of Machine Learning Research %P 155--163 %U http://proceedings.mlr.press %V 48 %W PMLR %X We consider a variant of the stochastic multi-armed bandit problem, where multiple players simultaneously choose from the same set of arms and may collide, receiving no reward. This setting has been motivated by problems arising in cognitive radio networks, and is especially challenging under the realistic assumption that communication between players is limited. We provide a communication-free algorithm (Musical Chairs) which attains constant regret with high probability, as well as a sublinear-regret, communication-free algorithm (Dynamic Musical Chairs) for the more difficult setting of players dynamically entering and leaving throughout the game. Moreover, both algorithms do not require prior knowledge of the number of players. To the best of our knowledge, these are the first communication-free algorithms with these types of formal guarantees.
RIS
TY - CPAPER TI - Multi-Player Bandits – a Musical Chairs Approach AU - Jonathan Rosenski AU - Ohad Shamir AU - Liran Szlak BT - Proceedings of The 33rd International Conference on Machine Learning PY - 2016/06/11 DA - 2016/06/11 ED - Maria Florina Balcan ED - Kilian Q. Weinberger ID - pmlr-v48-rosenski16 PB - PMLR SP - 155 DP - PMLR EP - 163 L1 - http://proceedings.mlr.press/v48/rosenski16.pdf UR - http://proceedings.mlr.press/v48/rosenski16.html AB - We consider a variant of the stochastic multi-armed bandit problem, where multiple players simultaneously choose from the same set of arms and may collide, receiving no reward. This setting has been motivated by problems arising in cognitive radio networks, and is especially challenging under the realistic assumption that communication between players is limited. We provide a communication-free algorithm (Musical Chairs) which attains constant regret with high probability, as well as a sublinear-regret, communication-free algorithm (Dynamic Musical Chairs) for the more difficult setting of players dynamically entering and leaving throughout the game. Moreover, both algorithms do not require prior knowledge of the number of players. To the best of our knowledge, these are the first communication-free algorithms with these types of formal guarantees. ER -
APA
Rosenski, J., Shamir, O. & Szlak, L.. (2016). Multi-Player Bandits – a Musical Chairs Approach. Proceedings of The 33rd International Conference on Machine Learning, in PMLR 48:155-163

Related Material