Competing for Shareable Arms in Multi-Player Multi-Armed Bandits

Renzhe Xu, Haotian Wang, Xingxuan Zhang, Bo Li, Peng Cui
Proceedings of the 40th International Conference on Machine Learning, PMLR 202:38674-38706, 2023.

Abstract

Competitions for shareable and limited resources have long been studied with strategic agents. In reality, agents often have to learn about and maximize the rewards of the resources at the same time. To design an individualized competing policy, we model the competition between agents in a novel multi-player multi-armed bandit (MPMAB) setting where players are selfish and aim to maximize their own rewards. In addition, when several players pull the same arm, we assume that these players share the arm’s reward equally in expectation. Under this setting, we first analyze the Nash equilibrium when arms’ rewards are known. Subsequently, we propose a novel Selfish MPMAB with Averaging Allocation (SMAA) approach based on the equilibrium. We theoretically demonstrate that SMAA achieves a good regret guarantee for each player when all players follow the algorithm. Additionally, we establish that no single selfish player can significantly increase their rewards through deviation, nor can they detrimentally affect other players’ rewards without incurring substantial losses for themselves. Finally, we validate the effectiveness of the method in extensive synthetic experiments.
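The averaging-allocation assumption in the abstract can be sketched in a few lines: when k players pull the same arm with mean reward mu, each player's expected per-round reward is mu/k. The sketch below is illustrative only, assuming Bernoulli arms and modeling the split by awarding the realized reward to one uniformly chosen player; the paper's exact reward model may differ.

```python
import random

def pull_shared_arm(mu, n_players, rng):
    """One round: n_players pull the same Bernoulli(mu) arm.
    The realized reward goes to one uniformly chosen player, so each
    player's expected reward is mu / n_players (averaging in expectation).
    Note: this Bernoulli model is an illustrative assumption."""
    reward = 1.0 if rng.random() < mu else 0.0
    winner = rng.randrange(n_players)
    return [reward if i == winner else 0.0 for i in range(n_players)]

rng = random.Random(42)
mu, k, T = 0.6, 3, 20000
totals = [0.0] * k
for _ in range(T):
    for i, r in enumerate(pull_shared_arm(mu, k, rng)):
        totals[i] += r

# Each player's empirical average reward is close to mu / k = 0.2.
averages = [t / T for t in totals]
```

Under this sharing rule, crowding onto the single best arm is not always optimal for a selfish player, which is what makes the Nash equilibrium analysis in the paper non-trivial.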

Cite this Paper


BibTeX
@InProceedings{pmlr-v202-xu23q,
  title     = {Competing for Shareable Arms in Multi-Player Multi-Armed Bandits},
  author    = {Xu, Renzhe and Wang, Haotian and Zhang, Xingxuan and Li, Bo and Cui, Peng},
  booktitle = {Proceedings of the 40th International Conference on Machine Learning},
  pages     = {38674--38706},
  year      = {2023},
  editor    = {Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan},
  volume    = {202},
  series    = {Proceedings of Machine Learning Research},
  month     = {23--29 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v202/xu23q/xu23q.pdf},
  url       = {https://proceedings.mlr.press/v202/xu23q.html},
  abstract  = {Competitions for shareable and limited resources have long been studied with strategic agents. In reality, agents often have to learn and maximize the rewards of the resources at the same time. To design an individualized competing policy, we model the competition between agents in a novel multi-player multi-armed bandit (MPMAB) setting where players are selfish and aim to maximize their own rewards. In addition, when several players pull the same arm, we assume that these players averagely share the arms’ rewards by expectation. Under this setting, we first analyze the Nash equilibrium when arms’ rewards are known. Subsequently, we propose a novel Selfish MPMAB with Averaging Allocation (SMAA) approach based on the equilibrium. We theoretically demonstrate that SMAA could achieve a good regret guarantee for each player when all players follow the algorithm. Additionally, we establish that no single selfish player can significantly increase their rewards through deviation, nor can they detrimentally affect other players’ rewards without incurring substantial losses for themselves. We finally validate the effectiveness of the method in extensive synthetic experiments.}
}
Endnote
%0 Conference Paper
%T Competing for Shareable Arms in Multi-Player Multi-Armed Bandits
%A Renzhe Xu
%A Haotian Wang
%A Xingxuan Zhang
%A Bo Li
%A Peng Cui
%B Proceedings of the 40th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Andreas Krause
%E Emma Brunskill
%E Kyunghyun Cho
%E Barbara Engelhardt
%E Sivan Sabato
%E Jonathan Scarlett
%F pmlr-v202-xu23q
%I PMLR
%P 38674--38706
%U https://proceedings.mlr.press/v202/xu23q.html
%V 202
%X Competitions for shareable and limited resources have long been studied with strategic agents. In reality, agents often have to learn and maximize the rewards of the resources at the same time. To design an individualized competing policy, we model the competition between agents in a novel multi-player multi-armed bandit (MPMAB) setting where players are selfish and aim to maximize their own rewards. In addition, when several players pull the same arm, we assume that these players averagely share the arms’ rewards by expectation. Under this setting, we first analyze the Nash equilibrium when arms’ rewards are known. Subsequently, we propose a novel Selfish MPMAB with Averaging Allocation (SMAA) approach based on the equilibrium. We theoretically demonstrate that SMAA could achieve a good regret guarantee for each player when all players follow the algorithm. Additionally, we establish that no single selfish player can significantly increase their rewards through deviation, nor can they detrimentally affect other players’ rewards without incurring substantial losses for themselves. We finally validate the effectiveness of the method in extensive synthetic experiments.
APA
Xu, R., Wang, H., Zhang, X., Li, B. & Cui, P. (2023). Competing for Shareable Arms in Multi-Player Multi-Armed Bandits. Proceedings of the 40th International Conference on Machine Learning, in Proceedings of Machine Learning Research 202:38674-38706. Available from https://proceedings.mlr.press/v202/xu23q.html.