Optimal Algorithms for Multiplayer Multi-Armed Bandits

Po-An Wang, Alexandre Proutiere, Kaito Ariu, Yassir Jedra, Alessio Russo
Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, PMLR 108:4120-4129, 2020.

Abstract

The paper addresses various Multiplayer Multi-Armed Bandit (MMAB) problems, where M decision-makers, or players, collaborate to maximize their cumulative reward. We first investigate the MMAB problem where players selecting the same arm experience a collision (and are aware of it) and collect no reward. For this problem, we present DPE1 (Decentralized Parsimonious Exploration), a decentralized algorithm that achieves the same asymptotic regret as an optimal centralized algorithm. DPE1 is simpler than the state-of-the-art algorithm SIC-MMAB (Boursier and Perchet, 2019), yet offers better performance guarantees. We then study the MMAB problem without collisions, where players may select the same arm. Players sit on the vertices of a graph, and in each round they can send a message to their neighbours in the graph. We present DPE2, a simple and asymptotically optimal algorithm that outperforms the state-of-the-art algorithm DD-UCB (Martinez-Rubio et al., 2019). Moreover, under DPE2, the expected number of bits transmitted by the players over the graph is finite.
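
To make the collision model in the abstract concrete, below is a minimal sketch of one round of feedback: M players each pull an arm, and any players that chose the same arm observe a collision and collect no reward. The Bernoulli reward distribution and all names (play_round, mu, chosen_arms) are illustrative assumptions, not taken from the paper, and this is not the DPE1 algorithm itself.

```python
import numpy as np

rng = np.random.default_rng(0)

def play_round(mu, chosen_arms):
    """Simulate one round of the MMAB collision model (illustrative sketch).

    mu          : array of Bernoulli means, one per arm (assumed reward model).
    chosen_arms : list with the arm index selected by each player.
    Returns per-player (rewards, collision_flags).
    """
    counts = np.bincount(chosen_arms, minlength=len(mu))
    rewards, collisions = [], []
    for arm in chosen_arms:
        collided = counts[arm] > 1
        collisions.append(collided)
        # Colliding players are aware of the collision and get no reward;
        # a lone player on an arm draws a Bernoulli reward with mean mu[arm].
        rewards.append(0.0 if collided else float(rng.random() < mu[arm]))
    return rewards, collisions

# Example: 3 players, 5 arms; players 0 and 1 both pick arm 2 and collide.
mu = np.array([0.9, 0.8, 0.7, 0.3, 0.1])
print(play_round(mu, [2, 2, 0]))
```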

Cite this Paper


BibTeX
@InProceedings{pmlr-v108-wang20m,
  title     = {Optimal Algorithms for Multiplayer Multi-Armed Bandits},
  author    = {Wang, Po-An and Proutiere, Alexandre and Ariu, Kaito and Jedra, Yassir and Russo, Alessio},
  booktitle = {Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics},
  pages     = {4120--4129},
  year      = {2020},
  editor    = {Chiappa, Silvia and Calandra, Roberto},
  volume    = {108},
  series    = {Proceedings of Machine Learning Research},
  month     = {26--28 Aug},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v108/wang20m/wang20m.pdf},
  url       = {https://proceedings.mlr.press/v108/wang20m.html},
  abstract  = {The paper addresses various Multiplayer Multi-Armed Bandit (MMAB) problems, where M decision-makers, or players, collaborate to maximize their cumulative reward. We first investigate the MMAB problem where players selecting the same arm experience a collision (and are aware of it) and collect no reward. For this problem, we present DPE1 (Decentralized Parsimonious Exploration), a decentralized algorithm that achieves the same asymptotic regret as an optimal centralized algorithm. DPE1 is simpler than the state-of-the-art algorithm SIC-MMAB (Boursier and Perchet, 2019), yet offers better performance guarantees. We then study the MMAB problem without collisions, where players may select the same arm. Players sit on the vertices of a graph, and in each round they can send a message to their neighbours in the graph. We present DPE2, a simple and asymptotically optimal algorithm that outperforms the state-of-the-art algorithm DD-UCB (Martinez-Rubio et al., 2019). Moreover, under DPE2, the expected number of bits transmitted by the players over the graph is finite.}
}
Endnote
%0 Conference Paper
%T Optimal Algorithms for Multiplayer Multi-Armed Bandits
%A Po-An Wang
%A Alexandre Proutiere
%A Kaito Ariu
%A Yassir Jedra
%A Alessio Russo
%B Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2020
%E Silvia Chiappa
%E Roberto Calandra
%F pmlr-v108-wang20m
%I PMLR
%P 4120--4129
%U https://proceedings.mlr.press/v108/wang20m.html
%V 108
%X The paper addresses various Multiplayer Multi-Armed Bandit (MMAB) problems, where M decision-makers, or players, collaborate to maximize their cumulative reward. We first investigate the MMAB problem where players selecting the same arm experience a collision (and are aware of it) and collect no reward. For this problem, we present DPE1 (Decentralized Parsimonious Exploration), a decentralized algorithm that achieves the same asymptotic regret as an optimal centralized algorithm. DPE1 is simpler than the state-of-the-art algorithm SIC-MMAB (Boursier and Perchet, 2019), yet offers better performance guarantees. We then study the MMAB problem without collisions, where players may select the same arm. Players sit on the vertices of a graph, and in each round they can send a message to their neighbours in the graph. We present DPE2, a simple and asymptotically optimal algorithm that outperforms the state-of-the-art algorithm DD-UCB (Martinez-Rubio et al., 2019). Moreover, under DPE2, the expected number of bits transmitted by the players over the graph is finite.
APA
Wang, P., Proutiere, A., Ariu, K., Jedra, Y. & Russo, A. (2020). Optimal Algorithms for Multiplayer Multi-Armed Bandits. Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 108:4120-4129. Available from https://proceedings.mlr.press/v108/wang20m.html.
