Optimal Algorithms for Multiplayer Multi-Armed Bandits
Proceedings of the Twenty-Third International Conference on Artificial Intelligence and Statistics, PMLR 108:4120-4129, 2020.
Abstract
The paper addresses various Multiplayer Multi-Armed Bandit (MMAB) problems, where M decision-makers, or players, collaborate to maximize their cumulative reward. We first investigate the MMAB problem where players selecting the same arm experience a collision (and are aware of it) and do not collect any reward. For this problem, we present DPE1 (Decentralized Parsimonious Exploration), a decentralized algorithm that achieves the same asymptotic regret as that obtained by an optimal centralized algorithm. DPE1 is simpler than the state-of-the-art algorithm SIC-MMAB (Boursier and Perchet, 2019), and yet offers better performance guarantees. We then study the MMAB problem without collision, where players may select the same arm. Players sit on the vertices of a graph, and in each round, they are able to send a message to their neighbours in the graph. We present DPE2, a simple and asymptotically optimal algorithm that outperforms the state-of-the-art algorithm DD-UCB (Martinez-Rubio et al., 2019). Moreover, under DPE2, the expected number of bits transmitted by the players in the graph is finite.
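The collision setting described in the abstract can be illustrated with a minimal simulation sketch. This is not the paper's algorithm, only the reward model it operates under: players choosing the same arm collide, observe the collision, and receive no reward. Arm rewards are assumed Bernoulli here for concreteness; the function name and parameters are hypothetical.

```python
import random

def play_round(arm_means, choices, rng=random):
    """One round of the collision MMAB model.

    arm_means: list of Bernoulli success probabilities, one per arm.
    choices:   list of arm indices, one per player.
    Returns (rewards, collision_flags), one entry per player.
    """
    rewards, collisions = [], []
    for arm in choices:
        collided = choices.count(arm) > 1
        collisions.append(collided)
        # Reward is zeroed on collision; otherwise a Bernoulli draw.
        reward = 0.0 if collided else float(rng.random() < arm_means[arm])
        rewards.append(reward)
    return rewards, collisions

# Example: 3 players, 4 arms; players 0 and 1 both pick arm 2 and collide.
rewards, flags = play_round([0.9, 0.8, 0.7, 0.1], [2, 2, 0])
```

In this model the colliding players learn that a collision occurred (the flag), which is the information DPE1-style decentralized algorithms can exploit for implicit coordination.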