Optimal Algorithms for Multiplayer Multi-Armed Bandits
Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, PMLR 108:4120-4129, 2020.
The paper addresses various Multiplayer Multi-Armed Bandit (MMAB) problems, where M decision-makers, or players, collaborate to maximize their cumulative reward. We first investigate the MMAB problem where players selecting the same arms experience a collision (and are aware of it) and do not collect any reward. For this problem, we present DPE1 (Decentralized Parsimonious Exploration), a decentralized algorithm that achieves the same asymptotic regret as that obtained by an optimal centralized algorithm. DPE1 is simpler than the state-of-the-art algorithm SIC-MMAB Boursier and Perchet (2019), and yet offers better performance guarantees. We then study the MMAB problem without collision, where players may select the same arm. Players sit on vertices of a graph, and in each round, they are able to send a message to their neighbours in the graph. We present DPE2, a simple and asymptotically optimal algorithm that outperforms the state-of-the-art algorithm DD- UCB Martinez-Rubio et al. (2019). Besides, under DPE2, the expected number of bits transmitted by the players in the graph is finite.