Collaborative Exploration in Stochastic Multi-Player Bandits
Proceedings of The 12th Asian Conference on Machine Learning, PMLR 129:193-208, 2020.
The Internet of Things (IoT) faces multiple challenges in achieving high reliability, low latency, and low power consumption. Its performance is affected by many factors, such as external interference from coexisting wireless communication technologies sharing the same spectrum. To address this problem, we introduce a general approach for identifying channels with poor link quality. We formulate our problem as a multi-player multi-armed bandit problem, where the devices in an IoT network are the players and the radio channels are the arms. For a realistic formulation, we assume neither that sensing information is available nor that the number of players is below the number of arms. We develop and analyze a collaborative decentralized algorithm that aims to find a set of $m$ arms that is $(\epsilon,m)$-optimal, using an Explore-$m$ algorithm (as denoted by Kalyanakrishnan and Stone (2010)) as a subroutine, thereby blacklisting the suboptimal arms in order to improve the QoS of IoT networks while reducing their energy consumption. We prove analytically and experimentally that our algorithm outperforms selfish algorithms in terms of sample complexity with a low communication cost, and that although playing a smaller set of arms increases the collision rate, playing only the optimal arms nonetheless improves the QoS of the network.
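To make the Explore-$m$ subroutine concrete, the sketch below shows the simplest PAC subset-selection baseline: pull every arm uniformly often enough (via a Hoeffding bound) that each empirical mean is within $\epsilon/2$ of its true value with high probability, then keep the $m$ empirically best arms. This is a minimal single-player illustration of $(\epsilon,m)$-optimal arm identification, not the paper's collaborative decentralized algorithm; the function and parameter names (`explore_m`, `pull`, `num_arms`) are our own.

```python
import math
import random

def explore_m(pull, num_arms, m, eps, delta):
    """Uniform-sampling Explore-m baseline (hypothetical sketch, not the
    paper's algorithm). Pulls every arm n times, where n is chosen via
    Hoeffding's inequality so that each empirical mean lies within eps/2
    of the true mean with probability at least 1 - delta, then returns
    the indices of the m best empirical arms. The returned set is
    (eps, m)-optimal with probability at least 1 - delta."""
    # Hoeffding bound: n >= (2 / eps^2) * ln(2 * num_arms / delta) pulls per arm
    n = math.ceil((2.0 / eps ** 2) * math.log(2.0 * num_arms / delta))
    means = [sum(pull(a) for _ in range(n)) / n for a in range(num_arms)]
    # Keep the m arms with the largest empirical means.
    return set(sorted(range(num_arms), key=lambda a: means[a], reverse=True)[:m])

# Example: 6 Bernoulli "channels"; the 3 worst should be blacklisted.
if __name__ == "__main__":
    random.seed(0)
    true_means = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]
    pull = lambda a: 1.0 if random.random() < true_means[a] else 0.0
    good = explore_m(pull, num_arms=6, m=3, eps=0.1, delta=0.05)
    print(sorted(good))  # with these well-separated means: [0, 1, 2]
```

An IoT device would then restrict transmission to the returned channel set, blacklisting the rest; the paper's contribution is to share this exploration cost across players at low communication overhead.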