{Multi-Player Bandits Revisited}

Lilian Besson; Emilie Kaufmann

{Multi-Player Bandits Revisited}

Lilian Besson, Emilie Kaufmann

Proceedings of Algorithmic Learning Theory, PMLR 83:56-92, 2018.

Abstract

Multi-player Multi-Armed Bandits (MAB) have been extensively studied in the literature, motivated by applications to Cognitive Radio systems. Driven by such applications as well, we motivate the introduction of several levels of feedback for multi-player MAB algorithms. Most existing work assume that \emph{sensing information} is available to the algorithm. Under this assumption, we improve the state-of-the-art lower bound for the regret of any decentralized algorithms and introduce two algorithms, \emph{RandTopM} and \emph{MCTopM}, that are shown to empirically outperform existing algorithms. Moreover, we provide strong theoretical guarantees for these algorithms, including a notion of asymptotic optimality in terms of the number of selections of bad arms. We then introduce a promising heuristic, called \emph{Selfish}, that can operate without sensing information, which is crucial for emerging applications to Internet of Things networks. We investigate the empirical performance of this algorithm and provide some first theoretical elements for the understanding of its behavior.

Cite this Paper

BibTeX

@InProceedings{pmlr-v83-besson18a,
  title = 	 {{Multi-Player Bandits Revisited}},
  author = 	 {Besson, Lilian and Kaufmann, Emilie},
  booktitle = 	 {Proceedings of Algorithmic Learning Theory},
  pages = 	 {56--92},
  year = 	 {2018},
  editor = 	 {Janoos, Firdaus and Mohri, Mehryar and Sridharan, Karthik},
  volume = 	 {83},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {07--09 Apr},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v83/besson18a/besson18a.pdf},
  url = 	 {https://proceedings.mlr.press/v83/besson18a.html},
  abstract = 	 {Multi-player Multi-Armed Bandits (MAB) have been extensively studied in the literature, motivated by applications to Cognitive Radio systems. Driven by such applications as well, we motivate the introduction of several levels of feedback for multi-player MAB algorithms. Most existing work assume that \emph{sensing information} is available to the algorithm. Under this assumption, we improve the state-of-the-art lower bound for the regret of any decentralized algorithms and introduce two algorithms, \emph{RandTopM} and \emph{MCTopM}, that are shown to empirically outperform existing algorithms. Moreover, we provide strong theoretical guarantees for these algorithms, including a notion of asymptotic optimality in terms of the number of selections of bad arms. We then introduce a promising heuristic, called \emph{Selfish}, that can operate without sensing information, which is crucial for emerging applications to Internet of Things networks. We investigate the empirical performance of this algorithm and provide some first theoretical elements for the understanding of its behavior.}
}

Endnote

%0 Conference Paper
%T {Multi-Player Bandits Revisited}
%A Lilian Besson
%A Emilie Kaufmann
%B Proceedings of Algorithmic Learning Theory
%C Proceedings of Machine Learning Research
%D 2018
%E Firdaus Janoos
%E Mehryar Mohri
%E Karthik Sridharan	
%F pmlr-v83-besson18a
%I PMLR
%P 56--92
%U https://proceedings.mlr.press/v83/besson18a.html
%V 83
%X Multi-player Multi-Armed Bandits (MAB) have been extensively studied in the literature, motivated by applications to Cognitive Radio systems. Driven by such applications as well, we motivate the introduction of several levels of feedback for multi-player MAB algorithms. Most existing work assume that \emph{sensing information} is available to the algorithm. Under this assumption, we improve the state-of-the-art lower bound for the regret of any decentralized algorithms and introduce two algorithms, \emph{RandTopM} and \emph{MCTopM}, that are shown to empirically outperform existing algorithms. Moreover, we provide strong theoretical guarantees for these algorithms, including a notion of asymptotic optimality in terms of the number of selections of bad arms. We then introduce a promising heuristic, called \emph{Selfish}, that can operate without sensing information, which is crucial for emerging applications to Internet of Things networks. We investigate the empirical performance of this algorithm and provide some first theoretical elements for the understanding of its behavior.

APA

Besson, L. & Kaufmann, E.. (2018). {Multi-Player Bandits Revisited}. Proceedings of Algorithmic Learning Theory, in Proceedings of Machine Learning Research 83:56-92 Available from https://proceedings.mlr.press/v83/besson18a.html.

Related Material

Download PDF