A Finite-Time Analysis of Multi-armed Bandits Problems with Kullback-Leibler Divergences

Odalric-Ambrym Maillard; Rémi Munos; Gilles Stoltz

A Finite-Time Analysis of Multi-armed Bandits Problems with Kullback-Leibler Divergences

Odalric-Ambrym Maillard, Rémi Munos, Gilles Stoltz

Proceedings of the 24th Annual Conference on Learning Theory, PMLR 19:497-514, 2011.

Abstract

We consider a Kullback-Leibler-based algorithm for the stochastic multi-armed bandit problem in the case of distributions with finite supports (not necessarily known beforehand), whose asymptotic regret matches the lower bound of Burnetas and Katehakis (1996). Our contribution is to provide a finite-time analysis of this algorithm; we get bounds whose main terms are smaller than the ones of previously known algorithms with finite-time analyses (like UCB-type algorithms).

Cite this Paper

BibTeX


@InProceedings{pmlr-v19-maillard11a,
  title = 	 {A Finite-Time Analysis of Multi-armed Bandits Problems with Kullback-Leibler Divergences},
  author = 	 {Maillard, Odalric-Ambrym and Munos, Rémi and Stoltz, Gilles},
  booktitle = 	 {Proceedings of the 24th Annual Conference on Learning Theory},
  pages = 	 {497--514},
  year = 	 {2011},
  editor = 	 {Kakade, Sham M. and von Luxburg, Ulrike},
  volume = 	 {19},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {Budapest, Hungary},
  month = 	 {09--11 Jun},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v19/maillard11a/maillard11a.pdf},
  url = 	 {https://proceedings.mlr.press/v19/maillard11a.html},
  abstract = 	 {We consider a Kullback-Leibler-based algorithm for the stochastic multi-armed bandit problem in the case of distributions with finite supports (not necessarily known beforehand), whose asymptotic regret matches the lower bound of Burnetas and Katehakis (1996). Our contribution is to provide a finite-time analysis of this algorithm; we get bounds whose main terms are smaller than the ones of previously known algorithms with finite-time analyses (like UCB-type algorithms).}
}

Endnote

%0 Conference Paper
%T A Finite-Time Analysis of Multi-armed Bandits Problems with Kullback-Leibler Divergences
%A Odalric-Ambrym Maillard
%A Rémi Munos
%A Gilles Stoltz
%B Proceedings of the 24th Annual Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2011
%E Sham M. Kakade
%E Ulrike von Luxburg	
%F pmlr-v19-maillard11a
%I PMLR
%P 497--514
%U https://proceedings.mlr.press/v19/maillard11a.html
%V 19
%X We consider a Kullback-Leibler-based algorithm for the stochastic multi-armed bandit problem in the case of distributions with finite supports (not necessarily known beforehand), whose asymptotic regret matches the lower bound of Burnetas and Katehakis (1996). Our contribution is to provide a finite-time analysis of this algorithm; we get bounds whose main terms are smaller than the ones of previously known algorithms with finite-time analyses (like UCB-type algorithms).

RIS


TY  - CPAPER
TI  - A Finite-Time Analysis of Multi-armed Bandits Problems with Kullback-Leibler Divergences
AU  - Odalric-Ambrym Maillard
AU  - Rémi Munos
AU  - Gilles Stoltz
BT  - Proceedings of the 24th Annual Conference on Learning Theory
DA  - 2011/12/21
ED  - Sham M. Kakade
ED  - Ulrike von Luxburg	
ID  - pmlr-v19-maillard11a
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 19
SP  - 497
EP  - 514
L1  - http://proceedings.mlr.press/v19/maillard11a/maillard11a.pdf
UR  - https://proceedings.mlr.press/v19/maillard11a.html
AB  - We consider a Kullback-Leibler-based algorithm for the stochastic multi-armed bandit problem in the case of distributions with finite supports (not necessarily known beforehand), whose asymptotic regret matches the lower bound of Burnetas and Katehakis (1996). Our contribution is to provide a finite-time analysis of this algorithm; we get bounds whose main terms are smaller than the ones of previously known algorithms with finite-time analyses (like UCB-type algorithms).
ER  -

APA


Maillard, O., Munos, R. & Stoltz, G.. (2011). A Finite-Time Analysis of Multi-armed Bandits Problems with Kullback-Leibler Divergences. Proceedings of the 24th Annual Conference on Learning Theory, in Proceedings of Machine Learning Research 19:497-514 Available from https://proceedings.mlr.press/v19/maillard11a.html.

Related Material

Download PDF