Bounded regret in stochastic multi-armed bandits

Sébastien Bubeck; Vianney Perchet; Philippe Rigollet

Bounded regret in stochastic multi-armed bandits

Sébastien Bubeck, Vianney Perchet, Philippe Rigollet

Proceedings of the 26th Annual Conference on Learning Theory, PMLR 30:122-134, 2013.

Abstract

We study the stochastic multi-armed bandit problem when one knows the value μ^(⋆) of an optimal arm, as a well as a positive lower bound on the smallest positive gap ∆. We propose a new randomized policy that attains a regret uniformly bounded over time in this setting. We also prove several lower bounds, which show in particular that bounded regret is not possible if one only knows ∆, and bounded regret of order 1/∆is not possible if one only knows μ^(⋆).

Cite this Paper

BibTeX


@InProceedings{pmlr-v30-Bubeck13,
  title = 	 {Bounded regret in stochastic multi-armed bandits},
  author = 	 {Bubeck, Sébastien and Perchet, Vianney and Rigollet, Philippe},
  booktitle = 	 {Proceedings of the 26th Annual Conference on Learning Theory},
  pages = 	 {122--134},
  year = 	 {2013},
  editor = 	 {Shalev-Shwartz, Shai and Steinwart, Ingo},
  volume = 	 {30},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {Princeton, NJ, USA},
  month = 	 {12--14 Jun},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v30/Bubeck13.pdf},
  url = 	 {https://proceedings.mlr.press/v30/Bubeck13.html},
  abstract = 	 {We study the stochastic multi-armed bandit problem when one knows the value μ^(⋆) of an optimal arm, as a well as a positive lower bound on the smallest positive gap ∆. We propose a new randomized policy that attains a regret uniformly bounded over time in this setting. We also prove several lower bounds, which show in particular that bounded regret is not possible if one only knows ∆, and bounded regret of order 1/∆is not possible if one only knows μ^(⋆).}
}

Endnote

%0 Conference Paper
%T Bounded regret in stochastic multi-armed bandits
%A Sébastien Bubeck
%A Vianney Perchet
%A Philippe Rigollet
%B Proceedings of the 26th Annual Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2013
%E Shai Shalev-Shwartz
%E Ingo Steinwart	
%F pmlr-v30-Bubeck13
%I PMLR
%P 122--134
%U https://proceedings.mlr.press/v30/Bubeck13.html
%V 30
%X We study the stochastic multi-armed bandit problem when one knows the value μ^(⋆) of an optimal arm, as a well as a positive lower bound on the smallest positive gap ∆. We propose a new randomized policy that attains a regret uniformly bounded over time in this setting. We also prove several lower bounds, which show in particular that bounded regret is not possible if one only knows ∆, and bounded regret of order 1/∆is not possible if one only knows μ^(⋆).

RIS


TY  - CPAPER
TI  - Bounded regret in stochastic multi-armed bandits
AU  - Sébastien Bubeck
AU  - Vianney Perchet
AU  - Philippe Rigollet
BT  - Proceedings of the 26th Annual Conference on Learning Theory
DA  - 2013/06/13
ED  - Shai Shalev-Shwartz
ED  - Ingo Steinwart	
ID  - pmlr-v30-Bubeck13
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 30
SP  - 122
EP  - 134
L1  - http://proceedings.mlr.press/v30/Bubeck13.pdf
UR  - https://proceedings.mlr.press/v30/Bubeck13.html
AB  - We study the stochastic multi-armed bandit problem when one knows the value μ^(⋆) of an optimal arm, as a well as a positive lower bound on the smallest positive gap ∆. We propose a new randomized policy that attains a regret uniformly bounded over time in this setting. We also prove several lower bounds, which show in particular that bounded regret is not possible if one only knows ∆, and bounded regret of order 1/∆is not possible if one only knows μ^(⋆).
ER  -

APA


Bubeck, S., Perchet, V. & Rigollet, P.. (2013). Bounded regret in stochastic multi-armed bandits. Proceedings of the 26th Annual Conference on Learning Theory, in Proceedings of Machine Learning Research 30:122-134 Available from https://proceedings.mlr.press/v30/Bubeck13.html.

Related Material

Download PDF