Bounded regret in stochastic multi-armed bandits

Sébastien Bubeck, Vianney Perchet, Philippe Rigollet
Proceedings of the 26th Annual Conference on Learning Theory, PMLR 30:122-134, 2013.

Abstract

We study the stochastic multi-armed bandit problem when one knows the value μ^⋆ of an optimal arm, as well as a positive lower bound on the smallest positive gap ∆. We propose a new randomized policy that attains a regret uniformly bounded over time in this setting. We also prove several lower bounds, which show in particular that bounded regret is not possible if one only knows ∆, and bounded regret of order 1/∆ is not possible if one only knows μ^⋆.
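To make the setting concrete, here is a minimal simulation sketch of how knowing μ^⋆ can yield bounded regret: an arm is permanently eliminated once its upper confidence bound falls below the known optimal mean. This is an illustrative toy policy, not the randomized policy proposed in the paper; the function name, arm means, and elimination rule are all assumptions made for the example.

```python
import math
import random

def known_mu_star_policy(means, mu_star, horizon, seed=0):
    """Toy elimination policy for Bernoulli bandits exploiting a known
    optimal mean mu_star (illustrative only, NOT the paper's policy).

    An arm is eliminated once its empirical mean plus a confidence
    radius drops below mu_star: such an arm is, with high probability,
    suboptimal, so exploration of it can stop forever -- the mechanism
    behind time-uniform (bounded) regret.
    """
    rng = random.Random(seed)
    k = len(means)
    pulls = [0] * k
    sums = [0.0] * k
    active = set(range(k))
    for t in range(1, horizon + 1):
        # Round-robin over arms still considered possibly optimal.
        arm = sorted(active)[t % len(active)]
        reward = 1.0 if rng.random() < means[arm] else 0.0
        pulls[arm] += 1
        sums[arm] += reward
        n = pulls[arm]
        mean = sums[arm] / n
        radius = math.sqrt(2.0 * math.log(max(t, 2)) / n)
        # Upper confidence bound below mu_star => confidently suboptimal.
        if len(active) > 1 and mean + radius < mu_star:
            active.discard(arm)
    return pulls

# Example: two Bernoulli arms; the suboptimal arm is eliminated early,
# so almost all pulls go to the optimal arm.
counts = known_mu_star_policy([0.9, 0.1], mu_star=0.9, horizon=1000, seed=1)
```
A known lower bound on the gap ∆ would sharpen the test further (eliminate once the confidence interval excludes [μ^⋆ − ∆, μ^⋆]); the paper shows knowing ∆ alone, however, does not suffice for bounded regret.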

Cite this Paper


BibTeX
@InProceedings{pmlr-v30-Bubeck13,
  title     = {Bounded regret in stochastic multi-armed bandits},
  author    = {Bubeck, Sébastien and Perchet, Vianney and Rigollet, Philippe},
  booktitle = {Proceedings of the 26th Annual Conference on Learning Theory},
  pages     = {122--134},
  year      = {2013},
  editor    = {Shalev-Shwartz, Shai and Steinwart, Ingo},
  volume    = {30},
  series    = {Proceedings of Machine Learning Research},
  address   = {Princeton, NJ, USA},
  month     = {12--14 Jun},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v30/Bubeck13.pdf},
  url       = {https://proceedings.mlr.press/v30/Bubeck13.html},
  abstract  = {We study the stochastic multi-armed bandit problem when one knows the value μ^⋆ of an optimal arm, as well as a positive lower bound on the smallest positive gap ∆. We propose a new randomized policy that attains a regret uniformly bounded over time in this setting. We also prove several lower bounds, which show in particular that bounded regret is not possible if one only knows ∆, and bounded regret of order 1/∆ is not possible if one only knows μ^⋆.}
}
Endnote
%0 Conference Paper %T Bounded regret in stochastic multi-armed bandits %A Sébastien Bubeck %A Vianney Perchet %A Philippe Rigollet %B Proceedings of the 26th Annual Conference on Learning Theory %C Proceedings of Machine Learning Research %D 2013 %E Shai Shalev-Shwartz %E Ingo Steinwart %F pmlr-v30-Bubeck13 %I PMLR %P 122--134 %U https://proceedings.mlr.press/v30/Bubeck13.html %V 30 %X We study the stochastic multi-armed bandit problem when one knows the value μ^⋆ of an optimal arm, as well as a positive lower bound on the smallest positive gap ∆. We propose a new randomized policy that attains a regret uniformly bounded over time in this setting. We also prove several lower bounds, which show in particular that bounded regret is not possible if one only knows ∆, and bounded regret of order 1/∆ is not possible if one only knows μ^⋆.
RIS
TY - CPAPER TI - Bounded regret in stochastic multi-armed bandits AU - Sébastien Bubeck AU - Vianney Perchet AU - Philippe Rigollet BT - Proceedings of the 26th Annual Conference on Learning Theory DA - 2013/06/13 ED - Shai Shalev-Shwartz ED - Ingo Steinwart ID - pmlr-v30-Bubeck13 PB - PMLR DP - Proceedings of Machine Learning Research VL - 30 SP - 122 EP - 134 L1 - http://proceedings.mlr.press/v30/Bubeck13.pdf UR - https://proceedings.mlr.press/v30/Bubeck13.html AB - We study the stochastic multi-armed bandit problem when one knows the value μ^⋆ of an optimal arm, as well as a positive lower bound on the smallest positive gap ∆. We propose a new randomized policy that attains a regret uniformly bounded over time in this setting. We also prove several lower bounds, which show in particular that bounded regret is not possible if one only knows ∆, and bounded regret of order 1/∆ is not possible if one only knows μ^⋆. ER -
APA
Bubeck, S., Perchet, V. &amp; Rigollet, P. (2013). Bounded regret in stochastic multi-armed bandits. Proceedings of the 26th Annual Conference on Learning Theory, in Proceedings of Machine Learning Research 30:122-134. Available from https://proceedings.mlr.press/v30/Bubeck13.html.