lil’ UCB : An Optimal Exploration Algorithm for Multi-Armed Bandits

Kevin Jamieson; Matthew Malloy; Robert Nowak; Sébastien Bubeck

lil’ UCB : An Optimal Exploration Algorithm for Multi-Armed Bandits

Kevin Jamieson, Matthew Malloy, Robert Nowak, Sébastien Bubeck

Proceedings of The 27th Conference on Learning Theory, PMLR 35:423-439, 2014.

Abstract

The paper proposes a novel upper confidence bound (UCB) procedure for identifying the arm with the largest mean in a multi-armed bandit game in the fixed confidence setting using a small number of total samples. The procedure cannot be improved in the sense that the number of samples required to identify the best arm is within a constant factor of a lower bound based on the law of the iterated logarithm (LIL). Inspired by the LIL, we construct our confidence bounds to explicitly account for the infinite time horizon of the algorithm. In addition, by using a novel stopping time for the algorithm we avoid a union bound over the arms that has been observed in other UCB-type algorithms. We prove that the algorithm is optimal up to constants and also show through simulations that it provides superior performance with respect to the state-of-the-art.

Cite this Paper

BibTeX

@InProceedings{pmlr-v35-jamieson14,
  title = 	 {lil' UCB : An Optimal Exploration Algorithm for Multi-Armed Bandits},
  author = 	 {Jamieson, Kevin and Malloy, Matthew and Nowak, Robert and Bubeck, Sébastien},
  booktitle = 	 {Proceedings of The 27th Conference on Learning Theory},
  pages = 	 {423--439},
  year = 	 {2014},
  editor = 	 {Balcan, Maria Florina and Feldman, Vitaly and Szepesvári, Csaba},
  volume = 	 {35},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {Barcelona, Spain},
  month = 	 {13--15 Jun},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v35/jamieson14.pdf},
  url = 	 {https://proceedings.mlr.press/v35/jamieson14.html},
  abstract = 	 {The paper proposes a novel upper confidence bound (UCB) procedure for identifying the arm with the largest mean in a multi-armed bandit game in the fixed confidence setting using a small number of total samples.  The procedure cannot be improved in the sense that the number of samples required to identify the best arm is within a constant factor of a lower bound based on the law of the iterated logarithm (LIL). Inspired by the LIL, we construct our confidence bounds to explicitly account for the infinite time horizon of the algorithm. In addition, by using a novel stopping time for the algorithm we avoid a union bound over the arms that has been observed in other UCB-type algorithms. We prove that the algorithm is optimal up to constants and also show through simulations that it provides superior performance with respect to the state-of-the-art. }
}

Endnote

%0 Conference Paper
%T lil’ UCB : An Optimal Exploration Algorithm for Multi-Armed Bandits
%A Kevin Jamieson
%A Matthew Malloy
%A Robert Nowak
%A Sébastien Bubeck
%B Proceedings of The 27th Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2014
%E Maria Florina Balcan
%E Vitaly Feldman
%E Csaba Szepesvári	
%F pmlr-v35-jamieson14
%I PMLR
%P 423--439
%U https://proceedings.mlr.press/v35/jamieson14.html
%V 35
%X The paper proposes a novel upper confidence bound (UCB) procedure for identifying the arm with the largest mean in a multi-armed bandit game in the fixed confidence setting using a small number of total samples.  The procedure cannot be improved in the sense that the number of samples required to identify the best arm is within a constant factor of a lower bound based on the law of the iterated logarithm (LIL). Inspired by the LIL, we construct our confidence bounds to explicitly account for the infinite time horizon of the algorithm. In addition, by using a novel stopping time for the algorithm we avoid a union bound over the arms that has been observed in other UCB-type algorithms. We prove that the algorithm is optimal up to constants and also show through simulations that it provides superior performance with respect to the state-of-the-art.

RIS

TY  - CPAPER
TI  - lil’ UCB : An Optimal Exploration Algorithm for Multi-Armed Bandits
AU  - Kevin Jamieson
AU  - Matthew Malloy
AU  - Robert Nowak
AU  - Sébastien Bubeck
BT  - Proceedings of The 27th Conference on Learning Theory
DA  - 2014/05/29
ED  - Maria Florina Balcan
ED  - Vitaly Feldman
ED  - Csaba Szepesvári	
ID  - pmlr-v35-jamieson14
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 35
SP  - 423
EP  - 439
L1  - http://proceedings.mlr.press/v35/jamieson14.pdf
UR  - https://proceedings.mlr.press/v35/jamieson14.html
AB  - The paper proposes a novel upper confidence bound (UCB) procedure for identifying the arm with the largest mean in a multi-armed bandit game in the fixed confidence setting using a small number of total samples.  The procedure cannot be improved in the sense that the number of samples required to identify the best arm is within a constant factor of a lower bound based on the law of the iterated logarithm (LIL). Inspired by the LIL, we construct our confidence bounds to explicitly account for the infinite time horizon of the algorithm. In addition, by using a novel stopping time for the algorithm we avoid a union bound over the arms that has been observed in other UCB-type algorithms. We prove that the algorithm is optimal up to constants and also show through simulations that it provides superior performance with respect to the state-of-the-art. 
ER  -

APA

Jamieson, K., Malloy, M., Nowak, R. & Bubeck, S.. (2014). lil’ UCB : An Optimal Exploration Algorithm for Multi-Armed Bandits. Proceedings of The 27th Conference on Learning Theory, in Proceedings of Machine Learning Research 35:423-439 Available from https://proceedings.mlr.press/v35/jamieson14.html.

Related Material

Download PDF