Almost Optimal Exploration in Multi-Armed Bandits

Zohar Karnin; Tomer Koren; Oren Somekh

Almost Optimal Exploration in Multi-Armed Bandits

Zohar Karnin, Tomer Koren, Oren Somekh

Proceedings of the 30th International Conference on Machine Learning, PMLR 28(3):1238-1246, 2013.

Abstract

We study the problem of exploration in stochastic Multi-Armed Bandits. Even in the simplest setting of identifying the best arm, there remains a logarithmic multiplicative gap between the known lower and upper bounds for the number of arm pulls required for the task. This extra logarithmic factor is quite meaningful in nowadays large-scale applications. We present two novel, parameter-free algorithms for identifying the best arm, in two different settings: given a target confidence and given a target budget of arm pulls, for which we prove upper bounds whose gap from the lower bound is only doubly-logarithmic in the problem parameters. We corroborate our theoretical results with experiments demonstrating that our algorithm outperforms the state-of-the-art and scales better as the size of the problem increases.

Cite this Paper

BibTeX

@InProceedings{pmlr-v28-karnin13,
  title = 	 {Almost Optimal Exploration in Multi-Armed Bandits},
  author = 	 {Karnin, Zohar and Koren, Tomer and Somekh, Oren},
  booktitle = 	 {Proceedings of the 30th International Conference on Machine Learning},
  pages = 	 {1238--1246},
  year = 	 {2013},
  editor = 	 {Dasgupta, Sanjoy and McAllester, David},
  volume = 	 {28},
  number =       {3},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {Atlanta, Georgia, USA},
  month = 	 {17--19 Jun},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v28/karnin13.pdf},
  url = 	 {https://proceedings.mlr.press/v28/karnin13.html},
  abstract = 	 {We study the problem of exploration in stochastic Multi-Armed Bandits. Even in the simplest setting of identifying the best arm, there remains a logarithmic multiplicative gap between the known lower and upper bounds for the number of arm pulls required for the task. This extra logarithmic factor is quite meaningful in nowadays large-scale applications. We present two novel, parameter-free algorithms for identifying the best arm, in two different settings: given a target confidence and given a target budget of arm pulls, for which we prove upper bounds whose gap from the lower bound is only doubly-logarithmic in the problem parameters. We corroborate our theoretical results with experiments demonstrating that our algorithm outperforms the state-of-the-art and scales better as the size of the problem increases.  }
}

Endnote

%0 Conference Paper
%T Almost Optimal Exploration in Multi-Armed Bandits
%A Zohar Karnin
%A Tomer Koren
%A Oren Somekh
%B Proceedings of the 30th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2013
%E Sanjoy Dasgupta
%E David McAllester	
%F pmlr-v28-karnin13
%I PMLR
%P 1238--1246
%U https://proceedings.mlr.press/v28/karnin13.html
%V 28
%N 3
%X We study the problem of exploration in stochastic Multi-Armed Bandits. Even in the simplest setting of identifying the best arm, there remains a logarithmic multiplicative gap between the known lower and upper bounds for the number of arm pulls required for the task. This extra logarithmic factor is quite meaningful in nowadays large-scale applications. We present two novel, parameter-free algorithms for identifying the best arm, in two different settings: given a target confidence and given a target budget of arm pulls, for which we prove upper bounds whose gap from the lower bound is only doubly-logarithmic in the problem parameters. We corroborate our theoretical results with experiments demonstrating that our algorithm outperforms the state-of-the-art and scales better as the size of the problem increases.

RIS

TY  - CPAPER
TI  - Almost Optimal Exploration in Multi-Armed Bandits
AU  - Zohar Karnin
AU  - Tomer Koren
AU  - Oren Somekh
BT  - Proceedings of the 30th International Conference on Machine Learning
DA  - 2013/05/26
ED  - Sanjoy Dasgupta
ED  - David McAllester	
ID  - pmlr-v28-karnin13
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 28
IS  - 3
SP  - 1238
EP  - 1246
L1  - http://proceedings.mlr.press/v28/karnin13.pdf
UR  - https://proceedings.mlr.press/v28/karnin13.html
AB  - We study the problem of exploration in stochastic Multi-Armed Bandits. Even in the simplest setting of identifying the best arm, there remains a logarithmic multiplicative gap between the known lower and upper bounds for the number of arm pulls required for the task. This extra logarithmic factor is quite meaningful in nowadays large-scale applications. We present two novel, parameter-free algorithms for identifying the best arm, in two different settings: given a target confidence and given a target budget of arm pulls, for which we prove upper bounds whose gap from the lower bound is only doubly-logarithmic in the problem parameters. We corroborate our theoretical results with experiments demonstrating that our algorithm outperforms the state-of-the-art and scales better as the size of the problem increases.  
ER  -

APA

Karnin, Z., Koren, T. & Somekh, O.. (2013). Almost Optimal Exploration in Multi-Armed Bandits. Proceedings of the 30th International Conference on Machine Learning, in Proceedings of Machine Learning Research 28(3):1238-1246 Available from https://proceedings.mlr.press/v28/karnin13.html.

Related Material

Download PDF