Information Complexity in Bandit Subset Selection

Emilie Kaufmann; Shivaram Kalyanakrishnan

Information Complexity in Bandit Subset Selection

Emilie Kaufmann, Shivaram Kalyanakrishnan

Proceedings of the 26th Annual Conference on Learning Theory, PMLR 30:228-251, 2013.

Abstract

We consider the problem of efficiently exploring the arms of a stochastic bandit to identify the best subset. Under the PAC and the fixed-budget formulations, we derive improved bounds by using KL-divergence-based confidence intervals. Whereas the application of a similar idea in the regret setting has yielded bounds in terms of the KL-divergence between the arms, our bounds in the pure-exploration setting involve the Chernoff information between the arms. In addition to introducing this novel quantity to the bandits literature, we contribute a comparison between the “racing” and “smart sampling” strategies for pure-exploration problems, finding strong evidence in favor of the latter.

Cite this Paper

BibTeX


@InProceedings{pmlr-v30-Kaufmann13,
  title = 	 {Information Complexity in Bandit Subset Selection},
  author = 	 {Kaufmann, Emilie and Kalyanakrishnan, Shivaram},
  booktitle = 	 {Proceedings of the 26th Annual Conference on Learning Theory},
  pages = 	 {228--251},
  year = 	 {2013},
  editor = 	 {Shalev-Shwartz, Shai and Steinwart, Ingo},
  volume = 	 {30},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {Princeton, NJ, USA},
  month = 	 {12--14 Jun},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v30/Kaufmann13.pdf},
  url = 	 {https://proceedings.mlr.press/v30/Kaufmann13.html},
  abstract = 	 {We consider the problem of efficiently exploring the arms of a stochastic bandit to identify the best subset. Under the PAC and the fixed-budget formulations, we derive improved bounds by using KL-divergence-based confidence intervals. Whereas the application of a similar idea in the regret setting has yielded bounds in terms of the KL-divergence between the arms, our bounds in the pure-exploration setting involve the Chernoff information between the arms. In addition to introducing this novel quantity to the bandits literature, we contribute a comparison between the “racing” and “smart sampling” strategies for pure-exploration problems, finding strong evidence in favor of the latter.}
}

Endnote

%0 Conference Paper
%T Information Complexity in Bandit Subset Selection
%A Emilie Kaufmann
%A Shivaram Kalyanakrishnan
%B Proceedings of the 26th Annual Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2013
%E Shai Shalev-Shwartz
%E Ingo Steinwart	
%F pmlr-v30-Kaufmann13
%I PMLR
%P 228--251
%U https://proceedings.mlr.press/v30/Kaufmann13.html
%V 30
%X We consider the problem of efficiently exploring the arms of a stochastic bandit to identify the best subset. Under the PAC and the fixed-budget formulations, we derive improved bounds by using KL-divergence-based confidence intervals. Whereas the application of a similar idea in the regret setting has yielded bounds in terms of the KL-divergence between the arms, our bounds in the pure-exploration setting involve the Chernoff information between the arms. In addition to introducing this novel quantity to the bandits literature, we contribute a comparison between the “racing” and “smart sampling” strategies for pure-exploration problems, finding strong evidence in favor of the latter.

RIS


TY  - CPAPER
TI  - Information Complexity in Bandit Subset Selection
AU  - Emilie Kaufmann
AU  - Shivaram Kalyanakrishnan
BT  - Proceedings of the 26th Annual Conference on Learning Theory
DA  - 2013/06/13
ED  - Shai Shalev-Shwartz
ED  - Ingo Steinwart	
ID  - pmlr-v30-Kaufmann13
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 30
SP  - 228
EP  - 251
L1  - http://proceedings.mlr.press/v30/Kaufmann13.pdf
UR  - https://proceedings.mlr.press/v30/Kaufmann13.html
AB  - We consider the problem of efficiently exploring the arms of a stochastic bandit to identify the best subset. Under the PAC and the fixed-budget formulations, we derive improved bounds by using KL-divergence-based confidence intervals. Whereas the application of a similar idea in the regret setting has yielded bounds in terms of the KL-divergence between the arms, our bounds in the pure-exploration setting involve the Chernoff information between the arms. In addition to introducing this novel quantity to the bandits literature, we contribute a comparison between the “racing” and “smart sampling” strategies for pure-exploration problems, finding strong evidence in favor of the latter.
ER  -

APA


Kaufmann, E. & Kalyanakrishnan, S.. (2013). Information Complexity in Bandit Subset Selection. Proceedings of the 26th Annual Conference on Learning Theory, in Proceedings of Machine Learning Research 30:228-251 Available from https://proceedings.mlr.press/v30/Kaufmann13.html.

Related Material

Download PDF