Information Complexity in Bandit Subset Selection

Emilie Kaufmann, Shivaram Kalyanakrishnan
Proceedings of the 26th Annual Conference on Learning Theory, PMLR 30:228-251, 2013.

Abstract

We consider the problem of efficiently exploring the arms of a stochastic bandit to identify the best subset. Under the PAC and the fixed-budget formulations, we derive improved bounds by using KL-divergence-based confidence intervals. Whereas the application of a similar idea in the regret setting has yielded bounds in terms of the KL-divergence between the arms, our bounds in the pure-exploration setting involve the Chernoff information between the arms. In addition to introducing this novel quantity to the bandits literature, we contribute a comparison between the “racing” and “smart sampling” strategies for pure-exploration problems, finding strong evidence in favor of the latter.
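The Chernoff information mentioned above has a standard closed-form definition for two distributions P and Q: C(P, Q) = -min over lambda in (0, 1) of log sum_x P(x)^lambda Q(x)^(1-lambda). As an illustrative sketch only (not the paper's algorithm), the snippet below computes this quantity for two Bernoulli arms by a simple grid search over lambda, alongside the Bernoulli KL divergence for comparison; the function names and the grid-search approach are choices made here, not taken from the paper.

```python
import math

def bernoulli_kl(p, q):
    """KL divergence KL(Ber(p) || Ber(q)) for means p, q in (0, 1)."""
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def chernoff_information(p, q, grid=10_000):
    """Chernoff information between Ber(p) and Ber(q):
        C(p, q) = -min_{0 < lam < 1} log( p^lam * q^(1-lam)
                                          + (1-p)^lam * (1-q)^(1-lam) ),
    approximated here by a grid search over lam (hypothetical helper,
    for illustration only)."""
    best = float("inf")
    for i in range(1, grid):
        lam = i / grid
        mixture = p**lam * q**(1 - lam) + (1 - p)**lam * (1 - q)**(1 - lam)
        best = min(best, math.log(mixture))
    return -best
```

Note that the Chernoff information is symmetric in its two arguments and never exceeds either directed KL divergence, which is consistent with its role as a tighter "two-sided" discrimination measure between arms.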

Cite this Paper


BibTeX
@InProceedings{pmlr-v30-Kaufmann13,
  title     = {Information Complexity in Bandit Subset Selection},
  author    = {Kaufmann, Emilie and Kalyanakrishnan, Shivaram},
  booktitle = {Proceedings of the 26th Annual Conference on Learning Theory},
  pages     = {228--251},
  year      = {2013},
  editor    = {Shalev-Shwartz, Shai and Steinwart, Ingo},
  volume    = {30},
  series    = {Proceedings of Machine Learning Research},
  address   = {Princeton, NJ, USA},
  month     = {12--14 Jun},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v30/Kaufmann13.pdf},
  url       = {https://proceedings.mlr.press/v30/Kaufmann13.html},
  abstract  = {We consider the problem of efficiently exploring the arms of a stochastic bandit to identify the best subset. Under the PAC and the fixed-budget formulations, we derive improved bounds by using KL-divergence-based confidence intervals. Whereas the application of a similar idea in the regret setting has yielded bounds in terms of the KL-divergence between the arms, our bounds in the pure-exploration setting involve the Chernoff information between the arms. In addition to introducing this novel quantity to the bandits literature, we contribute a comparison between the “racing” and “smart sampling” strategies for pure-exploration problems, finding strong evidence in favor of the latter.}
}
Endnote
%0 Conference Paper
%T Information Complexity in Bandit Subset Selection
%A Emilie Kaufmann
%A Shivaram Kalyanakrishnan
%B Proceedings of the 26th Annual Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2013
%E Shai Shalev-Shwartz
%E Ingo Steinwart
%F pmlr-v30-Kaufmann13
%I PMLR
%P 228--251
%U https://proceedings.mlr.press/v30/Kaufmann13.html
%V 30
%X We consider the problem of efficiently exploring the arms of a stochastic bandit to identify the best subset. Under the PAC and the fixed-budget formulations, we derive improved bounds by using KL-divergence-based confidence intervals. Whereas the application of a similar idea in the regret setting has yielded bounds in terms of the KL-divergence between the arms, our bounds in the pure-exploration setting involve the Chernoff information between the arms. In addition to introducing this novel quantity to the bandits literature, we contribute a comparison between the “racing” and “smart sampling” strategies for pure-exploration problems, finding strong evidence in favor of the latter.
RIS
TY - CPAPER
TI - Information Complexity in Bandit Subset Selection
AU - Emilie Kaufmann
AU - Shivaram Kalyanakrishnan
BT - Proceedings of the 26th Annual Conference on Learning Theory
DA - 2013/06/13
ED - Shai Shalev-Shwartz
ED - Ingo Steinwart
ID - pmlr-v30-Kaufmann13
PB - PMLR
DP - Proceedings of Machine Learning Research
VL - 30
SP - 228
EP - 251
L1 - http://proceedings.mlr.press/v30/Kaufmann13.pdf
UR - https://proceedings.mlr.press/v30/Kaufmann13.html
AB - We consider the problem of efficiently exploring the arms of a stochastic bandit to identify the best subset. Under the PAC and the fixed-budget formulations, we derive improved bounds by using KL-divergence-based confidence intervals. Whereas the application of a similar idea in the regret setting has yielded bounds in terms of the KL-divergence between the arms, our bounds in the pure-exploration setting involve the Chernoff information between the arms. In addition to introducing this novel quantity to the bandits literature, we contribute a comparison between the “racing” and “smart sampling” strategies for pure-exploration problems, finding strong evidence in favor of the latter.
ER -
APA
Kaufmann, E. & Kalyanakrishnan, S. (2013). Information Complexity in Bandit Subset Selection. Proceedings of the 26th Annual Conference on Learning Theory, in Proceedings of Machine Learning Research 30:228-251. Available from https://proceedings.mlr.press/v30/Kaufmann13.html.