A minimax and asymptotically optimal algorithm for stochastic bandits

Pierre Ménard, Aurélien Garivier
Proceedings of the 28th International Conference on Algorithmic Learning Theory, PMLR 76:223-237, 2017.

Abstract

We propose the $\text{kl-UCB}^{++}$ algorithm for regret minimization in stochastic bandit models with exponential families of distributions. We prove that it is simultaneously asymptotically optimal (in the sense of Lai and Robbins' lower bound) and minimax optimal. This is the first algorithm proved to enjoy these two properties at the same time. This work thus merges two different lines of research with simple and clear proofs.

Cite this Paper


BibTeX
@InProceedings{pmlr-v76-ménard17a, title = {A minimax and asymptotically optimal algorithm for stochastic bandits}, author = {Ménard, Pierre and Garivier, Aurélien}, booktitle = {Proceedings of the 28th International Conference on Algorithmic Learning Theory}, pages = {223--237}, year = {2017}, editor = {Hanneke, Steve and Reyzin, Lev}, volume = {76}, series = {Proceedings of Machine Learning Research}, month = {15--17 Oct}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/v76/ménard17a/ménard17a.pdf}, url = {https://proceedings.mlr.press/v76/m%C3%A9nard17a.html}, abstract = {We propose the $\text{kl-UCB}^{++}$ algorithm for regret minimization in stochastic bandit models with exponential families of distributions. We prove that it is simultaneously asymptotically optimal (in the sense of Lai and Robbins' lower bound) and minimax optimal. This is the first algorithm proved to enjoy these two properties at the same time. This work thus merges two different lines of research with simple and clear proofs.} }
Endnote
%0 Conference Paper %T A minimax and asymptotically optimal algorithm for stochastic bandits %A Pierre Ménard %A Aurélien Garivier %B Proceedings of the 28th International Conference on Algorithmic Learning Theory %C Proceedings of Machine Learning Research %D 2017 %E Steve Hanneke %E Lev Reyzin %F pmlr-v76-ménard17a %I PMLR %P 223--237 %U https://proceedings.mlr.press/v76/m%C3%A9nard17a.html %V 76 %X We propose the $\text{kl-UCB}^{++}$ algorithm for regret minimization in stochastic bandit models with exponential families of distributions. We prove that it is simultaneously asymptotically optimal (in the sense of Lai and Robbins' lower bound) and minimax optimal. This is the first algorithm proved to enjoy these two properties at the same time. This work thus merges two different lines of research with simple and clear proofs.
APA
Ménard, P. & Garivier, A.. (2017). A minimax and asymptotically optimal algorithm for stochastic bandits. Proceedings of the 28th International Conference on Algorithmic Learning Theory, in Proceedings of Machine Learning Research 76:223-237 Available from https://proceedings.mlr.press/v76/m%C3%A9nard17a.html.

Related Material