Batched Bandit Problems

Vianney Perchet, Philippe Rigollet, Sylvain Chassang, Erik Snowberg
Proceedings of The 28th Conference on Learning Theory, PMLR 40:1456-1456, 2015.

Abstract

Motivated by practical applications, chiefly clinical trials, we study the regret achievable for stochastic multi-armed bandits under the constraint that the employed policy must split trials into a small number of batches. Our results show that a very small number of batches gives already close to minimax optimal regret bounds and we also evaluate the number of trials in each batch. As a byproduct, we derive optimal policies with low switching cost for stochastic bandits.

Cite this Paper


BibTeX
@InProceedings{pmlr-v40-Perchet15, title = {Batched Bandit Problems}, author = {Perchet, Vianney and Rigollet, Philippe and Chassang, Sylvain and Snowberg, Erik}, booktitle = {Proceedings of The 28th Conference on Learning Theory}, pages = {1456--1456}, year = {2015}, editor = {Grünwald, Peter and Hazan, Elad and Kale, Satyen}, volume = {40}, series = {Proceedings of Machine Learning Research}, address = {Paris, France}, month = {03--06 Jul}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/v40/Perchet15.pdf}, url = {https://proceedings.mlr.press/v40/Perchet15.html}, abstract = {Motivated by practical applications, chiefly clinical trials, we study the regret achievable for stochastic multi-armed bandits under the constraint that the employed policy must split trials into a small number of batches. Our results show that a very small number of batches gives already close to minimax optimal regret bounds and we also evaluate the number of trials in each batch. As a byproduct, we derive optimal policies with low switching cost for stochastic bandits.} }
Endnote
%0 Conference Paper %T Batched Bandit Problems %A Vianney Perchet %A Philippe Rigollet %A Sylvain Chassang %A Erik Snowberg %B Proceedings of The 28th Conference on Learning Theory %C Proceedings of Machine Learning Research %D 2015 %E Peter Grünwald %E Elad Hazan %E Satyen Kale %F pmlr-v40-Perchet15 %I PMLR %P 1456--1456 %U https://proceedings.mlr.press/v40/Perchet15.html %V 40 %X Motivated by practical applications, chiefly clinical trials, we study the regret achievable for stochastic multi-armed bandits under the constraint that the employed policy must split trials into a small number of batches. Our results show that a very small number of batches gives already close to minimax optimal regret bounds and we also evaluate the number of trials in each batch. As a byproduct, we derive optimal policies with low switching cost for stochastic bandits.
RIS
TY - CPAPER TI - Batched Bandit Problems AU - Vianney Perchet AU - Philippe Rigollet AU - Sylvain Chassang AU - Erik Snowberg BT - Proceedings of The 28th Conference on Learning Theory DA - 2015/06/26 ED - Peter Grünwald ED - Elad Hazan ED - Satyen Kale ID - pmlr-v40-Perchet15 PB - PMLR DP - Proceedings of Machine Learning Research VL - 40 SP - 1456 EP - 1456 L1 - http://proceedings.mlr.press/v40/Perchet15.pdf UR - https://proceedings.mlr.press/v40/Perchet15.html AB - Motivated by practical applications, chiefly clinical trials, we study the regret achievable for stochastic multi-armed bandits under the constraint that the employed policy must split trials into a small number of batches. Our results show that a very small number of batches gives already close to minimax optimal regret bounds and we also evaluate the number of trials in each batch. As a byproduct, we derive optimal policies with low switching cost for stochastic bandits. ER -
APA
Perchet, V., Rigollet, P., Chassang, S. & Snowberg, E.. (2015). Batched Bandit Problems. Proceedings of The 28th Conference on Learning Theory, in Proceedings of Machine Learning Research 40:1456-1456 Available from https://proceedings.mlr.press/v40/Perchet15.html.

Related Material