Batched Bandit Problems

Vianney Perchet, Philippe Rigollet, Sylvain Chassang, Erik Snowberg
Proceedings of The 28th Conference on Learning Theory, PMLR 40:1456-1456, 2015.

Abstract

Motivated by practical applications, chiefly clinical trials, we study the regret achievable for stochastic multi-armed bandits under the constraint that the employed policy must split trials into a small number of batches. Our results show that a very small number of batches already yields regret bounds close to the minimax optimal rate, and we also characterize the number of trials in each batch. As a byproduct, we derive optimal policies with low switching cost for stochastic bandits.
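The abstract describes policies that must split the T trials into a small number M of batches, choosing arms only between batches. As a rough illustration (not the paper's actual algorithm), the following minimal Python sketch implements a batched explore-then-commit policy for two arms: within each batch the learner alternates arms until it commits, and at the end of a batch it commits to the empirically better arm if the observed gap exceeds a confidence width. The geometric grid of batch end-points, the Gaussian reward model, and the confidence constant are all illustrative assumptions, not choices taken from the paper.

    import numpy as np

    def batched_etc(T, M, arm_means, rng=None):
        """Sketch of a batched explore-then-commit policy for a 2-armed
        stochastic bandit. Illustrative only; the grid, reward model, and
        elimination threshold are assumptions. Returns cumulative pseudo-regret."""
        rng = np.random.default_rng() if rng is None else rng
        # Geometric grid of batch end-points (an assumed choice of grid).
        grid = np.unique(np.minimum(T, np.ceil(T ** (np.arange(1, M + 1) / M))).astype(int))
        best = max(arm_means)
        counts = np.zeros(2)   # number of pulls of each arm
        sums = np.zeros(2)     # cumulative reward of each arm
        regret, t, committed = 0.0, 0, None
        for end in grid:
            while t < end:
                # Alternate arms while still exploring; otherwise exploit.
                arm = int(t % 2) if committed is None else committed
                reward = rng.normal(arm_means[arm], 1.0)  # Gaussian rewards, variance 1 (assumed)
                counts[arm] += 1
                sums[arm] += reward
                regret += best - arm_means[arm]
                t += 1
            if committed is None and counts.min() > 0:
                # Commit at the end of a batch if one arm looks clearly better;
                # the confidence constant below is an illustrative choice.
                means = sums / counts
                width = 2.0 * np.sqrt(np.log(T) / counts.min())
                if abs(means[0] - means[1]) > width:
                    committed = int(np.argmax(means))
        return regret

    # Example: 2 batches vs. 8 batches on an instance with gap 0.2.
    print(batched_etc(T=10000, M=2, arm_means=[0.5, 0.3]))
    print(batched_etc(T=10000, M=8, arm_means=[0.5, 0.3]))

Note that the policy only looks at data between batches, so the number of decision points is M rather than T, which is the sense in which such policies also have low switching cost.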

Cite this Paper


BibTeX
@InProceedings{pmlr-v40-Perchet15,
  title     = {Batched Bandit Problems},
  author    = {Vianney Perchet and Philippe Rigollet and Sylvain Chassang and Erik Snowberg},
  booktitle = {Proceedings of The 28th Conference on Learning Theory},
  pages     = {1456--1456},
  year      = {2015},
  editor    = {Peter Grünwald and Elad Hazan and Satyen Kale},
  volume    = {40},
  series    = {Proceedings of Machine Learning Research},
  address   = {Paris, France},
  month     = {03--06 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v40/Perchet15.pdf},
  url       = {http://proceedings.mlr.press/v40/Perchet15.html},
  abstract  = {Motivated by practical applications, chiefly clinical trials, we study the regret achievable for stochastic multi-armed bandits under the constraint that the employed policy must split trials into a small number of batches. Our results show that a very small number of batches gives already close to minimax optimal regret bounds and we also evaluate the number of trials in each batch. As a byproduct, we derive optimal policies with low switching cost for stochastic bandits.}
}
APA
Perchet, V., Rigollet, P., Chassang, S. & Snowberg, E. (2015). Batched Bandit Problems. Proceedings of The 28th Conference on Learning Theory, in PMLR 40:1456-1456.
