Batched Bandit Problems

Vianney Perchet; Philippe Rigollet; Sylvain Chassang; Erik Snowberg

Batched Bandit Problems

Vianney Perchet, Philippe Rigollet, Sylvain Chassang, Erik Snowberg

Proceedings of The 28th Conference on Learning Theory, PMLR 40:1456-1456, 2015.

Abstract

Motivated by practical applications, chiefly clinical trials, we study the regret achievable for stochastic multi-armed bandits under the constraint that the employed policy must split trials into a small number of batches. Our results show that a very small number of batches gives already close to minimax optimal regret bounds and we also evaluate the number of trials in each batch. As a byproduct, we derive optimal policies with low switching cost for stochastic bandits.

Cite this Paper

BibTeX


@InProceedings{pmlr-v40-Perchet15,
  title = 	 {Batched Bandit Problems},
  author = 	 {Perchet, Vianney and Rigollet, Philippe and Chassang, Sylvain and Snowberg, Erik},
  booktitle = 	 {Proceedings of The 28th Conference on Learning Theory},
  pages = 	 {1456--1456},
  year = 	 {2015},
  editor = 	 {Grünwald, Peter and Hazan, Elad and Kale, Satyen},
  volume = 	 {40},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {Paris, France},
  month = 	 {03--06 Jul},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v40/Perchet15.pdf},
  url = 	 {https://proceedings.mlr.press/v40/Perchet15.html},
  abstract = 	 {Motivated by practical applications, chiefly clinical trials, we study the regret achievable for stochastic multi-armed bandits under the constraint that the employed policy must split trials into a small number of batches. Our results show that a very small number of batches gives already close to minimax optimal regret bounds and we also evaluate the number of trials in each batch. As a byproduct, we derive optimal policies with low switching cost for stochastic bandits.}
}

Endnote

%0 Conference Paper
%T Batched Bandit Problems
%A Vianney Perchet
%A Philippe Rigollet
%A Sylvain Chassang
%A Erik Snowberg
%B Proceedings of The 28th Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2015
%E Peter Grünwald
%E Elad Hazan
%E Satyen Kale	
%F pmlr-v40-Perchet15
%I PMLR
%P 1456--1456
%U https://proceedings.mlr.press/v40/Perchet15.html
%V 40
%X Motivated by practical applications, chiefly clinical trials, we study the regret achievable for stochastic multi-armed bandits under the constraint that the employed policy must split trials into a small number of batches. Our results show that a very small number of batches gives already close to minimax optimal regret bounds and we also evaluate the number of trials in each batch. As a byproduct, we derive optimal policies with low switching cost for stochastic bandits.

RIS


TY  - CPAPER
TI  - Batched Bandit Problems
AU  - Vianney Perchet
AU  - Philippe Rigollet
AU  - Sylvain Chassang
AU  - Erik Snowberg
BT  - Proceedings of The 28th Conference on Learning Theory
DA  - 2015/06/26
ED  - Peter Grünwald
ED  - Elad Hazan
ED  - Satyen Kale	
ID  - pmlr-v40-Perchet15
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 40
SP  - 1456
EP  - 1456
L1  - http://proceedings.mlr.press/v40/Perchet15.pdf
UR  - https://proceedings.mlr.press/v40/Perchet15.html
AB  - Motivated by practical applications, chiefly clinical trials, we study the regret achievable for stochastic multi-armed bandits under the constraint that the employed policy must split trials into a small number of batches. Our results show that a very small number of batches gives already close to minimax optimal regret bounds and we also evaluate the number of trials in each batch. As a byproduct, we derive optimal policies with low switching cost for stochastic bandits.
ER  -

APA


Perchet, V., Rigollet, P., Chassang, S. & Snowberg, E.. (2015). Batched Bandit Problems. Proceedings of The 28th Conference on Learning Theory, in Proceedings of Machine Learning Research 40:1456-1456 Available from https://proceedings.mlr.press/v40/Perchet15.html.

Related Material

Download PDF