Batched Bandit Problems
Proceedings of The 28th Conference on Learning Theory, PMLR 40:1456-1456, 2015.
Abstract
Motivated by practical applications, chiefly clinical trials, we study the regret achievable for stochastic multi-armed bandits under the constraint that the employed policy must split trials into a small number of batches. Our results show that a very small number of batches already yields close to minimax-optimal regret bounds, and we also evaluate the number of trials in each batch. As a byproduct, we derive optimal policies with low switching cost for stochastic bandits.