A Finite-Time Analysis of Multi-armed Bandits Problems with Kullback-Leibler Divergences


Odalric-Ambrym Maillard, Rémi Munos, Gilles Stoltz ;
Proceedings of the 24th Annual Conference on Learning Theory, PMLR 19:497-514, 2011.


We consider a Kullback-Leibler-based algorithm for the stochastic multi-armed bandit problem in the case of distributions with finite supports(not necessarily known beforehand), whose asymptotic regret matches the lower bound of \citetBurnetas96. Our contribution is to provide a finite-time analysis of this algorithm;we get bounds whose main terms are smaller than the ones of previously known algorithms with finite-time analyses (like UCB-type algorithms).

Related Material