[edit]
Regret Analysis of the Finite-Horizon Gittins Index Strategy for Multi-Armed Bandits
29th Annual Conference on Learning Theory, PMLR 49:1214-1245, 2016.
Abstract
I prove near-optimal frequentist regret guarantees for the finite-horizon Gittins index strategy for multi-armed bandits with Gaussian noise and prior. Along the way I derive finite-time bounds on the Gittins index that are asymptotically exact and may be of independent interest. I also discuss computational issues and present experimental results suggesting that a particular version of the Gittins index strategy is an improvement on existing algorithms with finite-time regret guarantees such as UCB and Thompson sampling.