Regret Analysis of the Finite-Horizon Gittins Index Strategy for Multi-Armed Bandits

Tor Lattimore
; 29th Annual Conference on Learning Theory, PMLR 49:1214-1245, 2016.

Abstract

I prove near-optimal frequentist regret guarantees for the finite-horizon Gittins index strategy for multi-armed bandits with Gaussian noise and prior. Along the way I derive finite-time bounds on the Gittins index that are asymptotically exact and may be of independent interest. I also discuss computational issues and present experimental results suggesting that a particular version of the Gittins index strategy is an improvement on existing algorithms with finite-time regret guarantees such as UCB and Thompson sampling.

Cite this Paper


BibTeX
@InProceedings{pmlr-v49-lattimore16, title = {Regret Analysis of the Finite-Horizon Gittins Index Strategy for Multi-Armed Bandits}, author = {Tor Lattimore}, booktitle = {29th Annual Conference on Learning Theory}, pages = {1214--1245}, year = {2016}, editor = {Vitaly Feldman and Alexander Rakhlin and Ohad Shamir}, volume = {49}, series = {Proceedings of Machine Learning Research}, address = {Columbia University, New York, New York, USA}, month = {23--26 Jun}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/v49/lattimore16.pdf}, url = {http://proceedings.mlr.press/v49/lattimore16.html}, abstract = {I prove near-optimal frequentist regret guarantees for the finite-horizon Gittins index strategy for multi-armed bandits with Gaussian noise and prior. Along the way I derive finite-time bounds on the Gittins index that are asymptotically exact and may be of independent interest. I also discuss computational issues and present experimental results suggesting that a particular version of the Gittins index strategy is an improvement on existing algorithms with finite-time regret guarantees such as UCB and Thompson sampling. } }
Endnote
%0 Conference Paper %T Regret Analysis of the Finite-Horizon Gittins Index Strategy for Multi-Armed Bandits %A Tor Lattimore %B 29th Annual Conference on Learning Theory %C Proceedings of Machine Learning Research %D 2016 %E Vitaly Feldman %E Alexander Rakhlin %E Ohad Shamir %F pmlr-v49-lattimore16 %I PMLR %J Proceedings of Machine Learning Research %P 1214--1245 %U http://proceedings.mlr.press %V 49 %W PMLR %X I prove near-optimal frequentist regret guarantees for the finite-horizon Gittins index strategy for multi-armed bandits with Gaussian noise and prior. Along the way I derive finite-time bounds on the Gittins index that are asymptotically exact and may be of independent interest. I also discuss computational issues and present experimental results suggesting that a particular version of the Gittins index strategy is an improvement on existing algorithms with finite-time regret guarantees such as UCB and Thompson sampling.
RIS
TY - CPAPER TI - Regret Analysis of the Finite-Horizon Gittins Index Strategy for Multi-Armed Bandits AU - Tor Lattimore BT - 29th Annual Conference on Learning Theory PY - 2016/06/06 DA - 2016/06/06 ED - Vitaly Feldman ED - Alexander Rakhlin ED - Ohad Shamir ID - pmlr-v49-lattimore16 PB - PMLR SP - 1214 DP - PMLR EP - 1245 L1 - http://proceedings.mlr.press/v49/lattimore16.pdf UR - http://proceedings.mlr.press/v49/lattimore16.html AB - I prove near-optimal frequentist regret guarantees for the finite-horizon Gittins index strategy for multi-armed bandits with Gaussian noise and prior. Along the way I derive finite-time bounds on the Gittins index that are asymptotically exact and may be of independent interest. I also discuss computational issues and present experimental results suggesting that a particular version of the Gittins index strategy is an improvement on existing algorithms with finite-time regret guarantees such as UCB and Thompson sampling. ER -
APA
Lattimore, T.. (2016). Regret Analysis of the Finite-Horizon Gittins Index Strategy for Multi-Armed Bandits. 29th Annual Conference on Learning Theory, in PMLR 49:1214-1245

Related Material