Regret Analysis of the Finite-Horizon Gittins Index Strategy for Multi-Armed Bandits

Tor Lattimore

Regret Analysis of the Finite-Horizon Gittins Index Strategy for Multi-Armed Bandits

Tor Lattimore

29th Annual Conference on Learning Theory, PMLR 49:1214-1245, 2016.

Abstract

I prove near-optimal frequentist regret guarantees for the finite-horizon Gittins index strategy for multi-armed bandits with Gaussian noise and prior. Along the way I derive finite-time bounds on the Gittins index that are asymptotically exact and may be of independent interest. I also discuss computational issues and present experimental results suggesting that a particular version of the Gittins index strategy is an improvement on existing algorithms with finite-time regret guarantees such as UCB and Thompson sampling.

Cite this Paper

BibTeX


@InProceedings{pmlr-v49-lattimore16,
  title = 	 {Regret Analysis of the Finite-Horizon Gittins Index Strategy for Multi-Armed Bandits},
  author = 	 {Lattimore, Tor},
  booktitle = 	 {29th Annual Conference on Learning Theory},
  pages = 	 {1214--1245},
  year = 	 {2016},
  editor = 	 {Feldman, Vitaly and Rakhlin, Alexander and Shamir, Ohad},
  volume = 	 {49},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {Columbia University, New York, New York, USA},
  month = 	 {23--26 Jun},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v49/lattimore16.pdf},
  url = 	 {https://proceedings.mlr.press/v49/lattimore16.html},
  abstract = 	 {I prove near-optimal frequentist regret guarantees for the finite-horizon Gittins index strategy for multi-armed bandits with Gaussian noise and prior. Along the way I derive finite-time bounds on the Gittins index that are asymptotically exact and may be of independent interest. I also discuss computational issues and present experimental results suggesting that a particular version of the Gittins index strategy is an improvement on existing algorithms with finite-time regret guarantees such as UCB and Thompson sampling. }
}

Endnote

%0 Conference Paper
%T Regret Analysis of the Finite-Horizon Gittins Index Strategy for Multi-Armed Bandits
%A Tor Lattimore
%B 29th Annual Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2016
%E Vitaly Feldman
%E Alexander Rakhlin
%E Ohad Shamir	
%F pmlr-v49-lattimore16
%I PMLR
%P 1214--1245
%U https://proceedings.mlr.press/v49/lattimore16.html
%V 49
%X I prove near-optimal frequentist regret guarantees for the finite-horizon Gittins index strategy for multi-armed bandits with Gaussian noise and prior. Along the way I derive finite-time bounds on the Gittins index that are asymptotically exact and may be of independent interest. I also discuss computational issues and present experimental results suggesting that a particular version of the Gittins index strategy is an improvement on existing algorithms with finite-time regret guarantees such as UCB and Thompson sampling.

RIS


TY  - CPAPER
TI  - Regret Analysis of the Finite-Horizon Gittins Index Strategy for Multi-Armed Bandits
AU  - Tor Lattimore
BT  - 29th Annual Conference on Learning Theory
DA  - 2016/06/06
ED  - Vitaly Feldman
ED  - Alexander Rakhlin
ED  - Ohad Shamir	
ID  - pmlr-v49-lattimore16
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 49
SP  - 1214
EP  - 1245
L1  - http://proceedings.mlr.press/v49/lattimore16.pdf
UR  - https://proceedings.mlr.press/v49/lattimore16.html
AB  - I prove near-optimal frequentist regret guarantees for the finite-horizon Gittins index strategy for multi-armed bandits with Gaussian noise and prior. Along the way I derive finite-time bounds on the Gittins index that are asymptotically exact and may be of independent interest. I also discuss computational issues and present experimental results suggesting that a particular version of the Gittins index strategy is an improvement on existing algorithms with finite-time regret guarantees such as UCB and Thompson sampling. 
ER  -

APA


Lattimore, T.. (2016). Regret Analysis of the Finite-Horizon Gittins Index Strategy for Multi-Armed Bandits. 29th Annual Conference on Learning Theory, in Proceedings of Machine Learning Research 49:1214-1245 Available from https://proceedings.mlr.press/v49/lattimore16.html.

Related Material

Download PDF