A simple multi-armed bandit algorithm with optimal variation-bounded regret

Elad Hazan; Satyen Kale

A simple multi-armed bandit algorithm with optimal variation-bounded regret

Elad Hazan, Satyen Kale

Proceedings of the 24th Annual Conference on Learning Theory, PMLR 19:817-820, 2011.

Abstract

We pose the question of whether it is possible to design a simple, linear-time algorithm for the basic multi-armed bandit problem in the adversarial setting which has a regret bound of $O(\sqrt{Q \log T})$, where $Q$ is the total quadratic variation of all the arms.

Cite this Paper

BibTeX


@InProceedings{pmlr-v19-hazan11b,
  title = 	 {A simple multi-armed bandit algorithm with optimal variation-bounded regret},
  author = 	 {Hazan, Elad and Kale, Satyen},
  booktitle = 	 {Proceedings of the 24th Annual Conference on Learning Theory},
  pages = 	 {817--820},
  year = 	 {2011},
  editor = 	 {Kakade, Sham M. and von Luxburg, Ulrike},
  volume = 	 {19},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {Budapest, Hungary},
  month = 	 {09--11 Jun},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v19/hazan11b/hazan11b.pdf},
  url = 	 {https://proceedings.mlr.press/v19/hazan11b.html},
  abstract = 	 {We pose the question of whether it is possible to design a simple, linear-time algorithm for the basic multi-armed bandit problem in the adversarial setting which has a regret bound of $O(\sqrt{Q \log T})$, where $Q$ is the total quadratic variation of all the arms.}
}

Endnote

%0 Conference Paper
%T A simple multi-armed bandit algorithm with optimal variation-bounded regret
%A Elad Hazan
%A Satyen Kale
%B Proceedings of the 24th Annual Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2011
%E Sham M. Kakade
%E Ulrike von Luxburg	
%F pmlr-v19-hazan11b
%I PMLR
%P 817--820
%U https://proceedings.mlr.press/v19/hazan11b.html
%V 19
%X We pose the question of whether it is possible to design a simple, linear-time algorithm for the basic multi-armed bandit problem in the adversarial setting which has a regret bound of $O(\sqrt{Q \log T})$, where $Q$ is the total quadratic variation of all the arms.

RIS


TY  - CPAPER
TI  - A simple multi-armed bandit algorithm with optimal variation-bounded regret
AU  - Elad Hazan
AU  - Satyen Kale
BT  - Proceedings of the 24th Annual Conference on Learning Theory
DA  - 2011/12/21
ED  - Sham M. Kakade
ED  - Ulrike von Luxburg	
ID  - pmlr-v19-hazan11b
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 19
SP  - 817
EP  - 820
L1  - http://proceedings.mlr.press/v19/hazan11b/hazan11b.pdf
UR  - https://proceedings.mlr.press/v19/hazan11b.html
AB  - We pose the question of whether it is possible to design a simple, linear-time algorithm for the basic multi-armed bandit problem in the adversarial setting which has a regret bound of $O(\sqrt{Q \log T})$, where $Q$ is the total quadratic variation of all the arms.
ER  -

APA


Hazan, E. & Kale, S.. (2011). A simple multi-armed bandit algorithm with optimal variation-bounded regret. Proceedings of the 24th Annual Conference on Learning Theory, in Proceedings of Machine Learning Research 19:817-820 Available from https://proceedings.mlr.press/v19/hazan11b.html.

Related Material

Download PDF