An Improved Parametrization and Analysis of the EXP3++ Algorithm for Stochastic and Adversarial Bandits

Yevgeny Seldin; Gábor Lugosi

Proceedings of the 2017 Conference on Learning Theory, PMLR 65:1743-1759, 2017.

Abstract

We present a new strategy for gap estimation in randomized algorithms for multiarmed bandits and combine it with the EXP3++ algorithm of Seldin and Slivkins (2014). In the stochastic regime the strategy reduces the dependence of regret on the time horizon from $(\ln t)^3$ to $(\ln t)^2$ and eliminates an additive factor of order $\Delta e^{1/\Delta^2}$, where $\Delta$ is the minimal gap of a problem instance. In the adversarial regime the regret guarantee remains unchanged.
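
To give a feel for the two quantities named in the abstract, the following minimal sketch evaluates them numerically. It assumes the additive factor is read as Δ · exp(1/Δ²) and ignores all constants; the function names `additive_term` and `log_power_ratio` are hypothetical and do not come from the paper. It is an arithmetic illustration only, not an implementation of EXP3++ or of the gap estimation strategy.

```python
import math


def additive_term(gap: float) -> float:
    """Evaluate gap * exp(1 / gap^2), the additive factor as read from the abstract.

    Assumption: constants are dropped; only the order of magnitude is of interest.
    """
    if gap <= 0:
        raise ValueError("gap must be positive")
    return gap * math.exp(1.0 / gap ** 2)


def log_power_ratio(t: float) -> float:
    """Ratio (ln t)^3 / (ln t)^2, which simplifies to ln t (requires t > 1)."""
    if t <= 1:
        raise ValueError("t must be greater than 1")
    return math.log(t)


if __name__ == "__main__":
    # The additive factor grows extremely fast as the minimal gap shrinks.
    for gap in (1.0, 0.5, 0.2, 0.1):
        print(f"gap={gap}: additive term ~ {additive_term(gap):.3e}")
    # Dropping one power of ln t saves a factor of ln t in the horizon dependence.
    for t in (10**3, 10**6, 10**9):
        print(f"t={t}: (ln t)^3 / (ln t)^2 = {log_power_ratio(t):.2f}")
```

Under this reading, the additive factor already exceeds any practical time horizon for gaps around 0.1, which suggests why removing it matters for small-gap problem instances.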

Cite this Paper

BibTeX


@InProceedings{pmlr-v65-seldin17a,
  title = 	 {An Improved Parametrization and Analysis of the {EXP3++} Algorithm for Stochastic and Adversarial Bandits},
  author = 	 {Seldin, Yevgeny and Lugosi, Gábor},
  booktitle = 	 {Proceedings of the 2017 Conference on Learning Theory},
  pages = 	 {1743--1759},
  year = 	 {2017},
  editor = 	 {Kale, Satyen and Shamir, Ohad},
  volume = 	 {65},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {07--10 Jul},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v65/seldin17a/seldin17a.pdf},
  url = 	 {https://proceedings.mlr.press/v65/seldin17a.html},
  abstract = 	 {We present a new strategy for gap estimation in randomized algorithms for multiarmed bandits and combine it with the EXP3++ algorithm of Seldin and Slivkins (2014). In the stochastic regime the strategy reduces dependence of regret on a time horizon from $(\ln t)^3$ to $(\ln t)^2$ and eliminates an additive factor of order $\Delta e^{1/\Delta^2}$, where $\Delta$ is the minimal gap of a problem instance. In the adversarial regime regret guarantee remains unchanged.}
}

Endnote

%0 Conference Paper
%T An Improved Parametrization and Analysis of the EXP3++ Algorithm for Stochastic and Adversarial Bandits
%A Yevgeny Seldin
%A Gábor Lugosi
%B Proceedings of the 2017 Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2017
%E Satyen Kale
%E Ohad Shamir
%F pmlr-v65-seldin17a
%I PMLR
%P 1743--1759
%U https://proceedings.mlr.press/v65/seldin17a.html
%V 65
%X We present a new strategy for gap estimation in randomized algorithms for multiarmed bandits and combine it with the EXP3++ algorithm of Seldin and Slivkins (2014). In the stochastic regime the strategy reduces dependence of regret on a time horizon from $(\ln t)^3$ to $(\ln t)^2$ and eliminates an additive factor of order $\Delta e^{1/\Delta^2}$, where $\Delta$ is the minimal gap of a problem instance. In the adversarial regime regret guarantee remains unchanged.

APA


Seldin, Y. & Lugosi, G. (2017). An Improved Parametrization and Analysis of the EXP3++ Algorithm for Stochastic and Adversarial Bandits. Proceedings of the 2017 Conference on Learning Theory, in Proceedings of Machine Learning Research 65:1743-1759. Available from https://proceedings.mlr.press/v65/seldin17a.html.
