On First-Order Bounds, Variance and Gap-Dependent Bounds for Adversarial Bandits
Proceedings of The 35th Uncertainty in Artificial Intelligence Conference, PMLR 115:894-904, 2020.
Abstract
We make three contributions to the theory of k-armed adversarial bandits. First, we prove a first-order bound for a modified variant of the INF strategy by Audibert and Bubeck [2009], without sacrificing worst-case optimality or modifying the loss estimators. Second, we provide a variance analysis for algorithms based on follow-the-regularised-leader, showing that without adaptation the variance of the regret is typically Ω(n^2), where n is the horizon. Finally, we study bounds that depend on the degree of separation of the arms, generalising the results by Cowan and Katehakis [2015] from the stochastic setting to the adversarial setting and improving the result of Seldin and Slivkins [2014] by a factor of log(n)/log(log(n)).
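To make the setting concrete, below is a minimal sketch of a follow-the-regularised-leader bandit learner with the 1/2-Tsallis-entropy regulariser, one standard way to realise an INF-style strategy with the usual importance-weighted loss estimators. It is not the paper's exact algorithm: the 1/sqrt(t) learning-rate schedule, the bisection solver, the loss range [0, 1] and all function names are illustrative assumptions.

```python
# Sketch of FTRL with the 1/2-Tsallis regulariser for k-armed adversarial bandits.
# Assumptions (not from the paper): losses in [0, 1], learning rate 1/sqrt(t),
# normalising constant found by bisection.
import numpy as np


def tsallis_ftrl_distribution(L_hat, eta, iters=60):
    """Return p on the simplex with p_i = (eta * L_hat_i + lam)**-2, lam found by bisection."""
    k = len(L_hat)
    shift = -eta * L_hat.min()
    # At lo the probabilities sum to at least k >= 1; at hi they sum to at most 1.
    lo, hi = shift + 1.0 / np.sqrt(k), shift + np.sqrt(k)
    for _ in range(iters):
        lam = 0.5 * (lo + hi)
        if np.sum((eta * L_hat + lam) ** -2) > 1.0:
            lo = lam
        else:
            hi = lam
    p = (eta * L_hat + lam) ** -2
    return p / p.sum()


def run_bandit(loss_matrix, seed=0):
    """Play the FTRL strategy against a fixed n x k loss matrix with entries in [0, 1]."""
    rng = np.random.default_rng(seed)
    n, k = loss_matrix.shape
    L_hat = np.zeros(k)                 # cumulative importance-weighted loss estimates
    total_loss = 0.0
    for t in range(1, n + 1):
        eta = 1.0 / np.sqrt(t)          # illustrative anytime learning rate
        p = tsallis_ftrl_distribution(L_hat, eta)
        arm = rng.choice(k, p=p)
        loss = loss_matrix[t - 1, arm]  # only the played arm's loss is observed
        L_hat[arm] += loss / p[arm]     # unbiased importance-weighted estimator
        total_loss += loss
    return total_loss - loss_matrix.sum(axis=0).min()


# Example: 5 arms, horizon 10,000, with arm 0 slightly better on average.
rng = np.random.default_rng(1)
losses = rng.random((10_000, 5))
losses[:, 0] *= 0.8
print("regret vs. best arm in hindsight:", run_bandit(losses))
```

The closed-form shape p_i = (eta * L_hat_i + lam)^{-2} comes from the first-order optimality condition of the FTRL objective with the 1/2-Tsallis regulariser; the paper's variance analysis concerns exactly such unadapted FTRL-style strategies.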