On First-Order Bounds, Variance and Gap-Dependent Bounds for Adversarial Bandits

Roman Pogodin, Tor Lattimore
Proceedings of The 35th Uncertainty in Artificial Intelligence Conference, PMLR 115:894-904, 2020.

Abstract

We make three contributions to the theory of k-armed adversarial bandits. First, we prove a first-order bound for a modified variant of the INF strategy by Audibert and Bubeck [2009], without sacrificing worst case optimality or modifying the loss estimators. Second, we provide a variance analysis for algorithms based on follow the regularised leader, showing that without adaptation the variance of the regret is typically {\Omega}(n^2) where n is the horizon. Finally, we study bounds that depend on the degree of separation of the arms, generalising the results by Cowan and Katehakis [2015] from the stochastic setting to the adversarial and improving the result of Seldin and Slivkins [2014] by a factor of log(n)/log(log(n)).

Cite this Paper


BibTeX
@InProceedings{pmlr-v115-pogodin20a, title = {On First-Order Bounds, Variance and Gap-Dependent Bounds for Adversarial Bandits}, author = {Pogodin, Roman and Lattimore, Tor}, booktitle = {Proceedings of The 35th Uncertainty in Artificial Intelligence Conference}, pages = {894--904}, year = {2020}, editor = {Adams, Ryan P. and Gogate, Vibhav}, volume = {115}, series = {Proceedings of Machine Learning Research}, month = {22--25 Jul}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/v115/pogodin20a/pogodin20a.pdf}, url = {https://proceedings.mlr.press/v115/pogodin20a.html}, abstract = {We make three contributions to the theory of k-armed adversarial bandits. First, we prove a first-order bound for a modified variant of the INF strategy by Audibert and Bubeck [2009], without sacrificing worst case optimality or modifying the loss estimators. Second, we provide a variance analysis for algorithms based on follow the regularised leader, showing that without adaptation the variance of the regret is typically {\Omega}(n^2) where n is the horizon. Finally, we study bounds that depend on the degree of separation of the arms, generalising the results by Cowan and Katehakis [2015] from the stochastic setting to the adversarial and improving the result of Seldin and Slivkins [2014] by a factor of log(n)/log(log(n)).} }
Endnote
%0 Conference Paper %T On First-Order Bounds, Variance and Gap-Dependent Bounds for Adversarial Bandits %A Roman Pogodin %A Tor Lattimore %B Proceedings of The 35th Uncertainty in Artificial Intelligence Conference %C Proceedings of Machine Learning Research %D 2020 %E Ryan P. Adams %E Vibhav Gogate %F pmlr-v115-pogodin20a %I PMLR %P 894--904 %U https://proceedings.mlr.press/v115/pogodin20a.html %V 115 %X We make three contributions to the theory of k-armed adversarial bandits. First, we prove a first-order bound for a modified variant of the INF strategy by Audibert and Bubeck [2009], without sacrificing worst case optimality or modifying the loss estimators. Second, we provide a variance analysis for algorithms based on follow the regularised leader, showing that without adaptation the variance of the regret is typically {\Omega}(n^2) where n is the horizon. Finally, we study bounds that depend on the degree of separation of the arms, generalising the results by Cowan and Katehakis [2015] from the stochastic setting to the adversarial and improving the result of Seldin and Slivkins [2014] by a factor of log(n)/log(log(n)).
APA
Pogodin, R. & Lattimore, T.. (2020). On First-Order Bounds, Variance and Gap-Dependent Bounds for Adversarial Bandits. Proceedings of The 35th Uncertainty in Artificial Intelligence Conference, in Proceedings of Machine Learning Research 115:894-904 Available from https://proceedings.mlr.press/v115/pogodin20a.html.

Related Material