Mirror Descent and the Information Ratio

Tor Lattimore, Andras Gyorgy
Proceedings of Thirty Fourth Conference on Learning Theory, PMLR 134:2965-2992, 2021.

Abstract

We establish a connection between the stability of mirror descent and the information ratio of Russo and Van Roy (2014). Our analysis shows that mirror descent with suitable loss estimators and exploratory distributions enjoys the same bound on the adversarial regret as the bounds on the Bayesian regret for information-directed sampling. Along the way, we develop the theory for information-directed sampling and provide an efficient algorithm for adversarial bandits for which the regret upper bound matches exactly the best known information-theoretic upper bound. Keywords: Bandits, partial monitoring, mirror descent, information theory.
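For readers unfamiliar with the mirror descent family discussed in the abstract, the sketch below shows its classic bandit instance: exponential weights (Exp3), which is mirror descent with the negentropy regularizer combined with importance-weighted loss estimators. This is an illustrative standard algorithm, not the paper's algorithm; the learning rate `eta` and the loss model are assumptions for the example.

```python
import numpy as np

def exp3(loss_fn, K, T, eta):
    """Exp3 for K-armed adversarial bandits: mirror descent with the
    negentropy regularizer and importance-weighted loss estimators.
    Illustrative sketch only; losses are assumed to lie in [0, 1]."""
    L_hat = np.zeros(K)           # cumulative estimated losses per arm
    rng = np.random.default_rng(0)
    total_loss = 0.0
    for t in range(T):
        # Mirror-descent step: play the exponential-weights distribution
        # over the cumulative loss estimates (shift for numerical stability).
        w = np.exp(-eta * (L_hat - L_hat.min()))
        p = w / w.sum()
        a = rng.choice(K, p=p)    # sample an arm
        loss = loss_fn(t, a)      # observe only the played arm's loss
        total_loss += loss
        # Importance-weighted estimator: unbiased for the full loss vector.
        L_hat[a] += loss / p[a]
    return total_loss
```

With `eta` of order `sqrt(log(K) / (K * T))`, this algorithm's expected regret is `O(sqrt(K * T * log(K)))`; the paper's contribution concerns when and how such mirror descent bounds match information-theoretic (information-ratio) bounds.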

Cite this Paper


BibTeX
@InProceedings{pmlr-v134-lattimore21b,
  title     = {Mirror Descent and the Information Ratio},
  author    = {Lattimore, Tor and Gyorgy, Andras},
  booktitle = {Proceedings of Thirty Fourth Conference on Learning Theory},
  pages     = {2965--2992},
  year      = {2021},
  editor    = {Belkin, Mikhail and Kpotufe, Samory},
  volume    = {134},
  series    = {Proceedings of Machine Learning Research},
  month     = {15--19 Aug},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v134/lattimore21b/lattimore21b.pdf},
  url       = {https://proceedings.mlr.press/v134/lattimore21b.html},
  abstract  = {We establish a connection between the stability of mirror descent and the information ratio by Russo and Van Roy (2014). Our analysis shows that mirror descent with suitable loss estimators and exploratory distributions enjoys the same bound on the adversarial regret as the bounds on the Bayesian regret for information-directed sampling. Along the way, we develop the theory for information-directed sampling and provide an efficient algorithm for adversarial bandits for which the regret upper bound matches exactly the best known information-theoretic upper bound. Keywords: Bandits, partial monitoring, mirror descent, information theory.}
}
Endnote
%0 Conference Paper
%T Mirror Descent and the Information Ratio
%A Tor Lattimore
%A Andras Gyorgy
%B Proceedings of Thirty Fourth Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2021
%E Mikhail Belkin
%E Samory Kpotufe
%F pmlr-v134-lattimore21b
%I PMLR
%P 2965--2992
%U https://proceedings.mlr.press/v134/lattimore21b.html
%V 134
%X We establish a connection between the stability of mirror descent and the information ratio by Russo and Van Roy (2014). Our analysis shows that mirror descent with suitable loss estimators and exploratory distributions enjoys the same bound on the adversarial regret as the bounds on the Bayesian regret for information-directed sampling. Along the way, we develop the theory for information-directed sampling and provide an efficient algorithm for adversarial bandits for which the regret upper bound matches exactly the best known information-theoretic upper bound. Keywords: Bandits, partial monitoring, mirror descent, information theory.
APA
Lattimore, T. & Gyorgy, A. (2021). Mirror Descent and the Information Ratio. Proceedings of Thirty Fourth Conference on Learning Theory, in Proceedings of Machine Learning Research 134:2965-2992. Available from https://proceedings.mlr.press/v134/lattimore21b.html.