An Information-Theoretic Approach to Minimax Regret in Partial Monitoring

Tor Lattimore, Csaba Szepesvári
Proceedings of the Thirty-Second Conference on Learning Theory, PMLR 99:2111-2139, 2019.

Abstract

We prove a new minimax theorem connecting the worst-case Bayesian regret and minimax regret under finite-action partial monitoring with no assumptions on the space of signals or decisions of the adversary. We then generalise the information-theoretic tools of Russo and Van Roy (2016) for proving Bayesian regret bounds and combine them with the minimax theorem to derive minimax regret bounds for various partial monitoring settings. The highlight is a clean analysis of ‘easy’ and ‘hard’ finite partial monitoring, with new regret bounds that are independent of arbitrarily large game-dependent constants and eliminate the logarithmic dependence on the horizon for easy games that appeared in earlier work. The power of the generalised machinery is further demonstrated by proving that the minimax regret for $k$-armed adversarial bandits is at most $\sqrt{2kn}$, improving on existing results by a factor of 2. Finally, we provide a simple analysis of the cops and robbers game, also improving best known constants.

Cite this Paper


BibTeX
@InProceedings{pmlr-v99-lattimore19a, title = {An Information-Theoretic Approach to Minimax Regret in Partial Monitoring}, author = {Lattimore, Tor and Szepesv{\'a}ri, Csaba}, booktitle = {Proceedings of the Thirty-Second Conference on Learning Theory}, pages = {2111--2139}, year = {2019}, editor = {Beygelzimer, Alina and Hsu, Daniel}, volume = {99}, series = {Proceedings of Machine Learning Research}, month = {25--28 Jun}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/v99/lattimore19a/lattimore19a.pdf}, url = {https://proceedings.mlr.press/v99/lattimore19a.html}, abstract = { We prove a new minimax theorem connecting the worst-case Bayesian regret and minimax regret under finite-action partial monitoring with no assumptions on the space of signals or decisions of the adversary. We then generalise the information-theoretic tools of Russo and Van Roy (2016) for proving Bayesian regret bounds and combine them with the minimax theorem to derive minimax regret bounds for various partial monitoring settings. The highlight is a clean analysis of ‘easy’ and ‘hard’ finite partial monitoring, with new regret bounds that are independent of arbitrarily large game-dependent constants and eliminate the logarithmic dependence on the horizon for easy games that appeared in earlier work. The power of the generalised machinery is further demonstrated by proving that the minimax regret for $k$-armed adversarial bandits is at most $\sqrt{2kn}$, improving on existing results by a factor of 2. Finally, we provide a simple analysis of the cops and robbers game, also improving best known constants. } }
Endnote
%0 Conference Paper %T An Information-Theoretic Approach to Minimax Regret in Partial Monitoring %A Tor Lattimore %A Csaba Szepesvári %B Proceedings of the Thirty-Second Conference on Learning Theory %C Proceedings of Machine Learning Research %D 2019 %E Alina Beygelzimer %E Daniel Hsu %F pmlr-v99-lattimore19a %I PMLR %P 2111--2139 %U https://proceedings.mlr.press/v99/lattimore19a.html %V 99 %X We prove a new minimax theorem connecting the worst-case Bayesian regret and minimax regret under finite-action partial monitoring with no assumptions on the space of signals or decisions of the adversary. We then generalise the information-theoretic tools of Russo and Van Roy (2016) for proving Bayesian regret bounds and combine them with the minimax theorem to derive minimax regret bounds for various partial monitoring settings. The highlight is a clean analysis of ‘easy’ and ‘hard’ finite partial monitoring, with new regret bounds that are independent of arbitrarily large game-dependent constants and eliminate the logarithmic dependence on the horizon for easy games that appeared in earlier work. The power of the generalised machinery is further demonstrated by proving that the minimax regret for $k$-armed adversarial bandits is at most $\sqrt{2kn}$, improving on existing results by a factor of 2. Finally, we provide a simple analysis of the cops and robbers game, also improving best known constants.
APA
Lattimore, T. & Szepesvári, C.. (2019). An Information-Theoretic Approach to Minimax Regret in Partial Monitoring. Proceedings of the Thirty-Second Conference on Learning Theory, in Proceedings of Machine Learning Research 99:2111-2139 Available from https://proceedings.mlr.press/v99/lattimore19a.html.

Related Material