Information Directed Sampling for Linear Partial Monitoring

Johannes Kirschner, Tor Lattimore, Andreas Krause
Proceedings of Thirty Third Conference on Learning Theory, PMLR 125:2328-2369, 2020.

Abstract

Partial monitoring is a rich framework for sequential decision making under uncertainty that generalizes many well known bandit models, including linear, combinatorial and dueling bandits. We introduce {\em information directed sampling} (IDS) for stochastic partial monitoring with a linear reward and observation structure. IDS achieves adaptive worst-case regret rates that depend on precise observability conditions of the game. Moreover, we prove lower bounds that classify the minimax regret of all finite games into four possible regimes. IDS achieves the optimal rate in all cases up to logarithmic factors, without tuning any hyper-parameters. We further extend our results to the contextual and the kernelized setting, which significantly increases the range of possible applications.

Cite this Paper


BibTeX
@InProceedings{pmlr-v125-kirschner20a, title = {Information Directed Sampling for Linear Partial Monitoring}, author = {Kirschner, Johannes and Lattimore, Tor and Krause, Andreas}, booktitle = {Proceedings of Thirty Third Conference on Learning Theory}, pages = {2328--2369}, year = {2020}, editor = {Abernethy, Jacob and Agarwal, Shivani}, volume = {125}, series = {Proceedings of Machine Learning Research}, month = {09--12 Jul}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/v125/kirschner20a/kirschner20a.pdf}, url = {https://proceedings.mlr.press/v125/kirschner20a.html}, abstract = { Partial monitoring is a rich framework for sequential decision making under uncertainty that generalizes many well known bandit models, including linear, combinatorial and dueling bandits. We introduce {\em information directed sampling} (IDS) for stochastic partial monitoring with a linear reward and observation structure. IDS achieves adaptive worst-case regret rates that depend on precise observability conditions of the game. Moreover, we prove lower bounds that classify the minimax regret of all finite games into four possible regimes. IDS achieves the optimal rate in all cases up to logarithmic factors, without tuning any hyper-parameters. We further extend our results to the contextual and the kernelized setting, which significantly increases the range of possible applications.} }
Endnote
%0 Conference Paper %T Information Directed Sampling for Linear Partial Monitoring %A Johannes Kirschner %A Tor Lattimore %A Andreas Krause %B Proceedings of Thirty Third Conference on Learning Theory %C Proceedings of Machine Learning Research %D 2020 %E Jacob Abernethy %E Shivani Agarwal %F pmlr-v125-kirschner20a %I PMLR %P 2328--2369 %U https://proceedings.mlr.press/v125/kirschner20a.html %V 125 %X Partial monitoring is a rich framework for sequential decision making under uncertainty that generalizes many well known bandit models, including linear, combinatorial and dueling bandits. We introduce {\em information directed sampling} (IDS) for stochastic partial monitoring with a linear reward and observation structure. IDS achieves adaptive worst-case regret rates that depend on precise observability conditions of the game. Moreover, we prove lower bounds that classify the minimax regret of all finite games into four possible regimes. IDS achieves the optimal rate in all cases up to logarithmic factors, without tuning any hyper-parameters. We further extend our results to the contextual and the kernelized setting, which significantly increases the range of possible applications.
APA
Kirschner, J., Lattimore, T. & Krause, A.. (2020). Information Directed Sampling for Linear Partial Monitoring. Proceedings of Thirty Third Conference on Learning Theory, in Proceedings of Machine Learning Research 125:2328-2369 Available from https://proceedings.mlr.press/v125/kirschner20a.html.

Related Material