The End of Optimism? An Asymptotic Analysis of Finite-Armed Linear Bandits

Tor Lattimore, Csaba Szepesvari
Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, PMLR 54:728-737, 2017.

Abstract

Stochastic linear bandits are a natural and simple generalisation of finite-armed bandits with numerous practical applications. Current approaches focus on generalising existing techniques for finite-armed bandits, notably the otimism principle and Thompson sampling. Prior analysis has mostly focussed on the worst-case setting. We analyse the asymptotic regret and show matching upper and lower bounds on what is achievable. Surprisingly, our results show that no algorithm based on optimism or Thompson sampling will ever achieve the optimal rate. In fact, they can be arbitrarily far from optimal, even in very simple cases. This is a disturbing result because these techniques are standard tools that are widely used for sequential optimisation, for example, generalised linear bandits and reinforcement learning.

Cite this Paper


BibTeX
@InProceedings{pmlr-v54-lattimore17a, title = {{The End of Optimism? An Asymptotic Analysis of Finite-Armed Linear Bandits}}, author = {Lattimore, Tor and Szepesvari, Csaba}, booktitle = {Proceedings of the 20th International Conference on Artificial Intelligence and Statistics}, pages = {728--737}, year = {2017}, editor = {Singh, Aarti and Zhu, Jerry}, volume = {54}, series = {Proceedings of Machine Learning Research}, month = {20--22 Apr}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/v54/lattimore17a/lattimore17a.pdf}, url = {https://proceedings.mlr.press/v54/lattimore17a.html}, abstract = {Stochastic linear bandits are a natural and simple generalisation of finite-armed bandits with numerous practical applications. Current approaches focus on generalising existing techniques for finite-armed bandits, notably the otimism principle and Thompson sampling. Prior analysis has mostly focussed on the worst-case setting. We analyse the asymptotic regret and show matching upper and lower bounds on what is achievable. Surprisingly, our results show that no algorithm based on optimism or Thompson sampling will ever achieve the optimal rate. In fact, they can be arbitrarily far from optimal, even in very simple cases. This is a disturbing result because these techniques are standard tools that are widely used for sequential optimisation, for example, generalised linear bandits and reinforcement learning. } }
Endnote
%0 Conference Paper %T The End of Optimism? An Asymptotic Analysis of Finite-Armed Linear Bandits %A Tor Lattimore %A Csaba Szepesvari %B Proceedings of the 20th International Conference on Artificial Intelligence and Statistics %C Proceedings of Machine Learning Research %D 2017 %E Aarti Singh %E Jerry Zhu %F pmlr-v54-lattimore17a %I PMLR %P 728--737 %U https://proceedings.mlr.press/v54/lattimore17a.html %V 54 %X Stochastic linear bandits are a natural and simple generalisation of finite-armed bandits with numerous practical applications. Current approaches focus on generalising existing techniques for finite-armed bandits, notably the otimism principle and Thompson sampling. Prior analysis has mostly focussed on the worst-case setting. We analyse the asymptotic regret and show matching upper and lower bounds on what is achievable. Surprisingly, our results show that no algorithm based on optimism or Thompson sampling will ever achieve the optimal rate. In fact, they can be arbitrarily far from optimal, even in very simple cases. This is a disturbing result because these techniques are standard tools that are widely used for sequential optimisation, for example, generalised linear bandits and reinforcement learning.
APA
Lattimore, T. & Szepesvari, C.. (2017). The End of Optimism? An Asymptotic Analysis of Finite-Armed Linear Bandits. Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 54:728-737 Available from https://proceedings.mlr.press/v54/lattimore17a.html.

Related Material