When and why randomised exploration works (in linear bandits)

Marc Abeille, David Janz, Ciara Pike-Burke
Proceedings of The 36th International Conference on Algorithmic Learning Theory, PMLR 272:4-22, 2025.

Abstract

We provide an approach for the analysis of randomised exploration algorithms like Thompson sampling that does not rely on forced optimism or posterior inflation. With this, we demonstrate that in the d-dimensional linear bandit setting, when the action space is smooth and strongly convex, randomised exploration algorithms enjoy an n-step regret bound of the order O(d√n log(n)). Notably, this shows for the first time that there exist non-trivial linear bandit settings where Thompson sampling can achieve optimal dimension dependence in the regret.
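For readers unfamiliar with the algorithm family the abstract refers to, the following is a minimal sketch of linear Thompson sampling on a unit-sphere action set (an example of a smooth, strongly convex action space). The dimension, horizon, noise level and unit-norm parameter are illustrative assumptions, not the paper's setup; the sampling covariance is used without inflation, in the spirit of the abstract.

import numpy as np

# Minimal sketch of linear Thompson sampling; all problem constants are assumptions.
rng = np.random.default_rng(0)
d, n = 5, 2000                                 # dimension, horizon
theta_star = rng.normal(size=d)
theta_star /= np.linalg.norm(theta_star)       # unknown parameter, ||theta*|| = 1
noise_sd = 0.1

V = np.eye(d)                                  # regularised design matrix: I + sum_s x_s x_s^T
b = np.zeros(d)                                # sum_s r_s x_s
regret = 0.0

for t in range(n):
    theta_hat = np.linalg.solve(V, b)                          # ridge estimate
    cov = np.linalg.inv(V)                                     # posterior-style covariance, no inflation
    theta_tilde = rng.multivariate_normal(theta_hat, cov)      # randomised (perturbed) parameter

    # Best response on the unit sphere: the maximiser of <x, theta_tilde> is theta_tilde / ||theta_tilde||.
    norm = np.linalg.norm(theta_tilde)
    x = theta_tilde / norm if norm > 0 else np.eye(d)[0]

    r = x @ theta_star + noise_sd * rng.normal()               # observe noisy linear reward
    regret += np.linalg.norm(theta_star) - x @ theta_star      # instantaneous regret vs. best action

    V += np.outer(x, x)                                        # update sufficient statistics
    b += r * x

print(f"cumulative regret after {n} rounds: {regret:.2f}")

Running the sketch for a few random seeds gives a feel for the sublinear growth of cumulative regret; it is not a reproduction of the paper's analysis or experiments.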

Cite this Paper


BibTeX
@InProceedings{pmlr-v272-abeille25a,
  title     = {When and why randomised exploration works (in linear bandits)},
  author    = {Abeille, Marc and Janz, David and Pike-Burke, Ciara},
  booktitle = {Proceedings of The 36th International Conference on Algorithmic Learning Theory},
  pages     = {4--22},
  year      = {2025},
  editor    = {Kamath, Gautam and Loh, Po-Ling},
  volume    = {272},
  series    = {Proceedings of Machine Learning Research},
  month     = {24--27 Feb},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v272/main/assets/abeille25a/abeille25a.pdf},
  url       = {https://proceedings.mlr.press/v272/abeille25a.html},
  abstract  = {We provide an approach for the analysis of randomised exploration algorithms like Thompson sampling that does not rely on forced optimism or posterior inflation. With this, we demonstrate that in the $d$-dimensional linear bandit setting, when the action space is smooth and strongly convex, randomised exploration algorithms enjoy an $n$-step regret bound of the order $O(d\sqrt{n} \log(n))$. Notably, this shows for the first time that there exist non-trivial linear bandit settings where Thompson sampling can achieve optimal dimension dependence in the regret.}
}
Endnote
%0 Conference Paper
%T When and why randomised exploration works (in linear bandits)
%A Marc Abeille
%A David Janz
%A Ciara Pike-Burke
%B Proceedings of The 36th International Conference on Algorithmic Learning Theory
%C Proceedings of Machine Learning Research
%D 2025
%E Gautam Kamath
%E Po-Ling Loh
%F pmlr-v272-abeille25a
%I PMLR
%P 4--22
%U https://proceedings.mlr.press/v272/abeille25a.html
%V 272
%X We provide an approach for the analysis of randomised exploration algorithms like Thompson sampling that does not rely on forced optimism or posterior inflation. With this, we demonstrate that in the $d$-dimensional linear bandit setting, when the action space is smooth and strongly convex, randomised exploration algorithms enjoy an $n$-step regret bound of the order $O(d\sqrt{n} \log(n))$. Notably, this shows for the first time that there exist non-trivial linear bandit settings where Thompson sampling can achieve optimal dimension dependence in the regret.
APA
Abeille, M., Janz, D. & Pike-Burke, C. (2025). When and why randomised exploration works (in linear bandits). Proceedings of The 36th International Conference on Algorithmic Learning Theory, in Proceedings of Machine Learning Research 272:4-22. Available from https://proceedings.mlr.press/v272/abeille25a.html.
