When and why randomised exploration works (in linear bandits)
Proceedings of The 36th International Conference on Algorithmic Learning Theory, PMLR 272:4-22, 2025.
Abstract
We provide an approach for the analysis of randomised exploration algorithms like Thompson sampling that does not rely on forced optimism or posterior inflation. With this, we demonstrate that in the d-dimensional linear bandit setting, when the action space is smooth and strongly convex, randomised exploration algorithms enjoy an n-step regret bound of the order O(d√n log(n)). Notably, this shows for the first time that there exist non-trivial linear bandit settings where Thompson sampling can achieve optimal dimension dependence in the regret.
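To make the setting concrete, the sketch below runs Thompson sampling in a linear bandit whose action set is the unit Euclidean ball, a standard example of a smooth, strongly convex action space, and samples from an uninflated Gaussian posterior. This is an illustrative toy, not the paper's analysis or exact algorithm; the noise level, ridge regularisation, Gaussian posterior, and the ball-shaped action set are assumptions chosen for the example.

```python
import numpy as np

def linear_thompson_sampling(theta_star, n_rounds=1000, noise_sd=0.1, reg=1.0, seed=0):
    """Toy Thompson sampling on the unit-ball action set (smooth, strongly convex).

    Illustrative only: the Gaussian posterior, noise_sd and reg are assumptions,
    and the posterior is used as-is, with no inflation or forced optimism.
    """
    rng = np.random.default_rng(seed)
    d = theta_star.shape[0]
    V = reg * np.eye(d)                         # regularised design matrix
    b = np.zeros(d)                             # sum of reward-weighted actions
    best_reward = np.linalg.norm(theta_star)    # optimum on the unit ball: theta_star / ||theta_star||
    regret = 0.0
    for _ in range(n_rounds):
        theta_hat = np.linalg.solve(V, b)                  # ridge estimate of theta_star
        cov = noise_sd**2 * np.linalg.inv(V)               # Gaussian "posterior" covariance, unscaled
        theta_tilde = rng.multivariate_normal(theta_hat, cov)
        norm = np.linalg.norm(theta_tilde)
        # arg-max of <a, theta_tilde> over the unit ball has a closed form
        a = theta_tilde / norm if norm > 0 else np.ones(d) / np.sqrt(d)
        reward = a @ theta_star + noise_sd * rng.standard_normal()
        V += np.outer(a, a)
        b += reward * a
        regret += best_reward - a @ theta_star
    return regret

if __name__ == "__main__":
    d = 5
    theta_star = np.ones(d) / np.sqrt(d)
    print("cumulative regret:", linear_thompson_sampling(theta_star))
```

The closed-form arg-max over the ball keeps the example short; on a general smooth, strongly convex set the action-selection step would instead call a convex optimiser.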