A Non-asymptotic Approach to Best-Arm Identification for Gaussian Bandits

Antoine Barrier; Aurélien Garivier; Tomáš Kocák

A Non-asymptotic Approach to Best-Arm Identification for Gaussian Bandits

Antoine Barrier, Aurélien Garivier, Tomáš Kocák

Proceedings of The 25th International Conference on Artificial Intelligence and Statistics, PMLR 151:10078-10109, 2022.

Abstract

We propose a new strategy for best-arm identification with fixed confidence of Gaussian variables with bounded means and unit variance. This strategy, called Exploration-Biased Sampling, is not only asymptotically optimal: it is to the best of our knowledge the first strategy with non-asymptotic bounds that asymptotically matches the sample complexity. But the main advantage over other algorithms like Track-and-Stop is an improved behavior regarding exploration: Exploration-Biased Sampling is biased towards exploration in a subtle but natural way that makes it more stable and interpretable. These improvements are allowed by a new analysis of the sample complexity optimization problem, which yields a faster numerical resolution scheme and several quantitative regularity results that we believe of high independent interest.

Cite this Paper

BibTeX


@InProceedings{pmlr-v151-barrier22a,
  title = 	 { A Non-asymptotic Approach to Best-Arm Identification for Gaussian Bandits },
  author =       {Barrier, Antoine and Garivier, Aur\'elien and Koc\'ak, Tom\'a\v{s}},
  booktitle = 	 {Proceedings of The 25th International Conference on Artificial Intelligence and Statistics},
  pages = 	 {10078--10109},
  year = 	 {2022},
  editor = 	 {Camps-Valls, Gustau and Ruiz, Francisco J. R. and Valera, Isabel},
  volume = 	 {151},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {28--30 Mar},
  publisher =    {PMLR},
  pdf = 	 {https://proceedings.mlr.press/v151/barrier22a/barrier22a.pdf},
  url = 	 {https://proceedings.mlr.press/v151/barrier22a.html},
  abstract = 	 { We propose a new strategy for best-arm identification with fixed confidence of Gaussian variables with bounded means and unit variance. This strategy, called Exploration-Biased Sampling, is not only asymptotically optimal: it is to the best of our knowledge the first strategy with non-asymptotic bounds that asymptotically matches the sample complexity. But the main advantage over other algorithms like Track-and-Stop is an improved behavior regarding exploration: Exploration-Biased Sampling is biased towards exploration in a subtle but natural way that makes it more stable and interpretable. These improvements are allowed by a new analysis of the sample complexity optimization problem, which yields a faster numerical resolution scheme and several quantitative regularity results that we believe of high independent interest. }
}

Endnote

%0 Conference Paper
%T  A Non-asymptotic Approach to Best-Arm Identification for Gaussian Bandits 
%A Antoine Barrier
%A Aurélien Garivier
%A Tomáš Kocák
%B Proceedings of The 25th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2022
%E Gustau Camps-Valls
%E Francisco J. R. Ruiz
%E Isabel Valera	
%F pmlr-v151-barrier22a
%I PMLR
%P 10078--10109
%U https://proceedings.mlr.press/v151/barrier22a.html
%V 151
%X  We propose a new strategy for best-arm identification with fixed confidence of Gaussian variables with bounded means and unit variance. This strategy, called Exploration-Biased Sampling, is not only asymptotically optimal: it is to the best of our knowledge the first strategy with non-asymptotic bounds that asymptotically matches the sample complexity. But the main advantage over other algorithms like Track-and-Stop is an improved behavior regarding exploration: Exploration-Biased Sampling is biased towards exploration in a subtle but natural way that makes it more stable and interpretable. These improvements are allowed by a new analysis of the sample complexity optimization problem, which yields a faster numerical resolution scheme and several quantitative regularity results that we believe of high independent interest.

APA


Barrier, A., Garivier, A. & Kocák, T.. (2022).  A Non-asymptotic Approach to Best-Arm Identification for Gaussian Bandits . Proceedings of The 25th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 151:10078-10109 Available from https://proceedings.mlr.press/v151/barrier22a.html.

Related Material

Download PDF