HAVER: Instance-Dependent Error Bounds for Maximum Mean Estimation and Applications to Q-Learning and Monte Carlo Tree Search

Tuan Nguyen, Jay Barrett, Kwang-Sung Jun
Proceedings of The 28th International Conference on Artificial Intelligence and Statistics, PMLR 258:460-468, 2025.

Abstract

We study the problem of estimating the \emph{value} of the largest mean among $K$ distributions via samples from them (rather than estimating \emph{which} distribution has the largest mean), which arises in various machine learning tasks including Q-learning and Monte Carlo Tree Search (MCTS). While a few algorithms have been proposed, their performance analyses have been limited to their biases rather than a precise error metric. In this paper, we propose a novel algorithm called HAVER (Head AVERaging) and analyze its mean squared error. Our analysis reveals that HAVER achieves compelling performance in two respects. First, HAVER estimates the maximum mean as well as an oracle that knows the identity of the best distribution and reports its sample mean. Second, perhaps surprisingly, HAVER exhibits even better rates than this oracle when there are many distributions near the best one. Both of these improvements are the first of their kind in the literature, and we also prove that the naive algorithm that reports the largest empirical mean does not achieve these bounds. Finally, we confirm our theoretical findings via numerical experiments where we implement HAVER in bandit, Q-learning, and MCTS algorithms. In these experiments, HAVER consistently outperforms the baseline methods, demonstrating its effectiveness across different applications.
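To make the setting concrete, below is a minimal Python sketch contrasting the naive estimator mentioned in the abstract (report the largest empirical mean) with a head-averaging-style estimator in the spirit of HAVER. The confidence-width threshold, the sub-Gaussian noise assumption, and the sample-count weighting are illustrative choices for this sketch, not the paper's exact HAVER rule.

import numpy as np

def naive_max_mean(samples):
    # Naive estimator: report the largest empirical mean.
    # Biased upward: the max operator favors arms whose noise happened
    # to push their sample mean high.
    return max(np.mean(s) for s in samples)

def head_averaging_max_mean(samples, delta=0.05):
    # Illustrative head-averaging-style estimator (NOT the paper's exact
    # HAVER rule): average the empirical means of all arms that are
    # statistically indistinguishable from the empirically best arm.
    # The Hoeffding-style widths below assume 1-sub-Gaussian rewards.
    means = np.array([np.mean(s) for s in samples])
    ns = np.array([len(s) for s in samples])
    widths = np.sqrt(2 * np.log(len(samples) / delta) / ns)
    best = np.argmax(means)
    # The "head": arms whose mean is within the combined confidence
    # widths of the empirical best (hypothetical threshold choice).
    head = means >= means[best] - widths[best] - widths
    # Weight each head arm by its sample count, pooling their samples.
    return np.average(means[head], weights=ns[head])

rng = np.random.default_rng(0)
# K = 10 arms, several true means tied near the maximum of 1.0
true_means = np.array([1.0, 1.0, 0.98, 0.95, 0.5, 0.4, 0.3, 0.2, 0.1, 0.0])
samples = [rng.normal(m, 1.0, size=50) for m in true_means]
print(naive_max_mean(samples), head_averaging_max_mean(samples))

When several arms have means near the maximum, averaging over the head pools their samples and reduces variance, which is consistent with the abstract's claim that HAVER can outperform the single-arm oracle in exactly that regime.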

Cite this Paper


BibTeX
@InProceedings{pmlr-v258-nguyen25b,
  title     = {HAVER: Instance-Dependent Error Bounds for Maximum Mean Estimation and Applications to Q-Learning and Monte Carlo Tree Search},
  author    = {Nguyen, Tuan and Barrett, Jay and Jun, Kwang-Sung},
  booktitle = {Proceedings of The 28th International Conference on Artificial Intelligence and Statistics},
  pages     = {460--468},
  year      = {2025},
  editor    = {Li, Yingzhen and Mandt, Stephan and Agrawal, Shipra and Khan, Emtiyaz},
  volume    = {258},
  series    = {Proceedings of Machine Learning Research},
  month     = {03--05 May},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v258/main/assets/nguyen25b/nguyen25b.pdf},
  url       = {https://proceedings.mlr.press/v258/nguyen25b.html}
}
Endnote
%0 Conference Paper
%T HAVER: Instance-Dependent Error Bounds for Maximum Mean Estimation and Applications to Q-Learning and Monte Carlo Tree Search
%A Tuan Nguyen
%A Jay Barrett
%A Kwang-Sung Jun
%B Proceedings of The 28th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2025
%E Yingzhen Li
%E Stephan Mandt
%E Shipra Agrawal
%E Emtiyaz Khan
%F pmlr-v258-nguyen25b
%I PMLR
%P 460--468
%U https://proceedings.mlr.press/v258/nguyen25b.html
%V 258
APA
Nguyen, T., Barrett, J. & Jun, K. (2025). HAVER: Instance-Dependent Error Bounds for Maximum Mean Estimation and Applications to Q-Learning and Monte Carlo Tree Search. Proceedings of The 28th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 258:460-468. Available from https://proceedings.mlr.press/v258/nguyen25b.html.