Data-Driven Confidence Intervals with Optimal Rates for the Mean of Heavy-Tailed Distributions

Ambrus Tamás, Szabolcs Szentpéteri, Balázs Csáji
Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, PMLR 238:3439-3447, 2024.

Abstract

Estimating the expected value is one of the key problems of statistics, and it serves as a backbone for countless methods in machine learning. In this paper we propose a new algorithm to build non-asymptotically exact confidence intervals for the mean of a symmetric distribution based on an independent, identically distributed sample. The method combines resampling with median-of-means estimates to ensure optimal subgaussian bounds for the sizes of the confidence intervals under mild, heavy-tailed moment conditions. The scheme is completely data-driven: the construction does not need any information about the moments, yet it manages to build exact confidence regions which shrink at the optimal rate. We also show how to generalize the approach to higher dimensions and prove dimension-free, subgaussian PAC bounds for the exclusion probabilities of false candidates. Finally, we illustrate the method and its properties for heavy-tailed distributions with numerical experiments.
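To make the abstract's construction concrete, below is a minimal, hypothetical sketch in Python/NumPy. It combines a median-of-means statistic with a sign-flip resampling scheme that exploits the assumed symmetry of the distribution; the function names, the grid search over candidate means, and the default parameters (n_blocks, n_resamples, alpha) are illustrative choices, not the paper's exact algorithm.

    import numpy as np


    def median_of_means(sample, n_blocks):
        """Median-of-means estimate: split the sample into n_blocks blocks,
        average each block, and take the median of those block means."""
        blocks = np.array_split(np.asarray(sample, dtype=float), n_blocks)
        return float(np.median([block.mean() for block in blocks]))


    def mom_resampling_interval(sample, n_blocks=8, n_resamples=199, alpha=0.05,
                                grid=None, seed=None):
        """Illustrative confidence interval for the mean of a symmetric
        distribution: a candidate mu is accepted if the median-of-means
        statistic of the centred sample is not extreme compared with
        sign-flipped (resampled) copies, which have the same distribution
        as the original sample under symmetry around the true mean."""
        rng = np.random.default_rng(seed)
        x = np.asarray(sample, dtype=float)
        if grid is None:
            grid = np.linspace(x.min(), x.max(), 200)
        accepted = []
        for mu in grid:
            centred = x - mu
            ref = abs(median_of_means(centred, n_blocks))
            # Count sign-perturbed copies whose statistic dominates the reference.
            dominating = 0
            for _ in range(n_resamples):
                signs = rng.choice([-1.0, 1.0], size=x.size)
                if abs(median_of_means(signs * centred, n_blocks)) >= ref:
                    dominating += 1
            # Rank-based, distribution-free acceptance rule.
            if (dominating + 1) / (n_resamples + 1) > alpha:
                accepted.append(mu)
        return (min(accepted), max(accepted)) if accepted else None


    # Example usage: heavy-tailed Student-t sample with 2.5 degrees of freedom.
    if __name__ == "__main__":
        data = np.random.default_rng(0).standard_t(df=2.5, size=400)
        print(mom_resampling_interval(data, seed=1))

In this sketch, exactness of the coverage level comes from the rank statistic alone: under symmetry, the sign-flipped copies are exchangeable with the original centred sample, so no moment information is needed. The median-of-means statistic is what keeps the accepted region small for heavy-tailed data; how the paper achieves the optimal subgaussian rate is given by its own analysis, which this illustration does not reproduce.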

Cite this Paper


BibTeX
@InProceedings{pmlr-v238-tamas24a,
  title     = {Data-Driven Confidence Intervals with Optimal Rates for the Mean of Heavy-Tailed Distributions},
  author    = {Tam\'{a}s, Ambrus and Szentp\'{e}teri, Szabolcs and Cs\'{a}ji, Bal\'{a}zs},
  booktitle = {Proceedings of The 27th International Conference on Artificial Intelligence and Statistics},
  pages     = {3439--3447},
  year      = {2024},
  editor    = {Dasgupta, Sanjoy and Mandt, Stephan and Li, Yingzhen},
  volume    = {238},
  series    = {Proceedings of Machine Learning Research},
  month     = {02--04 May},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v238/tamas24a/tamas24a.pdf},
  url       = {https://proceedings.mlr.press/v238/tamas24a.html},
  abstract  = {Estimating the expected value is one of the key problems of statistics, and it serves as a backbone for countless methods in machine learning. In this paper we propose a new algorithm to build non-asymptotically exact confidence intervals for the mean of a symmetric distribution based on an independent, identically distributed sample. The method combines resampling with median-of-means estimates to ensure optimal subgaussian bounds for the sizes of the confidence intervals under mild, heavy-tailed moment conditions. The scheme is completely data-driven: the construction does not need any information about the moments, yet it manages to build exact confidence regions which shrink at the optimal rate. We also show how to generalize the approach to higher dimensions and prove dimension-free, subgaussian PAC bounds for the exclusion probabilities of false candidates. Finally, we illustrate the method and its properties for heavy-tailed distributions with numerical experiments.}
}
Endnote
%0 Conference Paper
%T Data-Driven Confidence Intervals with Optimal Rates for the Mean of Heavy-Tailed Distributions
%A Ambrus Tamás
%A Szabolcs Szentpéteri
%A Balázs Csáji
%B Proceedings of The 27th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2024
%E Sanjoy Dasgupta
%E Stephan Mandt
%E Yingzhen Li
%F pmlr-v238-tamas24a
%I PMLR
%P 3439--3447
%U https://proceedings.mlr.press/v238/tamas24a.html
%V 238
%X Estimating the expected value is one of the key problems of statistics, and it serves as a backbone for countless methods in machine learning. In this paper we propose a new algorithm to build non-asymptotically exact confidence intervals for the mean of a symmetric distribution based on an independent, identically distributed sample. The method combines resampling with median-of-means estimates to ensure optimal subgaussian bounds for the sizes of the confidence intervals under mild, heavy-tailed moment conditions. The scheme is completely data-driven: the construction does not need any information about the moments, yet it manages to build exact confidence regions which shrink at the optimal rate. We also show how to generalize the approach to higher dimensions and prove dimension-free, subgaussian PAC bounds for the exclusion probabilities of false candidates. Finally, we illustrate the method and its properties for heavy-tailed distributions with numerical experiments.
APA
Tamás, A., Szentpéteri, S. & Csáji, B. (2024). Data-Driven Confidence Intervals with Optimal Rates for the Mean of Heavy-Tailed Distributions. Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 238:3439-3447. Available from https://proceedings.mlr.press/v238/tamas24a.html.
