Tight Bounds for Local Glivenko-Cantelli

Moïse Blanchard, Vaclav Voracek
Proceedings of The 35th International Conference on Algorithmic Learning Theory, PMLR 237:179-220, 2024.

Abstract

This paper addresses the statistical problem of estimating the infinite-norm deviation from the empirical mean to the distribution mean for high-dimensional distributions on $\{0,1\}^d$, potentially with $d=\infty$. Unlike traditional bounds as in the classical Glivenko-Cantelli theorem, we explore the instance-dependent convergence behavior. For product distributions, we provide the exact non-asymptotic behavior of the expected maximum deviation, revealing various regimes of decay. In particular, these tight bounds demonstrate the necessity of a previously proposed factor for an upper bound, answering a corresponding COLT 2023 open problem (Cohen and Kontorovich, 2022, 2023). We also consider general distributions on $\{0,1\}^d$ and provide the tightest possible bounds for the maximum deviation of the empirical mean given only the mean statistic. Along the way, we prove a localized version of the Dvoretzky–Kiefer–Wolfowitz inequality. Additionally, we present some results for two other cases, one where the deviation is measured in some $q$-norm, and the other where the distribution is supported on a continuous domain $[0,1]^d$, and also provide some high-probability bounds for the maximum deviation in the independent Bernoulli case.
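To make the estimated quantity concrete, the following is a minimal Monte Carlo sketch (not from the paper; the bias profile, sample sizes, trial counts, and function name are all illustrative choices) of the expected maximum deviation $\mathbb{E}\,\|\hat{\mu}_n - \mu\|_\infty$ between the empirical mean $\hat{\mu}_n$ of $n$ i.i.d. samples and the true mean $\mu$ of a product Bernoulli distribution on $\{0,1\}^d$:

import numpy as np

# Illustrative sketch (not from the paper): Monte Carlo estimate of the
# expected infinity-norm deviation E[ max_j |mu_hat_j - mu_j| ] between
# the empirical mean and the true mean of a product Bernoulli
# distribution on {0,1}^d.

rng = np.random.default_rng(0)

def expected_max_deviation(mu, n, trials=100):
    """Monte Carlo estimate of E[ ||mu_hat - mu||_inf ] from n samples."""
    d = len(mu)
    devs = np.empty(trials)
    for t in range(trials):
        # n i.i.d. draws from the product Bernoulli(mu), one row per sample
        samples = rng.random((n, d)) < mu
        mu_hat = samples.mean(axis=0)        # empirical mean per coordinate
        devs[t] = np.abs(mu_hat - mu).max()  # infinity-norm deviation
    return devs.mean()

# Hypothetical instance: a few unbiased coordinates mixed with many
# small-bias coordinates; the latter contribute less to the maximum,
# illustrating the instance-dependent behavior the paper quantifies.
mu = np.concatenate([np.full(10, 0.5), np.full(190, 1e-3)])
for n in [100, 1000, 10000]:
    print(n, expected_max_deviation(mu, n))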

Cite this Paper


BibTeX
@InProceedings{pmlr-v237-blanchard24a,
  title     = {Tight Bounds for Local Glivenko-Cantelli},
  author    = {Blanchard, Mo\"ise and Voracek, Vaclav},
  booktitle = {Proceedings of The 35th International Conference on Algorithmic Learning Theory},
  pages     = {179--220},
  year      = {2024},
  editor    = {Vernade, Claire and Hsu, Daniel},
  volume    = {237},
  series    = {Proceedings of Machine Learning Research},
  month     = {25--28 Feb},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v237/blanchard24a/blanchard24a.pdf},
  url       = {https://proceedings.mlr.press/v237/blanchard24a.html}
}
Endnote
%0 Conference Paper
%T Tight Bounds for Local Glivenko-Cantelli
%A Moïse Blanchard
%A Vaclav Voracek
%B Proceedings of The 35th International Conference on Algorithmic Learning Theory
%C Proceedings of Machine Learning Research
%D 2024
%E Claire Vernade
%E Daniel Hsu
%F pmlr-v237-blanchard24a
%I PMLR
%P 179--220
%U https://proceedings.mlr.press/v237/blanchard24a.html
%V 237
APA
Blanchard, M. & Voracek, V. (2024). Tight Bounds for Local Glivenko-Cantelli. Proceedings of The 35th International Conference on Algorithmic Learning Theory, in Proceedings of Machine Learning Research 237:179-220. Available from https://proceedings.mlr.press/v237/blanchard24a.html.