Local Glivenko-Cantelli

Doron Cohen, Aryeh Kontorovich
Proceedings of Thirty Sixth Conference on Learning Theory, PMLR 195:715-715, 2023.

Abstract

If $\mu$ is a distribution over the $d$-dimensional Boolean cube $\{0,1\}^d$, our goal is to estimate its mean $p\in[0,1]^d$ based on $n$ iid draws from $\mu$. Specifically, we consider the empirical mean estimator $\hat{p}_n$ and study the expected maximal deviation $\Delta_n=\mathbb{E}\max_{j\in[d]}|\hat{p}_n(j)-p(j)|$. In the classical Universal Glivenko-Cantelli setting, one seeks distribution-free (i.e., independent of $\mu$) bounds on $\Delta_n$. This regime is well understood: for all $\mu$, we have $\Delta_n\lesssim\sqrt{\log(d)/n}$ up to universal constants, and the bound is tight. Our present work seeks to establish dimension-free (i.e., without an explicit dependence on $d$) estimates on $\Delta_n$, including those that hold for $d=\infty$. As such bounds must necessarily depend on $\mu$, we refer to this regime as {\em local} Glivenko-Cantelli (also known as $\mu$-GC); we are aware of very few previous bounds of this type, which are either “abstract” or quite sub-optimal. Already the special case of product measures $\mu$ is rather non-trivial. We give necessary and sufficient conditions on $\mu$ for $\Delta_n\to0$, and calculate sharp rates for this decay. Along the way, we discover a novel sub-gamma-type maximal inequality for shifted Bernoullis, of independent interest.
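
For concreteness, here is a minimal Monte Carlo sketch (ours, not code from the paper) of the quantities defined above: it estimates $\Delta_n$ for an arbitrarily chosen product measure on $\{0,1\}^d$ and compares it to the universal $\sqrt{\log(d)/n}$ benchmark. The mean profile $p(j)=1/(j+1)$ and all parameter values are purely illustrative assumptions.

```python
# Monte Carlo sketch (illustrative, not from the paper): estimate
# Delta_n = E max_j |hat{p}_n(j) - p(j)| for a product measure on {0,1}^d
# and compare against the universal sqrt(log(d)/n) rate.
import numpy as np

rng = np.random.default_rng(0)

d, n, trials = 1000, 500, 200
p = 1.0 / np.arange(2, d + 2)      # arbitrary decaying mean profile (illustrative choice)

deviations = np.empty(trials)
for t in range(trials):
    X = rng.random((n, d)) < p     # n iid draws from the product measure with mean p
    p_hat = X.mean(axis=0)         # empirical mean estimator hat{p}_n
    deviations[t] = np.abs(p_hat - p).max()

delta_n = deviations.mean()            # Monte Carlo estimate of Delta_n
universal = np.sqrt(np.log(d) / n)     # distribution-free Glivenko-Cantelli rate
print(f"estimated Delta_n ~ {delta_n:.4f}, sqrt(log(d)/n) = {universal:.4f}")
```

In this example the simulated $\Delta_n$ typically sits well below the universal bound, since the coordinate variances decay quickly; this kind of distribution-dependent, dimension-free behavior is exactly what the local regime studies.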

Cite this Paper


BibTeX
@InProceedings{pmlr-v195-cohen23a,
  title     = {Local Glivenko-Cantelli},
  author    = {Cohen, Doron and Kontorovich, Aryeh},
  booktitle = {Proceedings of Thirty Sixth Conference on Learning Theory},
  pages     = {715--715},
  year      = {2023},
  editor    = {Neu, Gergely and Rosasco, Lorenzo},
  volume    = {195},
  series    = {Proceedings of Machine Learning Research},
  month     = {12--15 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v195/cohen23a/cohen23a.pdf},
  url       = {https://proceedings.mlr.press/v195/cohen23a.html},
  abstract  = {If $\mu$ is a distribution over the $d$-dimensional Boolean cube $\{0,1\}^d$, our goal is to estimate its mean $p\in[0,1]^d$ based on $n$ iid draws from $\mu$. Specifically, we consider the empirical mean estimator $\hat{p}_n$ and study the expected maximal deviation $\Delta_n=\mathbb{E}\max_{j\in[d]}|\hat{p}_n(j)-p(j)|$. In the classical Universal Glivenko-Cantelli setting, one seeks distribution-free (i.e., independent of $\mu$) bounds on $\Delta_n$. This regime is well understood: for all $\mu$, we have $\Delta_n\lesssim\sqrt{\log(d)/n}$ up to universal constants, and the bound is tight. Our present work seeks to establish dimension-free (i.e., without an explicit dependence on $d$) estimates on $\Delta_n$, including those that hold for $d=\infty$. As such bounds must necessarily depend on $\mu$, we refer to this regime as {\em local} Glivenko-Cantelli (also known as $\mu$-GC); we are aware of very few previous bounds of this type, which are either “abstract” or quite sub-optimal. Already the special case of product measures $\mu$ is rather non-trivial. We give necessary and sufficient conditions on $\mu$ for $\Delta_n\to0$, and calculate sharp rates for this decay. Along the way, we discover a novel sub-gamma-type maximal inequality for shifted Bernoullis, of independent interest.}
}
Endnote
%0 Conference Paper
%T Local Glivenko-Cantelli
%A Doron Cohen
%A Aryeh Kontorovich
%B Proceedings of Thirty Sixth Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2023
%E Gergely Neu
%E Lorenzo Rosasco
%F pmlr-v195-cohen23a
%I PMLR
%P 715--715
%U https://proceedings.mlr.press/v195/cohen23a.html
%V 195
%X If $\mu$ is a distribution over the $d$-dimensional Boolean cube $\{0,1\}^d$, our goal is to estimate its mean $p\in[0,1]^d$ based on $n$ iid draws from $\mu$. Specifically, we consider the empirical mean estimator $\hat{p}_n$ and study the expected maximal deviation $\Delta_n=\mathbb{E}\max_{j\in[d]}|\hat{p}_n(j)-p(j)|$. In the classical Universal Glivenko-Cantelli setting, one seeks distribution-free (i.e., independent of $\mu$) bounds on $\Delta_n$. This regime is well understood: for all $\mu$, we have $\Delta_n\lesssim\sqrt{\log(d)/n}$ up to universal constants, and the bound is tight. Our present work seeks to establish dimension-free (i.e., without an explicit dependence on $d$) estimates on $\Delta_n$, including those that hold for $d=\infty$. As such bounds must necessarily depend on $\mu$, we refer to this regime as {\em local} Glivenko-Cantelli (also known as $\mu$-GC); we are aware of very few previous bounds of this type, which are either “abstract” or quite sub-optimal. Already the special case of product measures $\mu$ is rather non-trivial. We give necessary and sufficient conditions on $\mu$ for $\Delta_n\to0$, and calculate sharp rates for this decay. Along the way, we discover a novel sub-gamma-type maximal inequality for shifted Bernoullis, of independent interest.
APA
Cohen, D. & Kontorovich, A. (2023). Local Glivenko-Cantelli. Proceedings of Thirty Sixth Conference on Learning Theory, in Proceedings of Machine Learning Research 195:715-715. Available from https://proceedings.mlr.press/v195/cohen23a.html.