Local Glivenko-Cantelli

Doron Cohen; Aryeh Kontorovich

Local Glivenko-Cantelli

Doron Cohen, Aryeh Kontorovich

Proceedings of Thirty Sixth Conference on Learning Theory, PMLR 195:715-715, 2023.

Abstract

$\mu$ is a distribution over the

$d$ -dimensional Boolean cube

$\set{0,1}^d$ , our goal is to estimate its mean

$p\in[0,1]^d$ based on

$n$ iid draws from

$\mu$ . Specifically, we consider the empirical mean estimator

$\pn$ and study the expected maximal deviation

$\Delta_n=\E\max_{j\in[d]}|\pn(j)-p(j)|$ . In the classical Universal Glivenko-Cantelli setting, one seeks distribution-free (i.e., independent of

$\mu$ ) bounds on

$\Delta_n$ . This regime is well-understood: for all

$\mu$ , we have

$\Delta_n\lesssim\sqrt{\log(d)/n}$ up to universal constants, and the bound is tight.Our present work seeks to establish dimension-free (i.e., without an explicit dependence on

$d$ ) estimates on

$\Delta_n$ , including those that hold for

$d=\infty$ . As such bounds must necessarily depend on

$\mu$ , we refer to this regime as {\em local} Glivenko-Cantelli (also known as

$\mu$ -GC), and are aware of very few previous bounds of this type — which are either “abstract” or quite sub-optimal. Already the special case of product measures

$\mu$ is rather non-trivial. We give necessary and sufficient conditions on

$\mu$ for

$\Delta_n\to0$ , and calculate sharp rates for this decay. Along the way, we discover a novel sub-gamma-type maximal inequality for shifted Bernoullis, of independent interest.

Cite this Paper

BibTeX


@InProceedings{pmlr-v195-cohen23a,
  title = 	 {Local Glivenko-Cantelli},
  author =       {Cohen, Doron and Kontorovich, Aryeh},
  booktitle = 	 {Proceedings of Thirty Sixth Conference on Learning Theory},
  pages = 	 {715--715},
  year = 	 {2023},
  editor = 	 {Neu, Gergely and Rosasco, Lorenzo},
  volume = 	 {195},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {12--15 Jul},
  publisher =    {PMLR},
  pdf = 	 {https://proceedings.mlr.press/v195/cohen23a/cohen23a.pdf},
  url = 	 {https://proceedings.mlr.press/v195/cohen23a.html},
  abstract = 	 {If $\mu$ is a distribution over the $d$-dimensional Boolean cube $\set{0,1}^d$, our goal is to estimate its mean $p\in[0,1]^d$ based on $n$ iid draws from $\mu$. Specifically, we consider the empirical mean estimator $\pn$ and study the expected maximal deviation $\Delta_n=\E\max_{j\in[d]}|\pn(j)-p(j)|$. In the classical Universal Glivenko-Cantelli setting, one seeks distribution-free (i.e., independent of $\mu$) bounds on $\Delta_n$. This regime is well-understood: for all $\mu$, we have $\Delta_n\lesssim\sqrt{\log(d)/n}$ up to universal constants, and the bound is tight.Our present work seeks to establish dimension-free (i.e., without an explicit dependence on $d$) estimates on $\Delta_n$, including those that hold for $d=\infty$. As such bounds must necessarily depend on $\mu$, we refer to this regime as {\em local} Glivenko-Cantelli (also known as $\mu$-GC), and are aware of very few previous bounds of this type — which are either “abstract” or quite sub-optimal. Already the special case of product measures $\mu$ is rather non-trivial. We give necessary and sufficient conditions on $\mu$ for $\Delta_n\to0$, and calculate sharp rates for this decay. Along the way, we discover a novel sub-gamma-type maximal inequality for shifted Bernoullis, of independent interest.}
}

Endnote

%0 Conference Paper
%T Local Glivenko-Cantelli
%A Doron Cohen
%A Aryeh Kontorovich
%B Proceedings of Thirty Sixth Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2023
%E Gergely Neu
%E Lorenzo Rosasco	
%F pmlr-v195-cohen23a
%I PMLR
%P 715--715
%U https://proceedings.mlr.press/v195/cohen23a.html
%V 195
%X If $\mu$ is a distribution over the $d$-dimensional Boolean cube $\set{0,1}^d$, our goal is to estimate its mean $p\in[0,1]^d$ based on $n$ iid draws from $\mu$. Specifically, we consider the empirical mean estimator $\pn$ and study the expected maximal deviation $\Delta_n=\E\max_{j\in[d]}|\pn(j)-p(j)|$. In the classical Universal Glivenko-Cantelli setting, one seeks distribution-free (i.e., independent of $\mu$) bounds on $\Delta_n$. This regime is well-understood: for all $\mu$, we have $\Delta_n\lesssim\sqrt{\log(d)/n}$ up to universal constants, and the bound is tight.Our present work seeks to establish dimension-free (i.e., without an explicit dependence on $d$) estimates on $\Delta_n$, including those that hold for $d=\infty$. As such bounds must necessarily depend on $\mu$, we refer to this regime as {\em local} Glivenko-Cantelli (also known as $\mu$-GC), and are aware of very few previous bounds of this type — which are either “abstract” or quite sub-optimal. Already the special case of product measures $\mu$ is rather non-trivial. We give necessary and sufficient conditions on $\mu$ for $\Delta_n\to0$, and calculate sharp rates for this decay. Along the way, we discover a novel sub-gamma-type maximal inequality for shifted Bernoullis, of independent interest.

APA


Cohen, D. & Kontorovich, A.. (2023). Local Glivenko-Cantelli. Proceedings of Thirty Sixth Conference on Learning Theory, in Proceedings of Machine Learning Research 195:715-715 Available from https://proceedings.mlr.press/v195/cohen23a.html.

Related Material

Download PDF