[edit]
Local Glivenko-Cantelli
Proceedings of Thirty Sixth Conference on Learning Theory, PMLR 195:715-715, 2023.
Abstract
If μ is a distribution over the d-dimensional Boolean cube \set0,1d, our goal is to estimate its mean p∈[0,1]d based on n iid draws from μ. Specifically, we consider the empirical mean estimator \pn and study the expected maximal deviation Δn=\Emax. In the classical Universal Glivenko-Cantelli setting, one seeks distribution-free (i.e., independent of \mu) bounds on \Delta_n. This regime is well-understood: for all \mu, we have \Delta_n\lesssim\sqrt{\log(d)/n} up to universal constants, and the bound is tight.Our present work seeks to establish dimension-free (i.e., without an explicit dependence on d) estimates on \Delta_n, including those that hold for d=\infty. As such bounds must necessarily depend on \mu, we refer to this regime as {\em local} Glivenko-Cantelli (also known as \mu-GC), and are aware of very few previous bounds of this type — which are either “abstract” or quite sub-optimal. Already the special case of product measures \mu is rather non-trivial. We give necessary and sufficient conditions on \mu for \Delta_n\to0, and calculate sharp rates for this decay. Along the way, we discover a novel sub-gamma-type maximal inequality for shifted Bernoullis, of independent interest.