Beyond Catoni: Sharper Rates for Heavy-Tailed and Robust Mean Estimation
Proceedings of Thirty Seventh Conference on Learning Theory, PMLR 247:2232-2269, 2024.
Abstract
We study the fundamental problem of estimating the mean of a d-dimensional distribution with covariance \Sigma \preceq \sigma^2 I_d given n samples. When d = 1, Catoni (2012) showed an estimator with error (1+o(1)) \cdot \sigma \sqrt{\frac{2 \log \frac{1}{\delta}}{n}}, with probability 1 - \delta, matching the Gaussian error rate. For d > 1, a natural estimator outputs the center of the minimum enclosing ball of one-dimensional confidence intervals to achieve a 1-\delta confidence radius of \sqrt{\frac{2 d}{d+1}} \cdot \sigma \left(\sqrt{\frac{d}{n}} + \sqrt{\frac{2 \log \frac{1}{\delta}}{n}}\right), incurring a \sqrt{\frac{2d}{d+1}}-factor loss over the Gaussian rate. When the \sqrt{\frac{d}{n}} term dominates by a \sqrt{\log \frac{1}{\delta}} factor, Lee and Valiant (2022) showed an improved estimator matching the Gaussian rate. This raises a natural question: Is the \sqrt{\frac{2 d}{d+1}} loss \emph{necessary} when the \sqrt{\frac{2 \log \frac{1}{\delta}}{n}} term dominates? We show that the answer is \emph{no}: we construct an estimator that improves over the above naive estimator by a constant factor. We also consider robust estimation, where an adversary is allowed to corrupt an \epsilon-fraction of samples arbitrarily; in this case, we show that the above strategy of combining one-dimensional estimates and incurring the \sqrt{\frac{2d}{d+1}}-factor loss \emph{is} optimal in the infinite-sample limit.
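To make the one-dimensional baseline concrete, below is a minimal, illustrative Python sketch of a Catoni-style M-estimator for d = 1: it solves \sum_i \psi(\alpha (x_i - \theta)) = 0 by bisection, using the influence function \psi(t) = \log(1 + t + t^2/2) for t \geq 0 (and -\psi(-t) for t < 0) and the simplified scale \alpha = \sqrt{2 \log(1/\delta) / (n \sigma^2)}. The function names and this choice of \alpha are assumptions for illustration only, not the paper's construction; Catoni's analysis uses a slightly refined scale.

```python
import numpy as np

def catoni_psi(t):
    # Catoni's narrowest influence function:
    #   psi(t) =  log(1 + t + t^2/2)  for t >= 0
    #   psi(t) = -log(1 - t + t^2/2)  for t <  0
    return np.where(t >= 0,
                    np.log1p(t + 0.5 * t ** 2),
                    -np.log1p(-t + 0.5 * t ** 2))

def catoni_mean(x, sigma, delta, iters=100):
    # One-dimensional Catoni-style M-estimate: the root in theta of
    # sum_i psi(alpha * (x_i - theta)) = 0, found by bisection
    # (the sum is nonincreasing in theta).
    n = len(x)
    # Simplified scale parameter (an assumption); Catoni (2012) uses a refined choice.
    alpha = np.sqrt(2.0 * np.log(1.0 / delta) / (n * sigma ** 2))
    lo, hi = np.min(x), np.max(x)   # the root lies in [min(x), max(x)]
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if catoni_psi(alpha * (x - mid)).sum() > 0:
            lo = mid                # sum still positive: root is to the right
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Usage: heavy-tailed samples (Student-t, 3 degrees of freedom, variance 3, mean 0).
rng = np.random.default_rng(0)
samples = rng.standard_t(df=3, size=5000)
print(catoni_mean(samples, sigma=np.sqrt(3.0), delta=0.01))
```

With these choices the returned estimate concentrates around the true mean at roughly the (1+o(1)) \cdot \sigma \sqrt{2 \log(1/\delta) / n} rate quoted above, in contrast to the empirical mean, whose deviations under heavy tails are much larger at small \delta.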