How Hard is Robust Mean Estimation?
Proceedings of the Thirty-Second Conference on Learning Theory, PMLR 99:1649-1682, 2019.
Abstract
Robust mean estimation is the problem of estimating the mean $\mu \in \mathbb{R}^d$ of a $d$-dimensional distribution $D$ from a list of independent samples, an $\varepsilon$-fraction of which have been arbitrarily corrupted by a malicious adversary. Recent algorithmic progress has resulted in the first polynomial-time algorithms which achieve \emph{dimension-independent} rates of error: for instance, if $D$ has covariance $I$, in polynomial time one may find $\hat{\mu}$ with $\|\mu - \hat{\mu}\| \leq O(\sqrt{\varepsilon})$. However, error rates achieved by current polynomial-time algorithms, while dimension-independent, are suboptimal in many natural settings, such as when $D$ is sub-Gaussian, or has bounded $4$th moments. In this work we give worst-case complexity-theoretic evidence that improving on the error rates of current polynomial-time algorithms for robust mean estimation may be computationally intractable in natural settings. We show that several natural approaches to improving error rates of current polynomial-time robust mean estimation algorithms would imply efficient algorithms for the small-set expansion problem, refuting Raghavendra and Steurer's small-set expansion hypothesis (so long as $P \neq NP$). We also give the first direct reduction to the robust mean estimation problem, starting from a plausible but nonstandard variant of the small-set expansion problem.
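The corruption model in the abstract can be illustrated with a minimal sketch (not the paper's algorithm; the constants, dimensions, and outlier placement below are hypothetical choices for demonstration). It draws samples from $N(\mu, I)$, lets an adversary replace an $\varepsilon$-fraction with a single distant point, and compares the empirical mean, whose error scales with the outlier magnitude, against the coordinate-wise median, a classical robust baseline whose worst-case error is $O(\varepsilon\sqrt{d})$ and thus dimension-dependent, in contrast to the $O(\sqrt{\varepsilon})$ rate of the polynomial-time algorithms the abstract refers to.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, eps = 50, 2000, 0.1  # dimension, sample count, corruption fraction (illustrative values)

mu = np.zeros(d)                      # true mean of D = N(mu, I)
X = rng.normal(mu, 1.0, size=(n, d))  # clean i.i.d. samples

# Adversary arbitrarily replaces an eps-fraction of the samples,
# here with copies of a single far-away point.
k = int(eps * n)
X[:k] = 10.0 * np.ones(d)

# Empirical mean: shifted by roughly eps * ||outlier - mu|| in every coordinate.
naive = X.mean(axis=0)

# Coordinate-wise median: resists the shift per coordinate, but the small
# per-coordinate bias still accumulates to O(eps * sqrt(d)) overall.
medcoord = np.median(X, axis=0)

err_naive = np.linalg.norm(naive - mu)
err_median = np.linalg.norm(medcoord - mu)
print(f"empirical mean error:        {err_naive:.2f}")
print(f"coordinate-wise median error: {err_median:.2f}")
```

Running this, the median's error is far smaller than the empirical mean's, but both degrade with $d$; closing the gap between such achievable rates and the information-theoretic optimum is exactly the question whose hardness the paper studies.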