How Hard is Robust Mean Estimation?

Samuel B. Hopkins, Jerry Li
Proceedings of the Thirty-Second Conference on Learning Theory, PMLR 99:1649-1682, 2019.

Abstract

Robust mean estimation is the problem of estimating the mean $\mu \in \mathbb{R}^d$ of a $d$-dimensional distribution $D$ from a list of independent samples, an $\varepsilon$-fraction of which have been arbitrarily corrupted by a malicious adversary. Recent algorithmic progress has resulted in the first polynomial-time algorithms which achieve \emph{dimension-independent} rates of error: for instance, if $D$ has covariance $I$, in polynomial time one may find $\hat{\mu}$ with $\|\mu - \hat{\mu}\| \leq O(\sqrt{\varepsilon})$. However, error rates achieved by current polynomial-time algorithms, while dimension-independent, are sub-optimal in many natural settings, such as when $D$ is sub-Gaussian, or has bounded $4$-th moments. In this work we give worst-case complexity-theoretic evidence that improving on the error rates of current polynomial-time algorithms for robust mean estimation may be computationally intractable in natural settings. We show that several natural approaches to improving error rates of current polynomial-time robust mean estimation algorithms would imply efficient algorithms for the small-set expansion problem, refuting Raghavendra and Steurer's small-set expansion hypothesis (so long as $P \neq NP$). We also give the first direct reduction to the robust mean estimation problem, starting from a plausible but nonstandard variant of the small-set expansion problem.
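To make the $\varepsilon$-contamination model concrete, the following is a minimal illustrative sketch (hypothetical; it is not code from the paper and does not implement the algorithms it discusses). It draws $n$ samples from $N(\mu, I)$, lets an adversary replace an $\varepsilon$-fraction of them with points pushed in a common direction, and compares the naive empirical mean with a simple coordinate-wise median baseline. Under such corruptions the empirical mean can be moved arbitrarily far from $\mu$, while even a simple robust baseline keeps the error bounded, though with a dimension-dependent rate of roughly $O(\varepsilon\sqrt{d})$ rather than the dimension-independent $O(\sqrt{\varepsilon})$ rate achieved by the polynomial-time algorithms the abstract refers to.

# Illustrative sketch of the epsilon-contamination model for robust mean
# estimation. Hypothetical example; not from Hopkins & Li (2019).
import numpy as np

rng = np.random.default_rng(0)
d, n, eps = 100, 10_000, 0.05      # dimension, sample size, corruption fraction
mu = np.zeros(d)                   # true mean of the inlier distribution N(mu, I)

# Draw clean samples from N(mu, I).
X = rng.normal(loc=mu, scale=1.0, size=(n, d))

# Adversary: replace an eps-fraction of the samples with points far from mu,
# all shifted in the same direction so as to bias the empirical mean.
k = int(eps * n)
X[:k] = mu + 50.0 * np.ones(d)

naive_mean = X.mean(axis=0)        # badly biased: error is about eps * 50 * sqrt(d)
coord_median = np.median(X, axis=0)  # simple robust baseline: error roughly O(eps * sqrt(d)) here

print("||naive mean   - mu|| =", np.linalg.norm(naive_mean - mu))
print("||coord median - mu|| =", np.linalg.norm(coord_median - mu))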

Cite this Paper


BibTeX
@InProceedings{pmlr-v99-hopkins19a,
  title     = {How Hard is Robust Mean Estimation?},
  author    = {Hopkins, Samuel B. and Li, Jerry},
  booktitle = {Proceedings of the Thirty-Second Conference on Learning Theory},
  pages     = {1649--1682},
  year      = {2019},
  editor    = {Beygelzimer, Alina and Hsu, Daniel},
  volume    = {99},
  series    = {Proceedings of Machine Learning Research},
  month     = {25--28 Jun},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v99/hopkins19a/hopkins19a.pdf},
  url       = {https://proceedings.mlr.press/v99/hopkins19a.html}
}
Endnote
%0 Conference Paper
%T How Hard is Robust Mean Estimation?
%A Samuel B. Hopkins
%A Jerry Li
%B Proceedings of the Thirty-Second Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2019
%E Alina Beygelzimer
%E Daniel Hsu
%F pmlr-v99-hopkins19a
%I PMLR
%P 1649--1682
%U https://proceedings.mlr.press/v99/hopkins19a.html
%V 99
APA
Hopkins, S. B., & Li, J. (2019). How Hard is Robust Mean Estimation? Proceedings of the Thirty-Second Conference on Learning Theory, in Proceedings of Machine Learning Research 99:1649-1682. Available from https://proceedings.mlr.press/v99/hopkins19a.html.