How Hard is Robust Mean Estimation?

Samuel B. Hopkins; Jerry Li

How Hard is Robust Mean Estimation?

Samuel B. Hopkins, Jerry Li

Proceedings of the Thirty-Second Conference on Learning Theory, PMLR 99:1649-1682, 2019.

Abstract

Robust mean estimation is the problem of estimating the mean $\mu \in \mathbb{R}^d$ of a $d$-dimensional distribution $D$ from a list of independent samples, an $\varepsilon$-fraction of which have been arbitrarily corrupted by a malicious adversary. Recent algorithmic progress has resulted in the first polynomial-time algorithms which achieve \emph{dimension-independent} rates of error: for instance, if $D$ has covariance $I$, in polynomial-time one may find $\hat{\mu}$ with $\|\mu - \hat{\mu}\| \leq O(\sqrt{\varepsilon})$. However, error rates achieved by current polynomial-time algorithms, while dimension-independent, are sub-optimal in many natural settings, such as when $D$ is sub-Gaussian, or has bounded $4$-th moments. In this work we give worst-case complexity-theoretic evidence that improving on the error rates of current polynomial-time algorithms for robust mean estimation may be computationally intractable in natural settings. We show that several natural approaches to improving error rates of current polynomial-time robust mean estimation algorithms would imply efficient algorithms for the small-set expansion problem, refuting Raghavendra and Steurer’s small-set expansion hypothesis (so long as $P \neq NP$). We also give the first direct reduction to the robust mean estimation problem, starting from a plausible but nonstandard variant of the small-set expansion problem.

Cite this Paper

BibTeX


@InProceedings{pmlr-v99-hopkins19a,
  title = 	 {How Hard is Robust Mean Estimation?},
  author =       {Hopkins, Samuel B. and Li, Jerry},
  booktitle = 	 {Proceedings of the Thirty-Second Conference on Learning Theory},
  pages = 	 {1649--1682},
  year = 	 {2019},
  editor = 	 {Beygelzimer, Alina and Hsu, Daniel},
  volume = 	 {99},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {25--28 Jun},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v99/hopkins19a/hopkins19a.pdf},
  url = 	 {https://proceedings.mlr.press/v99/hopkins19a.html},
  abstract = 	 {Robust mean estimation is the problem of estimating the mean $\mu \in \mathbb{R}^d$ of a $d$-dimensional distribution $D$ from a list of independent samples, an $\varepsilon$-fraction of which have been arbitrarily corrupted by a malicious adversary. Recent algorithmic progress has resulted in the first polynomial-time algorithms which achieve \emph{dimension-independent} rates of error: for instance, if $D$ has covariance $I$, in polynomial-time one may find $\hat{\mu}$ with $\|\mu - \hat{\mu}\| \leq O(\sqrt{\varepsilon})$. However, error rates achieved by current polynomial-time algorithms, while dimension-independent, are sub-optimal in many natural settings, such as when $D$ is sub-Gaussian, or has bounded $4$-th moments. In this work we give worst-case complexity-theoretic evidence that improving on the error rates of current polynomial-time algorithms for robust mean estimation may be computationally intractable in natural settings. We show that several natural approaches to improving error rates of current polynomial-time robust mean estimation algorithms would imply efficient algorithms for the small-set expansion problem, refuting Raghavendra and Steurer’s small-set expansion hypothesis (so long as $P \neq NP$). We also give the first direct reduction to the robust mean estimation problem, starting from a plausible but nonstandard variant of the small-set expansion problem.}
}

Endnote

%0 Conference Paper
%T How Hard is Robust Mean Estimation?
%A Samuel B. Hopkins
%A Jerry Li
%B Proceedings of the Thirty-Second Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2019
%E Alina Beygelzimer
%E Daniel Hsu	
%F pmlr-v99-hopkins19a
%I PMLR
%P 1649--1682
%U https://proceedings.mlr.press/v99/hopkins19a.html
%V 99
%X Robust mean estimation is the problem of estimating the mean $\mu \in \mathbb{R}^d$ of a $d$-dimensional distribution $D$ from a list of independent samples, an $\varepsilon$-fraction of which have been arbitrarily corrupted by a malicious adversary. Recent algorithmic progress has resulted in the first polynomial-time algorithms which achieve \emph{dimension-independent} rates of error: for instance, if $D$ has covariance $I$, in polynomial-time one may find $\hat{\mu}$ with $\|\mu - \hat{\mu}\| \leq O(\sqrt{\varepsilon})$. However, error rates achieved by current polynomial-time algorithms, while dimension-independent, are sub-optimal in many natural settings, such as when $D$ is sub-Gaussian, or has bounded $4$-th moments. In this work we give worst-case complexity-theoretic evidence that improving on the error rates of current polynomial-time algorithms for robust mean estimation may be computationally intractable in natural settings. We show that several natural approaches to improving error rates of current polynomial-time robust mean estimation algorithms would imply efficient algorithms for the small-set expansion problem, refuting Raghavendra and Steurer’s small-set expansion hypothesis (so long as $P \neq NP$). We also give the first direct reduction to the robust mean estimation problem, starting from a plausible but nonstandard variant of the small-set expansion problem.

APA


Hopkins, S.B. & Li, J.. (2019). How Hard is Robust Mean Estimation?. Proceedings of the Thirty-Second Conference on Learning Theory, in Proceedings of Machine Learning Research 99:1649-1682 Available from https://proceedings.mlr.press/v99/hopkins19a.html.

Related Material

Download PDF