Robust Sparse Estimation for Gaussians with Optimal Error under Huber Contamination

Ilias Diakonikolas, Daniel Kane, Sushrut Karmalkar, Ankit Pensia, Thanasis Pittas
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:10811-10840, 2024.

Abstract

We study Gaussian sparse estimation tasks in Huber’s contamination model, with a focus on mean estimation, PCA, and linear regression. For each of these tasks, we give the first sample- and computationally efficient robust estimators with optimal error guarantees, within constant factors. All prior efficient algorithms for these tasks incur quantitatively suboptimal error. Concretely, for Gaussian robust $k$-sparse mean estimation on $\mathbb{R}^d$ with corruption rate $\epsilon>0$, our algorithm has sample complexity $(k^2/\epsilon^2)\,\mathrm{polylog}(d/\epsilon)$, runs in sample-polynomial time, and approximates the target mean within $\ell_2$-error $O(\epsilon)$. Previous efficient algorithms inherently incur error $\Omega(\epsilon \sqrt{\log(1/\epsilon)})$. At the technical level, we develop a novel multidimensional filtering method in the sparse regime that may find other applications.
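The abstract states the problem setup and guarantee but not the algorithm itself. As a purely illustrative sketch (not the paper's filtering method), the Python snippet below simulates Huber contamination for $k$-sparse Gaussian mean estimation and applies a naive baseline (coordinate-wise median followed by hard thresholding to the top-$k$ coordinates); all parameter values and variable names are hypothetical choices for the demo.

# Illustrative sketch only: this is NOT the paper's algorithm, just a demo of
# the Huber contamination setup and a naive robust baseline for comparison.
import numpy as np

rng = np.random.default_rng(0)

# Assumed demo parameters: ambient dimension, sparsity, corruption rate, samples.
d, k, eps, n = 1000, 10, 0.05, 20000

# k-sparse target mean.
mu = np.zeros(d)
mu[:k] = 1.0

# Huber contamination: each sample is drawn from N(mu, I) with probability 1 - eps,
# and from an arbitrary distribution (here, a crude shifted Gaussian) with probability eps.
is_outlier = rng.random(n) < eps
X = rng.standard_normal((n, d)) + mu
X[is_outlier] = rng.standard_normal((is_outlier.sum(), d)) + 5.0

# Naive robust baseline: coordinate-wise median, then keep the k largest coordinates.
med = np.median(X, axis=0)
top_k = np.argsort(np.abs(med))[-k:]
est = np.zeros(d)
est[top_k] = med[top_k]

print("l2 error of naive baseline:", np.linalg.norm(est - mu))

Such a baseline does not achieve the paper's $O(\epsilon)$ error guarantee; it only makes the contamination model and the sparse-mean objective concrete.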

Cite this Paper


BibTeX
@InProceedings{pmlr-v235-diakonikolas24a,
  title     = {Robust Sparse Estimation for {G}aussians with Optimal Error under Huber Contamination},
  author    = {Diakonikolas, Ilias and Kane, Daniel and Karmalkar, Sushrut and Pensia, Ankit and Pittas, Thanasis},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages     = {10811--10840},
  year      = {2024},
  editor    = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume    = {235},
  series    = {Proceedings of Machine Learning Research},
  month     = {21--27 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/diakonikolas24a/diakonikolas24a.pdf},
  url       = {https://proceedings.mlr.press/v235/diakonikolas24a.html},
  abstract  = {We study Gaussian sparse estimation tasks in Huber’s contamination model with a focus on mean estimation, PCA, and linear regression. For each of these tasks, we give the first sample and computationally efficient robust estimators with optimal error guarantees, within constant factors. All prior efficient algorithms for these tasks incur quantitatively suboptimal error. Concretely, for Gaussian robust $k$-sparse mean estimation on $\mathbb{R}^d$ with corruption rate $\epsilon>0$, our algorithm has sample complexity $(k^2/\epsilon ^2)\mathrm{polylog}(d/\epsilon)$, runs in sample polynomial time, and approximates the target mean within $\ell_2$-error $O(\epsilon)$. Previous efficient algorithms inherently incur error $\Omega(\epsilon \sqrt{\log(1/\epsilon)})$. At the technical level, we develop a novel multidimensional filtering method in the sparse regime that may find other applications.}
}
Endnote
%0 Conference Paper
%T Robust Sparse Estimation for Gaussians with Optimal Error under Huber Contamination
%A Ilias Diakonikolas
%A Daniel Kane
%A Sushrut Karmalkar
%A Ankit Pensia
%A Thanasis Pittas
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-diakonikolas24a
%I PMLR
%P 10811--10840
%U https://proceedings.mlr.press/v235/diakonikolas24a.html
%V 235
%X We study Gaussian sparse estimation tasks in Huber’s contamination model with a focus on mean estimation, PCA, and linear regression. For each of these tasks, we give the first sample and computationally efficient robust estimators with optimal error guarantees, within constant factors. All prior efficient algorithms for these tasks incur quantitatively suboptimal error. Concretely, for Gaussian robust $k$-sparse mean estimation on $\mathbb{R}^d$ with corruption rate $\epsilon>0$, our algorithm has sample complexity $(k^2/\epsilon ^2)\mathrm{polylog}(d/\epsilon)$, runs in sample polynomial time, and approximates the target mean within $\ell_2$-error $O(\epsilon)$. Previous efficient algorithms inherently incur error $\Omega(\epsilon \sqrt{\log(1/\epsilon)})$. At the technical level, we develop a novel multidimensional filtering method in the sparse regime that may find other applications.
APA
Diakonikolas, I., Kane, D., Karmalkar, S., Pensia, A. & Pittas, T. (2024). Robust Sparse Estimation for Gaussians with Optimal Error under Huber Contamination. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:10811-10840. Available from https://proceedings.mlr.press/v235/diakonikolas24a.html.