A Subquadratic Time Algorithm for Robust Sparse Mean Estimation

Ankit Pensia
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:40371-40396, 2024.

Abstract

We study the algorithmic problem of sparse mean estimation in the presence of adversarial outliers. Specifically, the algorithm observes a corrupted set of samples from $\mathcal{N}(\mu,\mathbf{I}_d)$, where the unknown mean $\mu \in \mathbb{R}^d$ is constrained to be $k$-sparse. A series of prior works has developed efficient algorithms for robust sparse mean estimation with sample complexity $\mathrm{poly}(k,\log d, 1/\epsilon)$ and runtime $d^2 \mathrm{poly}(k,\log d,1/\epsilon)$, where $\epsilon$ is the fraction of contamination. In particular, the fastest runtime of existing algorithms is quadratic in the dimension, which can be prohibitive in high dimensions. This quadratic barrier in the runtime stems from the reliance of these algorithms on the sample covariance matrix, which is of size $d^2$. Our main contribution is an algorithm for robust sparse mean estimation which runs in subquadratic time using $\mathrm{poly}(k,\log d,1/\epsilon)$ samples. Our results build on algorithmic advances in detecting weak correlations, a generalized version of the light-bulb problem by Valiant (2015).

Cite this Paper


BibTeX
@InProceedings{pmlr-v235-pensia24a, title = {A Subquadratic Time Algorithm for Robust Sparse Mean Estimation}, author = {Pensia, Ankit}, booktitle = {Proceedings of the 41st International Conference on Machine Learning}, pages = {40371--40396}, year = {2024}, editor = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix}, volume = {235}, series = {Proceedings of Machine Learning Research}, month = {21--27 Jul}, publisher = {PMLR}, pdf = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/pensia24a/pensia24a.pdf}, url = {https://proceedings.mlr.press/v235/pensia24a.html}, abstract = {We study the algorithmic problem of sparse mean estimation in the presence of adversarial outliers. Specifically, the algorithm observes a corrupted set of samples from $\mathcal{N}(\mu,\mathbf{I}_d)$, where the unknown mean $\mu \in \mathbb{R}^d$ is constrained to be $k$-sparse. A series of prior works has developed efficient algorithms for robust sparse mean estimation with sample complexity $\mathrm{poly}(k,\log d, 1/\epsilon)$ and runtime $d^2 \mathrm{poly}(k,\log d,1/\epsilon)$, where $\epsilon$ is the fraction of contamination. In particular, the fastest runtime of existing algorithms is quadratic in the dimension, which can be prohibitive in high dimensions. This quadratic barrier in the runtime stems from the reliance of these algorithms on the sample covariance matrix, which is of size $d^2$. Our main contribution is an algorithm for robust sparse mean estimation which runs in subquadratic time using $\mathrm{poly}(k,\log d,1/\epsilon)$ samples. Our results build on algorithmic advances in detecting weak correlations, a generalized version of the light-bulb problem by Valiant (2015).} }
Endnote
%0 Conference Paper %T A Subquadratic Time Algorithm for Robust Sparse Mean Estimation %A Ankit Pensia %B Proceedings of the 41st International Conference on Machine Learning %C Proceedings of Machine Learning Research %D 2024 %E Ruslan Salakhutdinov %E Zico Kolter %E Katherine Heller %E Adrian Weller %E Nuria Oliver %E Jonathan Scarlett %E Felix Berkenkamp %F pmlr-v235-pensia24a %I PMLR %P 40371--40396 %U https://proceedings.mlr.press/v235/pensia24a.html %V 235 %X We study the algorithmic problem of sparse mean estimation in the presence of adversarial outliers. Specifically, the algorithm observes a corrupted set of samples from $\mathcal{N}(\mu,\mathbf{I}_d)$, where the unknown mean $\mu \in \mathbb{R}^d$ is constrained to be $k$-sparse. A series of prior works has developed efficient algorithms for robust sparse mean estimation with sample complexity $\mathrm{poly}(k,\log d, 1/\epsilon)$ and runtime $d^2 \mathrm{poly}(k,\log d,1/\epsilon)$, where $\epsilon$ is the fraction of contamination. In particular, the fastest runtime of existing algorithms is quadratic in the dimension, which can be prohibitive in high dimensions. This quadratic barrier in the runtime stems from the reliance of these algorithms on the sample covariance matrix, which is of size $d^2$. Our main contribution is an algorithm for robust sparse mean estimation which runs in subquadratic time using $\mathrm{poly}(k,\log d,1/\epsilon)$ samples. Our results build on algorithmic advances in detecting weak correlations, a generalized version of the light-bulb problem by Valiant (2015).
APA
Pensia, A.. (2024). A Subquadratic Time Algorithm for Robust Sparse Mean Estimation. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:40371-40396 Available from https://proceedings.mlr.press/v235/pensia24a.html.

Related Material