Nearly-Linear Time and Streaming Algorithms for Outlier-Robust PCA

Ilias Diakonikolas; Daniel Kane; Ankit Pensia; Thanasis Pittas

Proceedings of the 40th International Conference on Machine Learning, PMLR 202:7886-7921, 2023.

Abstract

We study principal component analysis (PCA), where given a dataset in $\mathbb R^d$ from a distribution, the task is to find a unit vector $v$ that approximately maximizes the variance of the distribution after being projected along $v$. Despite being a classical task, standard estimators fail drastically if the data contains even a small fraction of outliers, motivating the problem of robust PCA. Recent work has developed computationally-efficient algorithms for robust PCA that either take super-linear time or have sub-optimal error guarantees. Our main contribution is to develop a nearly linear time algorithm for robust PCA with near-optimal error guarantees. We also develop a single-pass streaming algorithm for robust PCA with memory usage nearly-linear in the dimension.
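The failure mode described in the abstract can be seen on a toy example. The sketch below is not the paper's algorithm: it only runs the standard (non-robust) estimator, the top eigenvector of the empirical covariance, on a small hand-made 2D dataset with and without a few planted outliers. The helper name `top_direction`, the dataset, and the use of power iteration are illustrative assumptions, not taken from the paper.

```python
import math


def top_direction(points, iters=200):
    """Top eigenvector of the empirical covariance, found by power iteration.

    This is the classical, non-robust PCA estimator, shown for illustration only.
    """
    n, d = len(points), len(points[0])
    mean = [sum(p[j] for p in points) / n for j in range(d)]
    centered = [[p[j] - mean[j] for j in range(d)] for p in points]
    # Empirical covariance matrix (d x d).
    cov = [[sum(c[i] * c[j] for c in centered) / n for j in range(d)]
           for i in range(d)]
    # Fixed starting vector so the example is deterministic.
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


# Hypothetical inliers: spread along the x-axis, almost no spread along y.
inliers = [(float(x), y) for x in (-2, -1, 1, 2) for y in (-0.1, 0.1)] * 5

# Two planted outliers far along the y-axis (2 of 42 points, roughly 5%).
outliers = [(0.0, 50.0), (0.0, -50.0)]

v_clean = top_direction(inliers)
v_dirty = top_direction(inliers + outliers)

print("direction without outliers:", v_clean)
print("direction with outliers:   ", v_dirty)
```

On the clean data the estimated direction aligns with the x-axis, while the two outliers are enough to rotate it onto the y-axis, which is the kind of breakdown that motivates robust PCA.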

Cite this Paper

BibTeX

@InProceedings{pmlr-v202-diakonikolas23a,
  title     = {Nearly-Linear Time and Streaming Algorithms for Outlier-Robust {PCA}},
  author    = {Diakonikolas, Ilias and Kane, Daniel and Pensia, Ankit and Pittas, Thanasis},
  booktitle = {Proceedings of the 40th International Conference on Machine Learning},
  pages     = {7886--7921},
  year      = {2023},
  editor    = {Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan},
  volume    = {202},
  series    = {Proceedings of Machine Learning Research},
  month     = {23--29 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v202/diakonikolas23a/diakonikolas23a.pdf},
  url       = {https://proceedings.mlr.press/v202/diakonikolas23a.html},
  abstract  = {We study principal component analysis (PCA), where given a dataset in $\mathbb R^d$ from a distribution, the task is to find a unit vector $v$ that approximately maximizes the variance of the distribution after being projected along $v$. Despite being a classical task, standard estimators fail drastically if the data contains even a small fraction of outliers, motivating the problem of robust PCA. Recent work has developed computationally-efficient algorithms for robust PCA that either take super-linear time or have sub-optimal error guarantees. Our main contribution is to develop a nearly linear time algorithm for robust PCA with near-optimal error guarantees. We also develop a single-pass streaming algorithm for robust PCA with memory usage nearly-linear in the dimension.}
}

EndNote

%0 Conference Paper
%T Nearly-Linear Time and Streaming Algorithms for Outlier-Robust PCA
%A Ilias Diakonikolas
%A Daniel Kane
%A Ankit Pensia
%A Thanasis Pittas
%B Proceedings of the 40th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Andreas Krause
%E Emma Brunskill
%E Kyunghyun Cho
%E Barbara Engelhardt
%E Sivan Sabato
%E Jonathan Scarlett
%F pmlr-v202-diakonikolas23a
%I PMLR
%P 7886--7921
%U https://proceedings.mlr.press/v202/diakonikolas23a.html
%V 202
%X We study principal component analysis (PCA), where given a dataset in $\mathbb R^d$ from a distribution, the task is to find a unit vector $v$ that approximately maximizes the variance of the distribution after being projected along $v$. Despite being a classical task, standard estimators fail drastically if the data contains even a small fraction of outliers, motivating the problem of robust PCA. Recent work has developed computationally-efficient algorithms for robust PCA that either take super-linear time or have sub-optimal error guarantees. Our main contribution is to develop a nearly linear time algorithm for robust PCA with near-optimal error guarantees. We also develop a single-pass streaming algorithm for robust PCA with memory usage nearly-linear in the dimension.

APA

Diakonikolas, I., Kane, D., Pensia, A. & Pittas, T. (2023). Nearly-Linear Time and Streaming Algorithms for Outlier-Robust PCA. Proceedings of the 40th International Conference on Machine Learning, in Proceedings of Machine Learning Research 202:7886-7921. Available from https://proceedings.mlr.press/v202/diakonikolas23a.html.
