Thresholding Based Outlier Robust PCA

Yeshwanth Cherapanamjeri; Prateek Jain; Praneeth Netrapalli

Thresholding Based Outlier Robust PCA

Yeshwanth Cherapanamjeri, Prateek Jain, Praneeth Netrapalli

Proceedings of the 2017 Conference on Learning Theory, PMLR 65:593-628, 2017.

Abstract

We consider the problem of outlier robust PCA (\textbfOR-PCA) where the goal is to recover principal directions despite the presence of outlier data points. That is, given a data matrix $M^*$, where $(1-α)$ fraction of the points are noisy samples from a low-dimensional subspace while $α$ fraction of the points can be arbitrary outliers, the goal is to recover the subspace accurately. Existing results for \textbfOR-PCA have serious drawbacks: while some results are quite weak in the presence of noise, other results have runtime quadratic in dimension, rendering them impractical for large scale applications. In this work, we provide a novel thresholding based iterative algorithm with per-iteration complexity at most linear in the data size. Moreover, the fraction of outliers, $α$, that our method can handle is tight up to constants while providing nearly optimal computational complexity for a general noise setting. For the special case where the inliers are obtained from a low-dimensional subspace with additive Gaussian noise, we show that a modification of our thresholding based method leads to significant improvement in recovery error (of the subspace) even in the presence of a large fraction of outliers.

Cite this Paper

BibTeX


@InProceedings{pmlr-v65-cherapanamjeri17a,
  title = 	 {Thresholding Based Outlier Robust PCA},
  author = 	 {Cherapanamjeri, Yeshwanth and Jain, Prateek and Netrapalli, Praneeth},
  booktitle = 	 {Proceedings of the 2017 Conference on Learning Theory},
  pages = 	 {593--628},
  year = 	 {2017},
  editor = 	 {Kale, Satyen and Shamir, Ohad},
  volume = 	 {65},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {07--10 Jul},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v65/cherapanamjeri17a/cherapanamjeri17a.pdf},
  url = 	 {https://proceedings.mlr.press/v65/cherapanamjeri17a.html},
  abstract = 	 {We consider the problem of outlier robust PCA (\textbfOR-PCA) where the goal is to recover principal directions despite the presence of outlier data points. That is, given a data matrix $M^*$, where $(1-α)$ fraction of the points are noisy samples from a low-dimensional subspace while $α$ fraction of the points can be arbitrary outliers, the goal is to recover the subspace accurately. Existing results for \textbfOR-PCA have serious drawbacks: while some results are quite weak in the presence of noise, other results have runtime quadratic in dimension, rendering them impractical for large scale applications. In this work, we provide a novel thresholding based iterative algorithm with per-iteration complexity at most linear in the data size. Moreover, the fraction of outliers, $α$, that our method can handle is tight up to constants while providing nearly optimal computational complexity for a general noise setting. For the special case where the inliers are obtained from a low-dimensional subspace with additive Gaussian noise, we show that a modification of our thresholding based method leads to significant improvement in recovery error (of the subspace) even in the presence of a large fraction of outliers.}
}

Endnote

%0 Conference Paper
%T Thresholding Based Outlier Robust PCA
%A Yeshwanth Cherapanamjeri
%A Prateek Jain
%A Praneeth Netrapalli
%B Proceedings of the 2017 Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2017
%E Satyen Kale
%E Ohad Shamir	
%F pmlr-v65-cherapanamjeri17a
%I PMLR
%P 593--628
%U https://proceedings.mlr.press/v65/cherapanamjeri17a.html
%V 65
%X We consider the problem of outlier robust PCA (\textbfOR-PCA) where the goal is to recover principal directions despite the presence of outlier data points. That is, given a data matrix $M^*$, where $(1-α)$ fraction of the points are noisy samples from a low-dimensional subspace while $α$ fraction of the points can be arbitrary outliers, the goal is to recover the subspace accurately. Existing results for \textbfOR-PCA have serious drawbacks: while some results are quite weak in the presence of noise, other results have runtime quadratic in dimension, rendering them impractical for large scale applications. In this work, we provide a novel thresholding based iterative algorithm with per-iteration complexity at most linear in the data size. Moreover, the fraction of outliers, $α$, that our method can handle is tight up to constants while providing nearly optimal computational complexity for a general noise setting. For the special case where the inliers are obtained from a low-dimensional subspace with additive Gaussian noise, we show that a modification of our thresholding based method leads to significant improvement in recovery error (of the subspace) even in the presence of a large fraction of outliers.

APA


Cherapanamjeri, Y., Jain, P. & Netrapalli, P.. (2017). Thresholding Based Outlier Robust PCA. Proceedings of the 2017 Conference on Learning Theory, in Proceedings of Machine Learning Research 65:593-628 Available from https://proceedings.mlr.press/v65/cherapanamjeri17a.html.

Related Material

Download PDF