Residual-Based Sampling for Online Outlier-Robust PCA

Tianhao Zhu, Jie Shen
Proceedings of the 39th International Conference on Machine Learning, PMLR 162:27591-27611, 2022.

Abstract

Outlier-robust principal component analysis (ORPCA) has been broadly applied in scientific discovery in the last decades. In this paper, we study online ORPCA, an important variant that addresses the practical challenge that the data points arrive in a sequential manner and the goal is to recover the underlying subspace of the clean data with one pass of the data. Our main contribution is the first provable algorithm that enjoys comparable recovery guarantee to the best known batch algorithm, while significantly improving upon the state-of-the-art online ORPCA algorithms. The core technique is a robust version of the residual norm which, informally speaking, leverages not only the importance of a data point, but also how likely it behaves as an outlier.

Cite this Paper


BibTeX
@InProceedings{pmlr-v162-zhu22i, title = {Residual-Based Sampling for Online Outlier-Robust {PCA}}, author = {Zhu, Tianhao and Shen, Jie}, booktitle = {Proceedings of the 39th International Conference on Machine Learning}, pages = {27591--27611}, year = {2022}, editor = {Chaudhuri, Kamalika and Jegelka, Stefanie and Song, Le and Szepesvari, Csaba and Niu, Gang and Sabato, Sivan}, volume = {162}, series = {Proceedings of Machine Learning Research}, month = {17--23 Jul}, publisher = {PMLR}, pdf = {https://proceedings.mlr.press/v162/zhu22i/zhu22i.pdf}, url = {https://proceedings.mlr.press/v162/zhu22i.html}, abstract = {Outlier-robust principal component analysis (ORPCA) has been broadly applied in scientific discovery in the last decades. In this paper, we study online ORPCA, an important variant that addresses the practical challenge that the data points arrive in a sequential manner and the goal is to recover the underlying subspace of the clean data with one pass of the data. Our main contribution is the first provable algorithm that enjoys comparable recovery guarantee to the best known batch algorithm, while significantly improving upon the state-of-the-art online ORPCA algorithms. The core technique is a robust version of the residual norm which, informally speaking, leverages not only the importance of a data point, but also how likely it behaves as an outlier.} }
Endnote
%0 Conference Paper %T Residual-Based Sampling for Online Outlier-Robust PCA %A Tianhao Zhu %A Jie Shen %B Proceedings of the 39th International Conference on Machine Learning %C Proceedings of Machine Learning Research %D 2022 %E Kamalika Chaudhuri %E Stefanie Jegelka %E Le Song %E Csaba Szepesvari %E Gang Niu %E Sivan Sabato %F pmlr-v162-zhu22i %I PMLR %P 27591--27611 %U https://proceedings.mlr.press/v162/zhu22i.html %V 162 %X Outlier-robust principal component analysis (ORPCA) has been broadly applied in scientific discovery in the last decades. In this paper, we study online ORPCA, an important variant that addresses the practical challenge that the data points arrive in a sequential manner and the goal is to recover the underlying subspace of the clean data with one pass of the data. Our main contribution is the first provable algorithm that enjoys comparable recovery guarantee to the best known batch algorithm, while significantly improving upon the state-of-the-art online ORPCA algorithms. The core technique is a robust version of the residual norm which, informally speaking, leverages not only the importance of a data point, but also how likely it behaves as an outlier.
APA
Zhu, T. & Shen, J.. (2022). Residual-Based Sampling for Online Outlier-Robust PCA. Proceedings of the 39th International Conference on Machine Learning, in Proceedings of Machine Learning Research 162:27591-27611 Available from https://proceedings.mlr.press/v162/zhu22i.html.

Related Material