Robust SVD Made Easy: A fast and reliable algorithm for large-scale data analysis

Sangil Han; Sungkyu Jung; Kyoowon Kim

Robust SVD Made Easy: A fast and reliable algorithm for large-scale data analysis

Sangil Han, Sungkyu Jung, Kyoowon Kim

Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, PMLR 238:1765-1773, 2024.

Abstract

The singular value decomposition (SVD) is a crucial tool in machine learning and statistical data analysis. However, it is highly susceptible to outliers in the data matrix. Existing robust SVD algorithms often sacrifice speed for robustness or fail in the presence of only a few outliers. This study introduces an efficient algorithm, called Spherically Normalized SVD, for robust SVD approximation that is highly insensitive to outliers, computationally scalable, and provides accurate approximations of singular vectors. The proposed algorithm achieves remarkable speed by utilizing only two applications of a standard reduced-rank SVD algorithm to appropriately scaled data, significantly outperforming competing algorithms in computation times. To assess the robustness of the approximated singular vectors and their subspaces against data contamination, we introduce new notions of breakdown points for matrix-valued input, including row-wise, column-wise, and block-wise breakdown points. Theoretical and empirical analyses demonstrate that our algorithm exhibits higher breakdown points compared to standard SVD and its modifications. We empirically validate the effectiveness of our approach in applications such as robust low-rank approximation and robust principal component analysis of high-dimensional microarray datasets. Overall, our study presents a highly efficient and robust solution for SVD approximation that overcomes the limitations of existing algorithms in the presence of outliers.

Cite this Paper

BibTeX

@InProceedings{pmlr-v238-han24a,
  title = 	 {Robust {SVD} Made Easy: A fast and reliable algorithm for large-scale data analysis},
  author =       {Han, Sangil and Jung, Sungkyu and Kim, Kyoowon},
  booktitle = 	 {Proceedings of The 27th International Conference on Artificial Intelligence and Statistics},
  pages = 	 {1765--1773},
  year = 	 {2024},
  editor = 	 {Dasgupta, Sanjoy and Mandt, Stephan and Li, Yingzhen},
  volume = 	 {238},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {02--04 May},
  publisher =    {PMLR},
  pdf = 	 {https://proceedings.mlr.press/v238/han24a/han24a.pdf},
  url = 	 {https://proceedings.mlr.press/v238/han24a.html},
  abstract = 	 {The singular value decomposition (SVD) is a crucial tool in machine learning and statistical data analysis. However, it is highly susceptible to outliers in the data matrix. Existing robust SVD algorithms often sacrifice speed for robustness or fail in the presence of only a few outliers. This study introduces an efficient algorithm, called Spherically Normalized SVD, for robust SVD approximation that is highly insensitive to outliers, computationally scalable, and provides accurate approximations of singular vectors. The proposed algorithm achieves remarkable speed by utilizing only two applications of a standard reduced-rank SVD algorithm to appropriately scaled data, significantly outperforming competing algorithms in computation times. To assess the robustness of the approximated singular vectors and their subspaces against data contamination, we introduce new notions of breakdown points for matrix-valued input, including row-wise, column-wise, and block-wise breakdown points. Theoretical and empirical analyses demonstrate that our algorithm exhibits higher breakdown points compared to standard SVD and its modifications. We empirically validate the effectiveness of our approach in applications such as robust low-rank approximation and robust principal component analysis of high-dimensional microarray datasets. Overall, our study presents a highly efficient and robust solution for SVD approximation that overcomes the limitations of existing algorithms in the presence of outliers.}
}

Endnote

%0 Conference Paper
%T Robust SVD Made Easy: A fast and reliable algorithm for large-scale data analysis
%A Sangil Han
%A Sungkyu Jung
%A Kyoowon Kim
%B Proceedings of The 27th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2024
%E Sanjoy Dasgupta
%E Stephan Mandt
%E Yingzhen Li	
%F pmlr-v238-han24a
%I PMLR
%P 1765--1773
%U https://proceedings.mlr.press/v238/han24a.html
%V 238
%X The singular value decomposition (SVD) is a crucial tool in machine learning and statistical data analysis. However, it is highly susceptible to outliers in the data matrix. Existing robust SVD algorithms often sacrifice speed for robustness or fail in the presence of only a few outliers. This study introduces an efficient algorithm, called Spherically Normalized SVD, for robust SVD approximation that is highly insensitive to outliers, computationally scalable, and provides accurate approximations of singular vectors. The proposed algorithm achieves remarkable speed by utilizing only two applications of a standard reduced-rank SVD algorithm to appropriately scaled data, significantly outperforming competing algorithms in computation times. To assess the robustness of the approximated singular vectors and their subspaces against data contamination, we introduce new notions of breakdown points for matrix-valued input, including row-wise, column-wise, and block-wise breakdown points. Theoretical and empirical analyses demonstrate that our algorithm exhibits higher breakdown points compared to standard SVD and its modifications. We empirically validate the effectiveness of our approach in applications such as robust low-rank approximation and robust principal component analysis of high-dimensional microarray datasets. Overall, our study presents a highly efficient and robust solution for SVD approximation that overcomes the limitations of existing algorithms in the presence of outliers.

APA

Han, S., Jung, S. & Kim, K.. (2024). Robust SVD Made Easy: A fast and reliable algorithm for large-scale data analysis. Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 238:1765-1773 Available from https://proceedings.mlr.press/v238/han24a.html.

Related Material

Download PDF