Fast and Provable Algorithms for Sparse PCA with Improved Sample Complexity

Jian-Feng Cai, Zhuozhi Xian, Jiaxi Ying
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:6319-6340, 2025.

Abstract

We explore the single-spiked covariance model within the context of sparse principal component analysis (PCA), which aims to recover a sparse unit vector from noisy samples. From an information-theoretic perspective, $O(k \log p)$ observations are sufficient to recover a $k$-sparse $p$-dimensional vector $\mathbf{v}$. However, existing polynomial-time methods require at least $O(k^2)$ samples for successful recovery, highlighting a significant gap in sample efficiency. To bridge this gap, we introduce a novel thresholding-based algorithm that requires only $\Omega(k \log p)$ samples, provided the signal strength satisfies $\lambda = \Omega(\|\mathbf{v}\|_\infty^{-1})$. We also propose a two-stage nonconvex algorithm that further enhances estimation performance. This approach integrates our thresholding algorithm with truncated power iteration, achieving the minimax optimal rate of statistical error under the desired sample complexity. Numerical experiments validate the superior performance of our algorithms in terms of estimation accuracy and computational efficiency.
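The two-stage recipe described in the abstract (a thresholding step to locate the support, followed by truncated power iteration to refine the estimate) can be sketched as follows. This is an illustrative reconstruction under the spiked covariance model, not the paper's exact algorithm: the simple diagonal-thresholding support estimate and the iteration count are assumptions made for the sketch.

```python
import numpy as np

def truncated_power_iteration(S, k, x0, n_iter=50):
    """Power step on S, then hard-truncate to the k largest-magnitude
    coordinates and renormalize. Returns a k-sparse unit vector."""
    x = x0 / np.linalg.norm(x0)
    for _ in range(n_iter):
        y = S @ x
        # zero out all but the k largest-magnitude entries
        idx = np.argsort(np.abs(y))[:-k]
        y[idx] = 0.0
        x = y / np.linalg.norm(y)
    return x

def sparse_pca_sketch(X, k, n_iter=50):
    """Illustrative two-stage pipeline: (1) a crude thresholding-based
    support estimate from the diagonal of the sample covariance,
    (2) refinement by truncated power iteration. A sketch of the general
    recipe only; the paper's thresholding rule differs in detail."""
    n, p = X.shape
    S = X.T @ X / n                       # sample covariance
    # stage 1: support guess = coordinates with the largest variance
    support = np.argsort(np.diag(S))[-k:]
    x0 = np.zeros(p)
    x0[support] = 1.0 / np.sqrt(k)
    # stage 2: refine on the full sample covariance
    return truncated_power_iteration(S, k, x0, n_iter)
```

On data drawn from the single-spiked model $\Sigma = \mathbf{I} + \lambda \mathbf{v}\mathbf{v}^\top$ with a $k$-sparse unit vector $\mathbf{v}$, the returned estimate is $k$-sparse and, for sufficiently many samples, closely aligned with $\mathbf{v}$ (up to sign).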

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-cai25h,
  title     = {Fast and Provable Algorithms for Sparse {PCA} with Improved Sample Complexity},
  author    = {Cai, Jian-Feng and Xian, Zhuozhi and Ying, Jiaxi},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {6319--6340},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/cai25h/cai25h.pdf},
  url       = {https://proceedings.mlr.press/v267/cai25h.html},
  abstract  = {We explore the single-spiked covariance model within the context of sparse principal component analysis (PCA), which aims to recover a sparse unit vector from noisy samples. From an information-theoretic perspective, $O(k \log p)$ observations are sufficient to recover a $k$-sparse $p$-dimensional vector $\mathbf{v}$. However, existing polynomial-time methods require at least $O(k^2)$ samples for successful recovery, highlighting a significant gap in sample efficiency. To bridge this gap, we introduce a novel thresholding-based algorithm that requires only $\Omega(k \log p)$ samples, provided the signal strength $\lambda = \Omega(||\mathbf{v}||_\infty^{-1})$. We also propose a two-stage nonconvex algorithm that further enhances estimation performance. This approach integrates our thresholding algorithm with truncated power iteration, achieving the minimax optimal rate of statistical error under the desired sample complexity. Numerical experiments validate the superior performance of our algorithms in terms of estimation accuracy and computational efficiency.}
}
Endnote
%0 Conference Paper
%T Fast and Provable Algorithms for Sparse PCA with Improved Sample Complexity
%A Jian-Feng Cai
%A Zhuozhi Xian
%A Jiaxi Ying
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-cai25h
%I PMLR
%P 6319--6340
%U https://proceedings.mlr.press/v267/cai25h.html
%V 267
%X We explore the single-spiked covariance model within the context of sparse principal component analysis (PCA), which aims to recover a sparse unit vector from noisy samples. From an information-theoretic perspective, $O(k \log p)$ observations are sufficient to recover a $k$-sparse $p$-dimensional vector $\mathbf{v}$. However, existing polynomial-time methods require at least $O(k^2)$ samples for successful recovery, highlighting a significant gap in sample efficiency. To bridge this gap, we introduce a novel thresholding-based algorithm that requires only $\Omega(k \log p)$ samples, provided the signal strength $\lambda = \Omega(||\mathbf{v}||_\infty^{-1})$. We also propose a two-stage nonconvex algorithm that further enhances estimation performance. This approach integrates our thresholding algorithm with truncated power iteration, achieving the minimax optimal rate of statistical error under the desired sample complexity. Numerical experiments validate the superior performance of our algorithms in terms of estimation accuracy and computational efficiency.
APA
Cai, J., Xian, Z. & Ying, J. (2025). Fast and Provable Algorithms for Sparse PCA with Improved Sample Complexity. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:6319-6340. Available from https://proceedings.mlr.press/v267/cai25h.html.