Streaming Kernel PCA Algorithm With Small Space

Yichuan Deng, Jiangxuan Long, Zhao Song, Zifan Wang, Han Zhang
Conference on Parsimony and Learning, PMLR 280:1216-1254, 2025.

Abstract

Principal Component Analysis (PCA) is a widely used technique in machine learning, data analysis, and signal processing. With the increase in the size and complexity of datasets, it has become essential to develop low-space algorithms for PCA. Streaming PCA has gained significant attention in recent years, as it can handle large datasets efficiently. The kernel method, commonly used in learning algorithms such as Support Vector Machines (SVMs), has also been applied in PCA algorithms. We propose a streaming algorithm for Kernel PCA problems based on the traditional scheme by Oja. Our algorithm addresses the challenge of reducing the memory usage of PCA while maintaining its accuracy. We analyze the performance of our algorithm by studying the conditions under which it succeeds. Specifically, we show that when the spectral ratio $R := \lambda_1/\lambda_2$ of the target covariance matrix is $\Omega(\log n \cdot \log d)$, streaming PCA can be solved with linear space cost. By contrast, the standard PCA algorithm usually requires quadratic space due to matrix-vector multiplication. Our proposed algorithm has several advantages over existing methods. First, it is a streaming algorithm that can handle large datasets efficiently. Second, it employs the kernel method, which allows it to capture complex nonlinear relationships among data points. Third, it has low space usage, making it suitable for limited-memory applications.
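The "traditional scheme by Oja" mentioned above is the classical stochastic update for streaming PCA. As a point of reference, a minimal sketch of that generic scheme (not the paper's kernelized algorithm) is shown below; the function name, learning rate, and toy data are chosen here purely for illustration. Note that it stores only a single $d$-dimensional vector, which is the linear-space property the abstract contrasts with quadratic-space batch PCA.

```python
import numpy as np

def oja_top_component(stream, d, lr=0.01, seed=0):
    """Estimate the top principal component from a stream of samples
    using Oja's rule: w <- normalize(w + lr * (x . w) * x).
    Only one d-dimensional vector is kept, i.e. O(d) space."""
    rng = np.random.default_rng(seed)
    w = rng.normal(size=d)
    w /= np.linalg.norm(w)
    for x in stream:
        w += lr * (x @ w) * x   # rank-one stochastic update
        w /= np.linalg.norm(w)  # project back onto the unit sphere
    return w

# Toy stream whose covariance has a large spectral ratio lambda_1/lambda_2:
# coordinate 0 has standard deviation 5, all others have standard deviation 1.
rng = np.random.default_rng(1)
samples = rng.normal(size=(2000, 10))
samples[:, 0] *= 5.0
w = oja_top_component(samples, d=10)
print(abs(w[0]))  # close to 1: the estimate aligns with the top direction e_1
```

The large eigengap in the toy data mirrors the spectral-ratio condition in the abstract: the bigger $\lambda_1/\lambda_2$ is, the faster and more reliably the single-vector iterate converges.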

Cite this Paper

BibTeX
@InProceedings{pmlr-v280-deng25a,
  title     = {Streaming Kernel PCA Algorithm With Small Space},
  author    = {Deng, Yichuan and Long, Jiangxuan and Song, Zhao and Wang, Zifan and Zhang, Han},
  booktitle = {Conference on Parsimony and Learning},
  pages     = {1216--1254},
  year      = {2025},
  editor    = {Chen, Beidi and Liu, Shijia and Pilanci, Mert and Su, Weijie and Sulam, Jeremias and Wang, Yuxiang and Zhu, Zhihui},
  volume    = {280},
  series    = {Proceedings of Machine Learning Research},
  month     = {24--27 Mar},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v280/main/assets/deng25a/deng25a.pdf},
  url       = {https://proceedings.mlr.press/v280/deng25a.html},
  abstract  = {Principal Component Analysis (PCA) is a widely used technique in machine learning, data analysis, and signal processing. With the increase in the size and complexity of datasets, it has become essential to develop low-space usage algorithms for PCA. Streaming PCA has gained significant attention in recent years, as it can handle large datasets efficiently. The kernel method, commonly used in learning algorithms such as Support Vector Machines (SVMs), has also been applied in PCA algorithms. We propose a streaming algorithm for Kernel PCA problems based on the traditional scheme by Oja. Our algorithm addresses the challenge of reducing the memory usage of PCA while maintaining its accuracy. We analyze the performance of our algorithm by studying the conditions under which it succeeds. Specifically, we show that when the spectral ratio $R:= \lambda_1/\lambda_2$ of the target covariance matrix is $\Omega( \log n\cdot \log d)$, the streaming PCA can be solved with linear space cost. However, the standard PCA algorithm usually requires quadratic space due to matrix vector multiplication. Our proposed algorithm has several advantages over existing methods. First, it is a streaming algorithm that can handle large datasets efficiently. Second, it employs the kernel method, which allows it to capture complex nonlinear relationships among data points. Third, it has a low-space usage, making it suitable for limited memory applications.}
}
Endnote
%0 Conference Paper
%T Streaming Kernel PCA Algorithm With Small Space
%A Yichuan Deng
%A Jiangxuan Long
%A Zhao Song
%A Zifan Wang
%A Han Zhang
%B Conference on Parsimony and Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Beidi Chen
%E Shijia Liu
%E Mert Pilanci
%E Weijie Su
%E Jeremias Sulam
%E Yuxiang Wang
%E Zhihui Zhu
%F pmlr-v280-deng25a
%I PMLR
%P 1216--1254
%U https://proceedings.mlr.press/v280/deng25a.html
%V 280
%X Principal Component Analysis (PCA) is a widely used technique in machine learning, data analysis, and signal processing. With the increase in the size and complexity of datasets, it has become essential to develop low-space usage algorithms for PCA. Streaming PCA has gained significant attention in recent years, as it can handle large datasets efficiently. The kernel method, commonly used in learning algorithms such as Support Vector Machines (SVMs), has also been applied in PCA algorithms. We propose a streaming algorithm for Kernel PCA problems based on the traditional scheme by Oja. Our algorithm addresses the challenge of reducing the memory usage of PCA while maintaining its accuracy. We analyze the performance of our algorithm by studying the conditions under which it succeeds. Specifically, we show that when the spectral ratio $R:= \lambda_1/\lambda_2$ of the target covariance matrix is $\Omega( \log n\cdot \log d)$, the streaming PCA can be solved with linear space cost. However, the standard PCA algorithm usually requires quadratic space due to matrix vector multiplication. Our proposed algorithm has several advantages over existing methods. First, it is a streaming algorithm that can handle large datasets efficiently. Second, it employs the kernel method, which allows it to capture complex nonlinear relationships among data points. Third, it has a low-space usage, making it suitable for limited memory applications.
APA
Deng, Y., Long, J., Song, Z., Wang, Z., & Zhang, H. (2025). Streaming Kernel PCA Algorithm With Small Space. Conference on Parsimony and Learning, in Proceedings of Machine Learning Research 280:1216-1254. Available from https://proceedings.mlr.press/v280/deng25a.html.
