Streaming Sparse Principal Component Analysis

Wenzhuo Yang, Huan Xu
Proceedings of the 32nd International Conference on Machine Learning, PMLR 37:494-503, 2015.

Abstract

This paper considers estimating the leading k principal components with at most s non-zero attributes from p-dimensional samples collected sequentially in memory limited environments. We develop and analyze two memory and computational efficient algorithms called streaming sparse PCA and streaming sparse ECA for analyzing data generated according to the spike model and the elliptical model respectively. In particular, the proposed algorithms have memory complexity O(pk), computational complexity O(pk mink,slogp) and sample complexity Θ(s \log p). We provide their finite sample performance guarantees, which implies statistical consistency in the high dimensional regime. Numerical experiments on synthetic and real-world datasets demonstrate good empirical performance of the proposed algorithms.

Cite this Paper


BibTeX
@InProceedings{pmlr-v37-yangd15, title = {Streaming Sparse Principal Component Analysis}, author = {Yang, Wenzhuo and Xu, Huan}, booktitle = {Proceedings of the 32nd International Conference on Machine Learning}, pages = {494--503}, year = {2015}, editor = {Bach, Francis and Blei, David}, volume = {37}, series = {Proceedings of Machine Learning Research}, address = {Lille, France}, month = {07--09 Jul}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/v37/yangd15.pdf}, url = {https://proceedings.mlr.press/v37/yangd15.html}, abstract = {This paper considers estimating the leading k principal components with at most s non-zero attributes from p-dimensional samples collected sequentially in memory limited environments. We develop and analyze two memory and computational efficient algorithms called streaming sparse PCA and streaming sparse ECA for analyzing data generated according to the spike model and the elliptical model respectively. In particular, the proposed algorithms have memory complexity O(pk), computational complexity O(pk mink,slogp) and sample complexity Θ(s \log p). We provide their finite sample performance guarantees, which implies statistical consistency in the high dimensional regime. Numerical experiments on synthetic and real-world datasets demonstrate good empirical performance of the proposed algorithms.} }
Endnote
%0 Conference Paper %T Streaming Sparse Principal Component Analysis %A Wenzhuo Yang %A Huan Xu %B Proceedings of the 32nd International Conference on Machine Learning %C Proceedings of Machine Learning Research %D 2015 %E Francis Bach %E David Blei %F pmlr-v37-yangd15 %I PMLR %P 494--503 %U https://proceedings.mlr.press/v37/yangd15.html %V 37 %X This paper considers estimating the leading k principal components with at most s non-zero attributes from p-dimensional samples collected sequentially in memory limited environments. We develop and analyze two memory and computational efficient algorithms called streaming sparse PCA and streaming sparse ECA for analyzing data generated according to the spike model and the elliptical model respectively. In particular, the proposed algorithms have memory complexity O(pk), computational complexity O(pk mink,slogp) and sample complexity Θ(s \log p). We provide their finite sample performance guarantees, which implies statistical consistency in the high dimensional regime. Numerical experiments on synthetic and real-world datasets demonstrate good empirical performance of the proposed algorithms.
RIS
TY - CPAPER TI - Streaming Sparse Principal Component Analysis AU - Wenzhuo Yang AU - Huan Xu BT - Proceedings of the 32nd International Conference on Machine Learning DA - 2015/06/01 ED - Francis Bach ED - David Blei ID - pmlr-v37-yangd15 PB - PMLR DP - Proceedings of Machine Learning Research VL - 37 SP - 494 EP - 503 L1 - http://proceedings.mlr.press/v37/yangd15.pdf UR - https://proceedings.mlr.press/v37/yangd15.html AB - This paper considers estimating the leading k principal components with at most s non-zero attributes from p-dimensional samples collected sequentially in memory limited environments. We develop and analyze two memory and computational efficient algorithms called streaming sparse PCA and streaming sparse ECA for analyzing data generated according to the spike model and the elliptical model respectively. In particular, the proposed algorithms have memory complexity O(pk), computational complexity O(pk mink,slogp) and sample complexity Θ(s \log p). We provide their finite sample performance guarantees, which implies statistical consistency in the high dimensional regime. Numerical experiments on synthetic and real-world datasets demonstrate good empirical performance of the proposed algorithms. ER -
APA
Yang, W. & Xu, H.. (2015). Streaming Sparse Principal Component Analysis. Proceedings of the 32nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 37:494-503 Available from https://proceedings.mlr.press/v37/yangd15.html.

Related Material