Streaming Sparse Principal Component Analysis

Wenzhuo Yang; Huan Xu

Streaming Sparse Principal Component Analysis

Wenzhuo Yang, Huan Xu

Proceedings of the 32nd International Conference on Machine Learning, PMLR 37:494-503, 2015.

Abstract

This paper considers estimating the leading k principal components with at most s non-zero attributes from p-dimensional samples collected sequentially in memory limited environments. We develop and analyze two memory and computational efficient algorithms called streaming sparse PCA and streaming sparse ECA for analyzing data generated according to the spike model and the elliptical model respectively. In particular, the proposed algorithms have memory complexity O(pk), computational complexity O(pk mink,slogp) and sample complexity Θ(s \log p). We provide their finite sample performance guarantees, which implies statistical consistency in the high dimensional regime. Numerical experiments on synthetic and real-world datasets demonstrate good empirical performance of the proposed algorithms.

Cite this Paper

BibTeX


@InProceedings{pmlr-v37-yangd15,
  title = 	 {Streaming Sparse Principal Component Analysis},
  author = 	 {Yang, Wenzhuo and Xu, Huan},
  booktitle = 	 {Proceedings of the 32nd International Conference on Machine Learning},
  pages = 	 {494--503},
  year = 	 {2015},
  editor = 	 {Bach, Francis and Blei, David},
  volume = 	 {37},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {Lille, France},
  month = 	 {07--09 Jul},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v37/yangd15.pdf},
  url = 	 {https://proceedings.mlr.press/v37/yangd15.html},
  abstract = 	 {This paper considers estimating the leading k principal components with at most s non-zero attributes from p-dimensional samples collected sequentially in memory limited environments. We develop and analyze two memory and computational efficient algorithms called streaming sparse PCA and streaming sparse ECA for analyzing data generated according to the spike model and the elliptical model respectively. In particular, the proposed algorithms have memory complexity O(pk), computational complexity O(pk mink,slogp) and sample complexity Θ(s \log p). We provide their finite sample performance guarantees, which implies statistical consistency in the high dimensional regime. Numerical experiments on synthetic and real-world datasets demonstrate good empirical performance of the proposed algorithms.}
}

Endnote

%0 Conference Paper
%T Streaming Sparse Principal Component Analysis
%A Wenzhuo Yang
%A Huan Xu
%B Proceedings of the 32nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2015
%E Francis Bach
%E David Blei	
%F pmlr-v37-yangd15
%I PMLR
%P 494--503
%U https://proceedings.mlr.press/v37/yangd15.html
%V 37
%X This paper considers estimating the leading k principal components with at most s non-zero attributes from p-dimensional samples collected sequentially in memory limited environments. We develop and analyze two memory and computational efficient algorithms called streaming sparse PCA and streaming sparse ECA for analyzing data generated according to the spike model and the elliptical model respectively. In particular, the proposed algorithms have memory complexity O(pk), computational complexity O(pk mink,slogp) and sample complexity Θ(s \log p). We provide their finite sample performance guarantees, which implies statistical consistency in the high dimensional regime. Numerical experiments on synthetic and real-world datasets demonstrate good empirical performance of the proposed algorithms.

RIS


TY  - CPAPER
TI  - Streaming Sparse Principal Component Analysis
AU  - Wenzhuo Yang
AU  - Huan Xu
BT  - Proceedings of the 32nd International Conference on Machine Learning
DA  - 2015/06/01
ED  - Francis Bach
ED  - David Blei	
ID  - pmlr-v37-yangd15
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 37
SP  - 494
EP  - 503
L1  - http://proceedings.mlr.press/v37/yangd15.pdf
UR  - https://proceedings.mlr.press/v37/yangd15.html
AB  - This paper considers estimating the leading k principal components with at most s non-zero attributes from p-dimensional samples collected sequentially in memory limited environments. We develop and analyze two memory and computational efficient algorithms called streaming sparse PCA and streaming sparse ECA for analyzing data generated according to the spike model and the elliptical model respectively. In particular, the proposed algorithms have memory complexity O(pk), computational complexity O(pk mink,slogp) and sample complexity Θ(s \log p). We provide their finite sample performance guarantees, which implies statistical consistency in the high dimensional regime. Numerical experiments on synthetic and real-world datasets demonstrate good empirical performance of the proposed algorithms.
ER  -

APA


Yang, W. & Xu, H.. (2015). Streaming Sparse Principal Component Analysis. Proceedings of the 32nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 37:494-503 Available from https://proceedings.mlr.press/v37/yangd15.html.

Streaming Sparse Principal Component Analysis

Abstract

Cite this Paper

Related Material