Memory and Computation Efficient PCA via Very Sparse Random Projections

Farhad Pourkamali Anaraki, Shannon Hughes
Proceedings of the 31st International Conference on Machine Learning, PMLR 32(2):1341-1349, 2014.

Abstract

Algorithms that can efficiently recover principal components in very high-dimensional, streaming, and/or distributed data settings have become an important topic in the literature. In this paper, we propose an approach to principal component estimation that utilizes projections onto very sparse random vectors with Bernoulli-generated nonzero entries. Our approach is simultaneously efficient in memory/storage space, efficient in computation, and accurate in its PC estimates, while also allowing for rigorous theoretical performance analysis. Moreover, one can deliberately tune the sparsity of the random vectors to reach a desired point on the tradeoff between memory, computation, and accuracy. We rigorously characterize these tradeoffs and provide statistical performance guarantees. Beyond these very sparse random vectors, our analysis also applies to more general random projections. We present experimental results demonstrating that this approach substantially reduces computational complexity and memory/storage requirements with little loss in accuracy, particularly for very high-dimensional data.
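To make the idea above concrete, here is a minimal Python sketch, not the authors' exact algorithm: for each sample it draws a very sparse sign matrix whose entries are nonzero with a small Bernoulli probability s, keeps only the low-dimensional projection, and then estimates the top principal component from a surrogate covariance built by back-projecting the compressed observations. All dimensions, parameter values, the function name very_sparse_projection, and the eigenvector-based estimator are illustrative assumptions rather than the paper's specification.

import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumed, not from the paper): n samples in d
# dimensions, each observed only through an m-dimensional projection.
n, d, m = 2000, 200, 20
s = 0.1  # Bernoulli probability that a projection entry is nonzero

# Synthetic data with one dominant principal component.
true_pc = rng.standard_normal(d)
true_pc /= np.linalg.norm(true_pc)
X = 3.0 * np.outer(rng.standard_normal(n), true_pc) \
    + 0.5 * rng.standard_normal((n, d))

def very_sparse_projection(rng, d, m, s):
    """Draw a d-by-m matrix whose entries are +1/-1 where a Bernoulli(s)
    draw fires and 0 elsewhere, so only about s*d*m entries are nonzero."""
    mask = rng.random((d, m)) < s
    signs = 2 * rng.integers(0, 2, size=(d, m)) - 1
    return mask * signs

# Each sample x_i is retained only as y_i = R_i^T x_i (m numbers); a
# surrogate covariance is accumulated by back-projecting y_i into R^d.
C = np.zeros((d, d))
for i in range(n):
    R = very_sparse_projection(rng, d, m, s)
    z = R @ (R.T @ X[i])  # back-projected compressed sample
    C += np.outer(z, z)
C /= n

# Illustrative estimator: take the top eigenvector of the surrogate
# covariance as the PC estimate. The alignment printed below should be
# well above the ~1/sqrt(d) level of a random direction; see the paper
# for the actual estimator and its guarantees.
est_pc = np.linalg.eigh(C)[1][:, -1]
print("alignment |<est, true>| =", abs(est_pc @ true_pc))

In this sketch, the memory saving comes from never storing the d-dimensional samples, only their m-dimensional projections (plus, in practice, a seed from which each R_i can be regenerated), and s is the knob the abstract describes for trading accuracy against memory and computation.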

Cite this Paper


BibTeX
@InProceedings{pmlr-v32-anaraki14,
  title     = {Memory and Computation Efficient PCA via Very Sparse Random Projections},
  author    = {Anaraki, Farhad Pourkamali and Hughes, Shannon},
  booktitle = {Proceedings of the 31st International Conference on Machine Learning},
  pages     = {1341--1349},
  year      = {2014},
  editor    = {Xing, Eric P. and Jebara, Tony},
  volume    = {32},
  number    = {2},
  series    = {Proceedings of Machine Learning Research},
  address   = {Beijing, China},
  month     = {22--24 Jun},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v32/anaraki14.pdf},
  url       = {https://proceedings.mlr.press/v32/anaraki14.html}
}
APA
Anaraki, F.P. & Hughes, S. (2014). Memory and Computation Efficient PCA via Very Sparse Random Projections. Proceedings of the 31st International Conference on Machine Learning, in Proceedings of Machine Learning Research 32(2):1341-1349. Available from https://proceedings.mlr.press/v32/anaraki14.html.
