Toward Efficient and Accurate Covariance Matrix Estimation on Compressed Data

Xixian Chen, Michael R. Lyu, Irwin King
Proceedings of the 34th International Conference on Machine Learning, PMLR 70:767-776, 2017.

Abstract

Estimating covariance matrices is a fundamental technique in various domains, most notably in machine learning and signal processing. To tackle the challenges of extensive communication costs, large storage capacity requirements, and high processing time complexity when handling massive high-dimensional and distributed data, we propose an efficient and accurate covariance matrix estimation method via data compression. In contrast to previous data-oblivious compression schemes, we leverage a data-aware weighted sampling method to construct low-dimensional data for such estimation. We rigorously prove that our proposed estimator is unbiased and requires smaller data to achieve the same accuracy with specially designed sampling distributions. Besides, we depict that the computational procedures in our algorithm are efficient. All achievements imply an improved tradeoff between the estimation accuracy and computational costs. Finally, the extensive experiments on synthetic and real-world datasets validate the superior property of our method and illustrate that it significantly outperforms the state-of-the-art algorithms.
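The core idea in the abstract, sampling a few entries per data vector with data-dependent probabilities and reweighting so that the estimate stays unbiased, can be illustrated with a minimal sketch. This is not the authors' estimator or their sampling distribution: the function names, the mixing parameter `alpha`, and the particular probability formula are assumptions chosen for illustration, and the sketch estimates the second-moment matrix (the covariance of zero-mean data).

```python
import numpy as np

def sampling_probs(x, alpha=0.5):
    """Hypothetical data-aware distribution: a mix of |x_k| and x_k^2 weights.

    Assumes x is not the all-zero vector. Coordinates with x_k == 0 get
    probability 0 and are never sampled, which does not affect unbiasedness.
    """
    a = np.abs(x)
    return alpha * a / a.sum() + (1.0 - alpha) * a**2 / (a**2).sum()

def outer_estimate(values, idx, p, d):
    """Unbiased estimate of x x^T from m >= 2 coordinates sampled i.i.d. from p.

    values[j] is x[idx[j]]. The estimate sums the importance-weighted
    products over all ordered pairs of *distinct* draws (j != l), then
    divides by the number of such pairs, m * (m - 1).
    """
    m = len(idx)
    w = values / p[idx]                # importance weights x_k / p_k
    z = np.zeros(d)
    np.add.at(z, idx, w)               # sum_j w_j e_{idx[j]}
    same_draw = np.zeros(d)
    np.add.at(same_draw, idx, w**2)    # the j == l terms, all on the diagonal
    return (np.outer(z, z) - np.diag(same_draw)) / (m * (m - 1))

def compressed_second_moment(X, m, rng):
    """Average the per-vector estimates over the rows of X (n x d)."""
    n, d = X.shape
    C = np.zeros((d, d))
    for x in X:
        p = sampling_probs(x)
        idx = rng.choice(d, size=m, replace=True, p=p)
        # Only x[idx], idx, and p[idx] are needed here; the full p is
        # passed purely to keep the sketch short.
        C += outer_estimate(x[idx], idx, p, d)
    return C / n
```

A call such as `compressed_second_moment(X, m=10, rng=np.random.default_rng(0))` returns a `d x d` estimate built from `m` sampled entries per row. Restricting the sum to pairs of distinct draws is what removes the bias that a plain `np.outer(z, z)` would put on the diagonal.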

Cite this Paper


BibTeX
@InProceedings{pmlr-v70-chen17g,
  title     = {Toward Efficient and Accurate Covariance Matrix Estimation on Compressed Data},
  author    = {Xixian Chen and Michael R. Lyu and Irwin King},
  booktitle = {Proceedings of the 34th International Conference on Machine Learning},
  pages     = {767--776},
  year      = {2017},
  editor    = {Precup, Doina and Teh, Yee Whye},
  volume    = {70},
  series    = {Proceedings of Machine Learning Research},
  month     = {06--11 Aug},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v70/chen17g/chen17g.pdf},
  url       = {https://proceedings.mlr.press/v70/chen17g.html},
  abstract  = {Estimating covariance matrices is a fundamental technique in various domains, most notably in machine learning and signal processing. To tackle the challenges of extensive communication costs, large storage capacity requirements, and high processing time complexity when handling massive high-dimensional and distributed data, we propose an efficient and accurate covariance matrix estimation method via data compression. In contrast to previous data-oblivious compression schemes, we leverage a data-aware weighted sampling method to construct low-dimensional data for such estimation. We rigorously prove that our proposed estimator is unbiased and requires smaller data to achieve the same accuracy with specially designed sampling distributions. Besides, we depict that the computational procedures in our algorithm are efficient. All achievements imply an improved tradeoff between the estimation accuracy and computational costs. Finally, the extensive experiments on synthetic and real-world datasets validate the superior property of our method and illustrate that it significantly outperforms the state-of-the-art algorithms.}
}
Endnote
%0 Conference Paper
%T Toward Efficient and Accurate Covariance Matrix Estimation on Compressed Data
%A Xixian Chen
%A Michael R. Lyu
%A Irwin King
%B Proceedings of the 34th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2017
%E Doina Precup
%E Yee Whye Teh
%F pmlr-v70-chen17g
%I PMLR
%P 767--776
%U https://proceedings.mlr.press/v70/chen17g.html
%V 70
%X Estimating covariance matrices is a fundamental technique in various domains, most notably in machine learning and signal processing. To tackle the challenges of extensive communication costs, large storage capacity requirements, and high processing time complexity when handling massive high-dimensional and distributed data, we propose an efficient and accurate covariance matrix estimation method via data compression. In contrast to previous data-oblivious compression schemes, we leverage a data-aware weighted sampling method to construct low-dimensional data for such estimation. We rigorously prove that our proposed estimator is unbiased and requires smaller data to achieve the same accuracy with specially designed sampling distributions. Besides, we depict that the computational procedures in our algorithm are efficient. All achievements imply an improved tradeoff between the estimation accuracy and computational costs. Finally, the extensive experiments on synthetic and real-world datasets validate the superior property of our method and illustrate that it significantly outperforms the state-of-the-art algorithms.
APA
Chen, X., Lyu, M. R., & King, I. (2017). Toward Efficient and Accurate Covariance Matrix Estimation on Compressed Data. Proceedings of the 34th International Conference on Machine Learning, in Proceedings of Machine Learning Research 70:767-776. Available from https://proceedings.mlr.press/v70/chen17g.html.
