Large-Scale Sparse Inverse Covariance Estimation via Thresholding and Max-Det Matrix Completion

Richard Zhang, Salar Fattahi, Somayeh Sojoudi
Proceedings of the 35th International Conference on Machine Learning, PMLR 80:5766-5775, 2018.

Abstract

The sparse inverse covariance estimation problem is commonly solved using an $\ell_{1}$-regularized Gaussian maximum likelihood estimator known as “graphical lasso”, but its computational cost becomes prohibitive for large data sets. A recent line of results showed, under mild assumptions, that the graphical lasso estimator can be retrieved by soft-thresholding the sample covariance matrix and solving a maximum determinant matrix completion (MDMC) problem. This paper proves an extension of this result and describes a Newton-CG algorithm to efficiently solve the MDMC problem. Assuming that the thresholded sample covariance matrix is sparse with a sparse Cholesky factorization, we prove that the algorithm converges to an $\epsilon$-accurate solution in $O(n\log(1/\epsilon))$ time and $O(n)$ memory. The algorithm is highly efficient in practice: we solve the associated MDMC problems with as many as 200,000 variables to 7-9 digits of accuracy in less than an hour on a standard laptop computer running MATLAB.
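For a concrete handle on the first step the abstract describes, the sketch below shows entrywise soft-thresholding of a sample covariance matrix in NumPy, i.e. soft_lam(s) = sign(s) * max(|s| - lam, 0). This is an illustrative reconstruction, not the authors' code (their solver runs in MATLAB); the function name, the convention of leaving the diagonal untouched, and the threshold level are assumptions. Per the paper's result, the sparsity pattern of the thresholded matrix then defines the MDMC problem whose solution recovers the graphical lasso estimator.

    import numpy as np

    def soft_threshold_covariance(S, lam):
        # Entrywise soft-thresholding: sign(s) * max(|s| - lam, 0),
        # with the diagonal kept intact (one common convention; the
        # paper's exact operator may differ in this detail).
        T = np.sign(S) * np.maximum(np.abs(S) - lam, 0.0)
        np.fill_diagonal(T, np.diag(S))
        return T

    # Toy usage: threshold a 200-variable sample covariance and check
    # how sparse the result is; the level lam=0.2 is arbitrary.
    rng = np.random.default_rng(0)
    X = rng.standard_normal((50, 200))   # 50 samples, 200 variables
    S = np.cov(X, rowvar=False)          # 200 x 200 sample covariance
    C = soft_threshold_covariance(S, lam=0.2)
    print(np.count_nonzero(C), "nonzeros out of", C.size)

With few samples relative to variables, most off-diagonal entries of S are small, so thresholding produces the sparse pattern that makes the subsequent matrix-completion step tractable.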

Cite this Paper


BibTeX
@InProceedings{pmlr-v80-zhang18c,
  title = {Large-Scale Sparse Inverse Covariance Estimation via Thresholding and Max-Det Matrix Completion},
  author = {Zhang, Richard and Fattahi, Salar and Sojoudi, Somayeh},
  booktitle = {Proceedings of the 35th International Conference on Machine Learning},
  pages = {5766--5775},
  year = {2018},
  editor = {Dy, Jennifer and Krause, Andreas},
  volume = {80},
  series = {Proceedings of Machine Learning Research},
  month = {10--15 Jul},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v80/zhang18c/zhang18c.pdf},
  url = {https://proceedings.mlr.press/v80/zhang18c.html},
  abstract = {The sparse inverse covariance estimation problem is commonly solved using an $\ell_{1}$-regularized Gaussian maximum likelihood estimator known as “graphical lasso”, but its computational cost becomes prohibitive for large data sets. A recent line of results showed, under mild assumptions, that the graphical lasso estimator can be retrieved by soft-thresholding the sample covariance matrix and solving a maximum determinant matrix completion (MDMC) problem. This paper proves an extension of this result and describes a Newton-CG algorithm to efficiently solve the MDMC problem. Assuming that the thresholded sample covariance matrix is sparse with a sparse Cholesky factorization, we prove that the algorithm converges to an $\epsilon$-accurate solution in $O(n\log(1/\epsilon))$ time and $O(n)$ memory. The algorithm is highly efficient in practice: we solve the associated MDMC problems with as many as 200,000 variables to 7-9 digits of accuracy in less than an hour on a standard laptop computer running MATLAB.}
}
Endnote
%0 Conference Paper
%T Large-Scale Sparse Inverse Covariance Estimation via Thresholding and Max-Det Matrix Completion
%A Richard Zhang
%A Salar Fattahi
%A Somayeh Sojoudi
%B Proceedings of the 35th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2018
%E Jennifer Dy
%E Andreas Krause
%F pmlr-v80-zhang18c
%I PMLR
%P 5766--5775
%U https://proceedings.mlr.press/v80/zhang18c.html
%V 80
%X The sparse inverse covariance estimation problem is commonly solved using an $\ell_{1}$-regularized Gaussian maximum likelihood estimator known as “graphical lasso”, but its computational cost becomes prohibitive for large data sets. A recent line of results showed, under mild assumptions, that the graphical lasso estimator can be retrieved by soft-thresholding the sample covariance matrix and solving a maximum determinant matrix completion (MDMC) problem. This paper proves an extension of this result and describes a Newton-CG algorithm to efficiently solve the MDMC problem. Assuming that the thresholded sample covariance matrix is sparse with a sparse Cholesky factorization, we prove that the algorithm converges to an $\epsilon$-accurate solution in $O(n\log(1/\epsilon))$ time and $O(n)$ memory. The algorithm is highly efficient in practice: we solve the associated MDMC problems with as many as 200,000 variables to 7-9 digits of accuracy in less than an hour on a standard laptop computer running MATLAB.
APA
Zhang, R., Fattahi, S. & Sojoudi, S. (2018). Large-Scale Sparse Inverse Covariance Estimation via Thresholding and Max-Det Matrix Completion. Proceedings of the 35th International Conference on Machine Learning, in Proceedings of Machine Learning Research 80:5766-5775. Available from https://proceedings.mlr.press/v80/zhang18c.html.
