Communication-Avoiding Optimization Methods for Distributed Massive-Scale Sparse Inverse Covariance Estimation

Penporn Koanantakool, Alnur Ali, Ariful Azad, Aydin Buluc, Dmitriy Morozov, Leonid Oliker, Katherine Yelick, Sang-Yun Oh
Proceedings of the Twenty-First International Conference on Artificial Intelligence and Statistics, PMLR 84:1376-1386, 2018.

Abstract

Across a variety of scientific disciplines, sparse inverse covariance estimation is a popular tool for capturing the underlying dependency relationships in multivariate data. Unfortunately, most estimators are not scalable enough to handle the sizes of modern high-dimensional data sets (often on the order of terabytes), and assume Gaussian samples. To address these deficiencies, we introduce HP-CONCORD, a highly scalable optimization method for estimating a sparse inverse covariance matrix based on a regularized pseudolikelihood framework, without assuming Gaussianity. Our parallel proximal gradient method uses a novel communication-avoiding linear algebra algorithm and runs across a multi-node cluster with up to 1k nodes (24k cores), achieving parallel scalability on problems with up to ≈819 billion parameters (1.28 million dimensions); even on a single node, HP-CONCORD demonstrates scalability, outperforming a state-of-the-art method. We also use HP-CONCORD to estimate the underlying dependency structure of the brain from fMRI data, and use the result to identify functional regions automatically. The results show good agreement with a clustering from the neuroscience literature.

Cite this Paper


BibTeX
@InProceedings{pmlr-v84-koanantakool18a, title = {Communication-Avoiding Optimization Methods for Distributed Massive-Scale Sparse Inverse Covariance Estimation}, author = {Penporn Koanantakool and Alnur Ali and Ariful Azad and Aydin Buluc and Dmitriy Morozov and Leonid Oliker and Katherine Yelick and Sang-Yun Oh}, booktitle = {Proceedings of the Twenty-First International Conference on Artificial Intelligence and Statistics}, pages = {1376--1386}, year = {2018}, editor = {Amos Storkey and Fernando Perez-Cruz}, volume = {84}, series = {Proceedings of Machine Learning Research}, month = {09--11 Apr}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/v84/koanantakool18a/koanantakool18a.pdf}, url = { http://proceedings.mlr.press/v84/koanantakool18a.html }, abstract = {Across a variety of scientific disciplines, sparse inverse covariance estimation is a popular tool for capturing the underlying dependency relationships in multivariate data. Unfortunately, most estimators are not scalable enough to handle the sizes of modern high-dimensional data sets (often on the order of terabytes), and assume Gaussian samples. To address these deficiencies, we introduce HP-CONCORD, a highly scalable optimization method for estimating a sparse inverse covariance matrix based on a regularized pseudolikelihood framework, without assuming Gaussianity. Our parallel proximal gradient method uses a novel communication-avoiding linear algebra algorithm and runs across a multi-node cluster with up to 1k nodes (24k cores), achieving parallel scalability on problems with up to ≈819 billion parameters (1.28 million dimensions); even on a single node, HP-CONCORD demonstrates scalability, outperforming a state-of-the-art method. We also use HP-CONCORD to estimate the underlying dependency structure of the brain from fMRI data, and use the result to identify functional regions automatically. The results show good agreement with a clustering from the neuroscience literature.} }
Endnote
%0 Conference Paper %T Communication-Avoiding Optimization Methods for Distributed Massive-Scale Sparse Inverse Covariance Estimation %A Penporn Koanantakool %A Alnur Ali %A Ariful Azad %A Aydin Buluc %A Dmitriy Morozov %A Leonid Oliker %A Katherine Yelick %A Sang-Yun Oh %B Proceedings of the Twenty-First International Conference on Artificial Intelligence and Statistics %C Proceedings of Machine Learning Research %D 2018 %E Amos Storkey %E Fernando Perez-Cruz %F pmlr-v84-koanantakool18a %I PMLR %P 1376--1386 %U http://proceedings.mlr.press/v84/koanantakool18a.html %V 84 %X Across a variety of scientific disciplines, sparse inverse covariance estimation is a popular tool for capturing the underlying dependency relationships in multivariate data. Unfortunately, most estimators are not scalable enough to handle the sizes of modern high-dimensional data sets (often on the order of terabytes), and assume Gaussian samples. To address these deficiencies, we introduce HP-CONCORD, a highly scalable optimization method for estimating a sparse inverse covariance matrix based on a regularized pseudolikelihood framework, without assuming Gaussianity. Our parallel proximal gradient method uses a novel communication-avoiding linear algebra algorithm and runs across a multi-node cluster with up to 1k nodes (24k cores), achieving parallel scalability on problems with up to ≈819 billion parameters (1.28 million dimensions); even on a single node, HP-CONCORD demonstrates scalability, outperforming a state-of-the-art method. We also use HP-CONCORD to estimate the underlying dependency structure of the brain from fMRI data, and use the result to identify functional regions automatically. The results show good agreement with a clustering from the neuroscience literature.
APA
Koanantakool, P., Ali, A., Azad, A., Buluc, A., Morozov, D., Oliker, L., Yelick, K. & Oh, S.. (2018). Communication-Avoiding Optimization Methods for Distributed Massive-Scale Sparse Inverse Covariance Estimation. Proceedings of the Twenty-First International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 84:1376-1386 Available from http://proceedings.mlr.press/v84/koanantakool18a.html .

Related Material