Efficient Statistics for Sparse Graphical Models from Truncated Samples

Arnab Bhattacharyya, Rathin Desai, Sai Ganesh Nagarajan, Ioannis Panageas
Proceedings of The 24th International Conference on Artificial Intelligence and Statistics, PMLR 130:1450-1458, 2021.

Abstract

In this paper, we study high-dimensional estimation from truncated samples. We focus on two fundamental and classical problems: (i) inference of sparse Gaussian graphical models and (ii) support recovery of sparse linear models. (i) For Gaussian graphical models, suppose d-dimensional samples x are generated from a Gaussian N(mu, Sigma) and observed only if they belong to a subset S of R^d. We show that mu and Sigma can be estimated with error epsilon in the Frobenius norm, using O(nz(Sigma^{-1})/epsilon^2) samples from a truncated N(mu, Sigma) and having access to a membership oracle for S. The set S is assumed to have non-trivial measure under the unknown distribution but is otherwise arbitrary. (ii) For sparse linear regression, suppose samples (x, y) are generated where y = <Omega*, x> + N(0,1) and (x, y) is seen only if y belongs to a truncation set S of the reals. We consider the case that Omega* is sparse with a support set of size k. Our main result establishes precise conditions on the problem dimension d, the support size k, the number of observations n, and properties of the samples and the truncation that are sufficient to recover the support of Omega*. Specifically, we show that under some mild assumptions, only O(k^2 log d) samples are needed to estimate Omega* in the infinity-norm up to a bounded error. Similar results are also established for estimating Omega* in the Euclidean norm up to arbitrary error. For both problems, our estimator minimizes the sum of the finite population negative log-likelihood function and an ell_1-regularization term.
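The flavor of the estimator in (ii) can be sketched for the special case where the truncation set is one-sided, S = [c, infinity): there the truncated Gaussian likelihood has a closed form involving the normal CDF, and the ell_1-regularized negative log-likelihood can be minimized by proximal gradient descent. Everything below (the one-sided S, the ISTA solver, and all constants) is an illustrative assumption for exposition, not the paper's exact procedure or guarantees.

```python
# Sketch: sparse linear regression from samples truncated to S = [c, inf).
# For this one-sided S, the per-sample truncated negative log-likelihood is
#   0.5*(y - <w, x>)^2 + log Phi(<w, x> - c)
# (the log Phi term corrects for the selection y >= c), and we minimize its
# average plus lam*||w||_1 via proximal gradient (ISTA). All constants
# (n, d, k, lam, iteration count) are illustrative choices, not from the paper.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
d, k, c = 20, 3, 0.0                     # dimension, sparsity, truncation point
w_star = np.zeros(d)
w_star[:k] = [2.0, -2.0, 2.0]            # true sparse parameter

# Generate raw samples; keep only those whose response lands in S.
X_raw = rng.standard_normal((10_000, d))
y_raw = X_raw @ w_star + rng.standard_normal(10_000)
keep = y_raw >= c
X, y = X_raw[keep][:1500], y_raw[keep][:1500]
n = len(y)

def grad_nll(w):
    """Gradient of the averaged truncated negative log-likelihood."""
    z = X @ w - c
    # phi/Phi ratio; clip the CDF argument for numerical stability.
    mills = norm.pdf(z) / norm.cdf(np.clip(z, -8, None))
    return X.T @ (-(y - X @ w) + mills) / n

lam = 0.1                                # l1 strength (illustrative, untuned)
step = 1.0 / np.linalg.eigvalsh(X.T @ X / n).max()
w = np.zeros(d)
for _ in range(500):                     # ISTA: gradient step + soft-threshold
    w = w - step * grad_nll(w)
    w = np.sign(w) * np.maximum(np.abs(w) - step * lam, 0.0)

support = np.flatnonzero(np.abs(w) > 0.5)
print(support)                           # indices of the recovered support
```

Ignoring the truncation (i.e., dropping the log Phi term and running plain Lasso on the kept samples) would bias the coefficient estimates, since conditioning on y >= c shifts the conditional mean of y; the correction term is what makes the likelihood consistent under truncation.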

Cite this Paper


BibTeX
@InProceedings{pmlr-v130-bhattacharyya21a,
  title     = {Efficient Statistics for Sparse Graphical Models from Truncated Samples},
  author    = {Bhattacharyya, Arnab and Desai, Rathin and Ganesh Nagarajan, Sai and Panageas, Ioannis},
  booktitle = {Proceedings of The 24th International Conference on Artificial Intelligence and Statistics},
  pages     = {1450--1458},
  year      = {2021},
  editor    = {Banerjee, Arindam and Fukumizu, Kenji},
  volume    = {130},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--15 Apr},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v130/bhattacharyya21a/bhattacharyya21a.pdf},
  url       = {https://proceedings.mlr.press/v130/bhattacharyya21a.html},
  abstract  = {In this paper, we study high-dimensional estimation from truncated samples. We focus on two fundamental and classical problems: (i) inference of sparse Gaussian graphical models and (ii) support recovery of sparse linear models. (i) For Gaussian graphical models, suppose d-dimensional samples x are generated from a Gaussian N(mu, Sigma) and observed only if they belong to a subset S of R^d. We show that mu and Sigma can be estimated with error epsilon in the Frobenius norm, using O(nz(Sigma^{-1})/epsilon^2) samples from a truncated N(mu, Sigma) and having access to a membership oracle for S. The set S is assumed to have non-trivial measure under the unknown distribution but is otherwise arbitrary. (ii) For sparse linear regression, suppose samples (x, y) are generated where y = <Omega*, x> + N(0,1) and (x, y) is seen only if y belongs to a truncation set S of the reals. We consider the case that Omega* is sparse with a support set of size k. Our main result establishes precise conditions on the problem dimension d, the support size k, the number of observations n, and properties of the samples and the truncation that are sufficient to recover the support of Omega*. Specifically, we show that under some mild assumptions, only O(k^2 log d) samples are needed to estimate Omega* in the infinity-norm up to a bounded error. Similar results are also established for estimating Omega* in the Euclidean norm up to arbitrary error. For both problems, our estimator minimizes the sum of the finite population negative log-likelihood function and an ell_1-regularization term.}
}
Endnote
%0 Conference Paper
%T Efficient Statistics for Sparse Graphical Models from Truncated Samples
%A Arnab Bhattacharyya
%A Rathin Desai
%A Sai Ganesh Nagarajan
%A Ioannis Panageas
%B Proceedings of The 24th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2021
%E Arindam Banerjee
%E Kenji Fukumizu
%F pmlr-v130-bhattacharyya21a
%I PMLR
%P 1450--1458
%U https://proceedings.mlr.press/v130/bhattacharyya21a.html
%V 130
%X In this paper, we study high-dimensional estimation from truncated samples. We focus on two fundamental and classical problems: (i) inference of sparse Gaussian graphical models and (ii) support recovery of sparse linear models. (i) For Gaussian graphical models, suppose d-dimensional samples x are generated from a Gaussian N(mu, Sigma) and observed only if they belong to a subset S of R^d. We show that mu and Sigma can be estimated with error epsilon in the Frobenius norm, using O(nz(Sigma^{-1})/epsilon^2) samples from a truncated N(mu, Sigma) and having access to a membership oracle for S. The set S is assumed to have non-trivial measure under the unknown distribution but is otherwise arbitrary. (ii) For sparse linear regression, suppose samples (x, y) are generated where y = <Omega*, x> + N(0,1) and (x, y) is seen only if y belongs to a truncation set S of the reals. We consider the case that Omega* is sparse with a support set of size k. Our main result establishes precise conditions on the problem dimension d, the support size k, the number of observations n, and properties of the samples and the truncation that are sufficient to recover the support of Omega*. Specifically, we show that under some mild assumptions, only O(k^2 log d) samples are needed to estimate Omega* in the infinity-norm up to a bounded error. Similar results are also established for estimating Omega* in the Euclidean norm up to arbitrary error. For both problems, our estimator minimizes the sum of the finite population negative log-likelihood function and an ell_1-regularization term.
APA
Bhattacharyya, A., Desai, R., Ganesh Nagarajan, S. & Panageas, I. (2021). Efficient Statistics for Sparse Graphical Models from Truncated Samples. Proceedings of The 24th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 130:1450-1458. Available from https://proceedings.mlr.press/v130/bhattacharyya21a.html.