Gaussian Copula Precision Estimation with Missing Values

Huahua Wang, Farideh Fazayeli, Soumyadeep Chatterjee, Arindam Banerjee
Proceedings of the Seventeenth International Conference on Artificial Intelligence and Statistics, PMLR 33:978-986, 2014.

Abstract

We consider the problem of estimating the sparse precision matrix of Gaussian copula distributions from high-dimensional samples with missing values. Existing approaches, primarily designed for Gaussian distributions, suggest plugin estimators that disregard the missing values. In this paper, we propose double plugin Gaussian (DoPinG) copula estimators to estimate the sparse precision matrix corresponding to non-paranormal distributions. DoPinG uses two plugin procedures and consists of three steps: (1) estimate nonparametric correlations, such as Kendall's tau and Spearman's rho, from the observed values; (2) estimate the non-paranormal correlation matrix; (3) plug it into existing sparse precision estimators. We prove that DoPinG copula estimators consistently estimate the non-paranormal correlation matrix at a rate of $O\left(\frac{1}{1-\delta}\sqrt{\frac{\log p}{n}}\right)$, where $p$ is the dimension, $n$ is the sample size, and $\delta$ is the probability of a value being missing. Experiments illustrate the effect of sample size and percentage of missing data on model performance, and show that DoPinG performs significantly better than estimators such as mGlasso that are primarily designed for Gaussian data.
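Steps (1) and (2) can be illustrated with a minimal NumPy sketch. It assumes missing entries are encoded as NaN and that each pair of variables uses only the samples where both are observed (pairwise deletion); the paper's own estimator may handle missing values differently. The names `pairwise_kendall_tau`, `pairwise_spearman_rho`, and `nonparanormal_correlation` are hypothetical. The sine maps $\sin(\pi\tau/2)$ and $2\sin(\pi\rho/6)$ are the standard bridges from Kendall's tau and Spearman's rho to the latent Gaussian correlation.

```python
import numpy as np


def _jointly_observed(x, y):
    """Keep only samples where both x and y are observed (assumed NaN = missing)."""
    mask = ~(np.isnan(x) | np.isnan(y))
    return x[mask], y[mask]


def pairwise_kendall_tau(x, y):
    """Kendall's tau-a computed on the jointly observed samples of x and y."""
    x, y = _jointly_observed(x, y)
    m = len(x)
    if m < 2:
        return np.nan
    # Signs of all pairwise differences; the strict upper triangle counts each pair once.
    dx = np.sign(x[:, None] - x[None, :])
    dy = np.sign(y[:, None] - y[None, :])
    iu = np.triu_indices(m, k=1)
    return float(np.sum(dx[iu] * dy[iu]) / (m * (m - 1) / 2))


def pairwise_spearman_rho(x, y):
    """Spearman's rho on jointly observed samples (assumes continuous data, no ties)."""
    x, y = _jointly_observed(x, y)
    if len(x) < 2:
        return np.nan
    # argsort of argsort gives 0-based ranks when there are no ties.
    rx = np.argsort(np.argsort(x)).astype(float)
    ry = np.argsort(np.argsort(y)).astype(float)
    return float(np.corrcoef(rx, ry)[0, 1])


def nonparanormal_correlation(X, method="kendall"):
    """Plug rank correlations into the sine transform to estimate the latent correlation."""
    _, p = X.shape
    S = np.eye(p)
    for j in range(p):
        for k in range(j + 1, p):
            if method == "kendall":
                S[j, k] = np.sin(np.pi / 2 * pairwise_kendall_tau(X[:, j], X[:, k]))
            elif method == "spearman":
                S[j, k] = 2 * np.sin(np.pi / 6 * pairwise_spearman_rho(X[:, j], X[:, k]))
            else:
                raise ValueError("method must be 'kendall' or 'spearman'")
            S[k, j] = S[j, k]
    return S


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    C = np.array([[1.0, 0.6, 0.0], [0.6, 1.0, 0.3], [0.0, 0.3, 1.0]])
    # Non-paranormal data: Gaussian samples pushed through a monotone map (exp).
    X = np.exp(rng.multivariate_normal(np.zeros(3), C, size=600))
    X[rng.random(X.shape) < 0.2] = np.nan  # roughly 20% of entries missing
    print(np.round(nonparanormal_correlation(X, "kendall"), 2))
```

Because rank correlations are invariant to strictly monotone marginal transforms, the estimate recovers the latent Gaussian correlation without knowing the transform. For step (3), the resulting matrix would be passed to a sparse precision estimator, for example a graphical lasso such as scikit-learn's `sklearn.covariance.graphical_lasso`; the sine-transformed matrix is not guaranteed to be positive semidefinite, so a projection step may be needed first.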

Related Material