Gaussian Copula Precision Estimation with Missing Values

Huahua Wang, Farideh Fazayeli, Soumyadeep Chatterjee, Arindam Banerjee
Proceedings of the Seventeenth International Conference on Artificial Intelligence and Statistics, PMLR 33:978-986, 2014.

Abstract

We consider the problem of estimating the sparse precision matrix of Gaussian copula distributions using samples with missing values in high dimensions. Existing approaches, primarily designed for Gaussian distributions, suggest using plugin estimators by disregarding the missing values. In this paper, we propose double plugin Gaussian (DoPinG) copula estimators to estimate the sparse precision matrix corresponding to non-paranormal distributions. DoPinG uses two plugin procedures and consists of three steps: (1) estimate nonparametric correlations based on observed values, including Kendall’s tau and Spearman’s rho; (2) estimate the non-paranormal correlation matrix; (3) plug into existing sparse precision estimators. We prove that DoPinG copula estimators consistently estimate the non-paranormal correlation matrix at a rate of $O(\frac{1}{(1-\delta)}\sqrt{\frac{\log p}{n}})$, where $\delta$ is the probability of missing values. We provide experimental results to illustrate the effect of sample size and percentage of missing data on the model performance. Experimental results show that DoPinG is significantly better than estimators like mGlasso, which are primarily designed for Gaussian data.
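Steps (1) and (2) of the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes missing entries are encoded as `None`, computes Kendall's tau for each pair of variables from the samples where both are observed, and then applies the standard Gaussian copula relation $\Sigma_{jk} = \sin(\frac{\pi}{2}\tau_{jk})$. The function names `kendall_tau_observed` and `copula_correlation` are hypothetical, and the paper's exact estimator may differ in detail. Step (3), plugging the resulting matrix into a sparse precision estimator such as the graphical lasso, is left out.

```python
import math
from itertools import combinations


def kendall_tau_observed(x, y):
    """Kendall's tau from the samples where both x and y are observed.

    Missing entries are assumed to be encoded as None (an illustrative choice).
    """
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    m = len(pairs)
    if m < 2:
        raise ValueError("need at least two jointly observed samples")
    score = 0
    for (a1, b1), (a2, b2) in combinations(pairs, 2):
        prod = (a1 - a2) * (b1 - b2)
        # +1 for a concordant pair, -1 for a discordant pair, 0 for a tie
        score += (prod > 0) - (prod < 0)
    return 2.0 * score / (m * (m - 1))


def copula_correlation(columns):
    """Step (2): map pairwise Kendall's tau to a Gaussian copula correlation.

    `columns` is a list of p variables, each a list of n samples.
    """
    p = len(columns)
    sigma = [[1.0] * p for _ in range(p)]
    for j in range(p):
        for k in range(j + 1, p):
            tau = kendall_tau_observed(columns[j], columns[k])
            sigma[j][k] = sigma[k][j] = math.sin(math.pi / 2.0 * tau)
    return sigma


if __name__ == "__main__":
    x = [1.0, 2.0, None, 4.0, 5.0]
    y = [2.0, 4.0, 6.0, None, 10.0]
    z = [5.0, 4.0, 3.0, 2.0, 1.0]
    for row in copula_correlation([x, y, z]):
        print(row)
```

Because each entry is estimated from a different subset of samples, the resulting matrix is not guaranteed to be positive semidefinite, so a downstream sparse precision estimator has to tolerate that or the matrix has to be projected first.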

Cite this Paper


BibTeX
@InProceedings{pmlr-v33-wang14a,
  title = {{Gaussian Copula Precision Estimation with Missing Values}},
  author = {Wang, Huahua and Fazayeli, Farideh and Chatterjee, Soumyadeep and Banerjee, Arindam},
  booktitle = {Proceedings of the Seventeenth International Conference on Artificial Intelligence and Statistics},
  pages = {978--986},
  year = {2014},
  editor = {Kaski, Samuel and Corander, Jukka},
  volume = {33},
  series = {Proceedings of Machine Learning Research},
  address = {Reykjavik, Iceland},
  month = {22--25 Apr},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v33/wang14a.pdf},
  url = {https://proceedings.mlr.press/v33/wang14a.html},
  abstract = {We consider the problem of estimating sparse precision matrix of Gaussian copula distributions using samples with missing values in high dimensions. Existing approaches, primarily designed for Gaussian distributions, suggest using plugin estimators by disregarding the missing values. In this paper, we propose double plugin Gaussian (DoPinG) copula estimators to estimate the sparse precision matrix corresponding to \emph{non-paranormal} distributions. DoPinG uses two plugin procedures and consists of three steps: (1) estimate nonparametric correlations based on observed values, including Kendall’s tau and Spearman’s rho; (2) estimate the non-paranormal correlation matrix; (3) plug into existing sparse precision estimators. We prove that DoPinG copula estimators consistently estimate the non-paranormal correlation matrix at a rate of $O(\frac{1}{(1-\delta)}\sqrt{\frac{\log p}{n}})$, where $\delta$ is the probability of missing values. We provide experimental results to illustrate the effect of sample size and percentage of missing data on the model performance. Experimental results show that DoPinG is significantly better than estimators like mGlasso, which are primarily designed for Gaussian data.}
}
Endnote
%0 Conference Paper
%T Gaussian Copula Precision Estimation with Missing Values
%A Huahua Wang
%A Farideh Fazayeli
%A Soumyadeep Chatterjee
%A Arindam Banerjee
%B Proceedings of the Seventeenth International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2014
%E Samuel Kaski
%E Jukka Corander
%F pmlr-v33-wang14a
%I PMLR
%P 978--986
%U https://proceedings.mlr.press/v33/wang14a.html
%V 33
%X We consider the problem of estimating sparse precision matrix of Gaussian copula distributions using samples with missing values in high dimensions. Existing approaches, primarily designed for Gaussian distributions, suggest using plugin estimators by disregarding the missing values. In this paper, we propose double plugin Gaussian (DoPinG) copula estimators to estimate the sparse precision matrix corresponding to \emph{non-paranormal} distributions. DoPinG uses two plugin procedures and consists of three steps: (1) estimate nonparametric correlations based on observed values, including Kendall’s tau and Spearman’s rho; (2) estimate the non-paranormal correlation matrix; (3) plug into existing sparse precision estimators. We prove that DoPinG copula estimators consistently estimate the non-paranormal correlation matrix at a rate of $O(\frac{1}{(1-\delta)}\sqrt{\frac{\log p}{n}})$, where $\delta$ is the probability of missing values. We provide experimental results to illustrate the effect of sample size and percentage of missing data on the model performance. Experimental results show that DoPinG is significantly better than estimators like mGlasso, which are primarily designed for Gaussian data.
RIS
TY - CPAPER
TI - Gaussian Copula Precision Estimation with Missing Values
AU - Huahua Wang
AU - Farideh Fazayeli
AU - Soumyadeep Chatterjee
AU - Arindam Banerjee
BT - Proceedings of the Seventeenth International Conference on Artificial Intelligence and Statistics
DA - 2014/04/02
ED - Samuel Kaski
ED - Jukka Corander
ID - pmlr-v33-wang14a
PB - PMLR
DP - Proceedings of Machine Learning Research
VL - 33
SP - 978
EP - 986
L1 - http://proceedings.mlr.press/v33/wang14a.pdf
UR - https://proceedings.mlr.press/v33/wang14a.html
AB - We consider the problem of estimating sparse precision matrix of Gaussian copula distributions using samples with missing values in high dimensions. Existing approaches, primarily designed for Gaussian distributions, suggest using plugin estimators by disregarding the missing values. In this paper, we propose double plugin Gaussian (DoPinG) copula estimators to estimate the sparse precision matrix corresponding to \emph{non-paranormal} distributions. DoPinG uses two plugin procedures and consists of three steps: (1) estimate nonparametric correlations based on observed values, including Kendall’s tau and Spearman’s rho; (2) estimate the non-paranormal correlation matrix; (3) plug into existing sparse precision estimators. We prove that DoPinG copula estimators consistently estimate the non-paranormal correlation matrix at a rate of $O(\frac{1}{(1-\delta)}\sqrt{\frac{\log p}{n}})$, where $\delta$ is the probability of missing values. We provide experimental results to illustrate the effect of sample size and percentage of missing data on the model performance. Experimental results show that DoPinG is significantly better than estimators like mGlasso, which are primarily designed for Gaussian data.
ER -
APA
Wang, H., Fazayeli, F., Chatterjee, S., & Banerjee, A. (2014). Gaussian Copula Precision Estimation with Missing Values. Proceedings of the Seventeenth International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 33:978-986. Available from https://proceedings.mlr.press/v33/wang14a.html.
