Asymptotically Exact and Fast Gaussian Copula Models for Imputation of Mixed Data Types

Benjamin Christoffersen, Mark Clements, Keith Humphreys, Hedvig Kjellström
Proceedings of The 13th Asian Conference on Machine Learning, PMLR 157:870-885, 2021.

Abstract

Missing values in data with mixed types are a common problem in many machine learning applications, such as survey processing and various medical applications. Recently, Gaussian copula models have been suggested as a means of imputing missing values within a probabilistic framework. While the present Gaussian copula models have been shown to yield state-of-the-art performance, they have two limitations: they are based on an approximation that is fast but may be imprecise, and they do not support unordered multinomial variables. We address the first limitation using direct and arbitrarily precise approximations for both model estimation and imputation, based on randomized quasi-Monte Carlo procedures. The method we provide has lower errors for the estimated model parameters and the imputed values than previously proposed methods. We also extend the previous Gaussian copula models to include unordered multinomial variables, in addition to the existing support for ordinal, binary, and continuous variables.

Cite this Paper


BibTeX
@InProceedings{pmlr-v157-christoffersen21a,
  title = {Asymptotically Exact and Fast {G}aussian Copula Models for Imputation of Mixed Data Types},
  author = {Christoffersen, Benjamin and Clements, Mark and Humphreys, Keith and Kjellstr{\"o}m, Hedvig},
  booktitle = {Proceedings of The 13th Asian Conference on Machine Learning},
  pages = {870--885},
  year = {2021},
  editor = {Balasubramanian, Vineeth N. and Tsang, Ivor},
  volume = {157},
  series = {Proceedings of Machine Learning Research},
  month = {17--19 Nov},
  publisher = {PMLR},
  pdf = {https://proceedings.mlr.press/v157/christoffersen21a/christoffersen21a.pdf},
  url = {https://proceedings.mlr.press/v157/christoffersen21a.html},
  abstract = {Missing values with mixed data types is a common problem in a large number of machine learning applications such as processing of surveys and in different medical applications. Recently, Gaussian copula models have been suggested as a means of performing imputation of missing values using a probabilistic framework. While the present Gaussian copula models have shown to yield state of the art performance, they have two limitations: they are based on an approximation that is fast but may be imprecise and they do not support unordered multinomial variables. We address the first limitation using direct and arbitrarily precise approximations both for model estimation and imputation by using randomized quasi-Monte Carlo procedures. The method we provide has lower errors for the estimated model parameters and the imputed values, compared to previously proposed methods. We also extend the previous Gaussian copula models to include unordered multinomial variables in addition to the present support of ordinal, binary, and continuous variables.}
}
Endnote
%0 Conference Paper
%T Asymptotically Exact and Fast Gaussian Copula Models for Imputation of Mixed Data Types
%A Benjamin Christoffersen
%A Mark Clements
%A Keith Humphreys
%A Hedvig Kjellström
%B Proceedings of The 13th Asian Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2021
%E Vineeth N. Balasubramanian
%E Ivor Tsang
%F pmlr-v157-christoffersen21a
%I PMLR
%P 870--885
%U https://proceedings.mlr.press/v157/christoffersen21a.html
%V 157
%X Missing values with mixed data types is a common problem in a large number of machine learning applications such as processing of surveys and in different medical applications. Recently, Gaussian copula models have been suggested as a means of performing imputation of missing values using a probabilistic framework. While the present Gaussian copula models have shown to yield state of the art performance, they have two limitations: they are based on an approximation that is fast but may be imprecise and they do not support unordered multinomial variables. We address the first limitation using direct and arbitrarily precise approximations both for model estimation and imputation by using randomized quasi-Monte Carlo procedures. The method we provide has lower errors for the estimated model parameters and the imputed values, compared to previously proposed methods. We also extend the previous Gaussian copula models to include unordered multinomial variables in addition to the present support of ordinal, binary, and continuous variables.
APA
Christoffersen, B., Clements, M., Humphreys, K. & Kjellström, H. (2021). Asymptotically Exact and Fast Gaussian Copula Models for Imputation of Mixed Data Types. Proceedings of The 13th Asian Conference on Machine Learning, in Proceedings of Machine Learning Research 157:870-885. Available from https://proceedings.mlr.press/v157/christoffersen21a.html.
