MIWAE: Deep Generative Modelling and Imputation of Incomplete Data Sets

Pierre-Alexandre Mattei, Jes Frellsen
Proceedings of the 36th International Conference on Machine Learning, PMLR 97:4413-4423, 2019.

Abstract

We consider the problem of handling missing data with deep latent variable models (DLVMs). First, we present a simple technique to train DLVMs when the training set contains missing-at-random data. Our approach, called MIWAE, is based on the importance-weighted autoencoder (IWAE), and maximises a potentially tight lower bound of the log-likelihood of the observed data. Compared to the original IWAE, our algorithm does not induce any additional computational overhead due to the missing data. We also develop Monte Carlo techniques for single and multiple imputation using a DLVM trained on an incomplete data set. We illustrate our approach by training a convolutional DLVM on incomplete static binarisations of MNIST. Moreover, on various continuous data sets, we show that MIWAE provides extremely accurate single imputations, and is highly competitive with state-of-the-art methods.
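The importance-weighted bound and the self-normalised importance-sampling imputation described in the abstract can be sketched on a toy linear-Gaussian decoder. Everything here is illustrative: the proposal q(z | x_obs) is taken to be the prior (a real MIWAE model uses an encoder network), and all names and dimensions are made up for the example; only the importance-weighting machinery mirrors the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear-Gaussian "decoder": p(x | z) = N(W z + b, sigma^2 I),
# prior p(z) = N(0, I). Dimensions are arbitrary for illustration.
d, p = 2, 5
W = rng.normal(size=(p, d))
b = rng.normal(size=p)
sigma = 0.5

def logsumexp(a):
    m = a.max()
    return m + np.log(np.exp(a - m).sum())

def log_gauss(x, mu, s):
    # Log-density of independent Gaussians, summed over the last axis.
    return -0.5 * np.sum(((x - mu) / s) ** 2 + np.log(2 * np.pi * s ** 2), axis=-1)

def miwae_bound_and_impute(x, mask, K=50):
    """Estimate a K-sample bound on log p(x_obs) and impute x_mis.

    `mask` is 1 where x is observed. With the prior as proposal,
    log w_k = log p(x_obs | z_k) + log p(z_k) - log q(z_k | x_obs)
    reduces to log p(x_obs | z_k), since the last two terms cancel.
    """
    z = rng.normal(size=(K, d))          # z_k ~ q(z | x_obs)
    mu = z @ W.T + b                     # decoder means, shape (K, p)
    obs = mask.astype(bool)
    log_w = log_gauss(x[obs], mu[:, obs], sigma)   # only observed coords
    bound = logsumexp(log_w) - np.log(K)           # IWAE-style bound on log p(x_obs)
    # Single imputation by self-normalised importance sampling:
    # E[x_mis | x_obs] is approximated by a softmax(log_w)-weighted
    # average of the decoder means on the missing coordinates.
    w = np.exp(log_w - logsumexp(log_w))
    x_imp = x.copy()
    x_imp[~obs] = w @ mu[:, ~obs]
    return bound, x_imp

# A data point with two coordinates missing (mask = 0).
x = W @ rng.normal(size=d) + b + sigma * rng.normal(size=p)
mask = np.array([1, 1, 0, 1, 0])
bound, x_imp = miwae_bound_and_impute(x, mask, K=1000)
```

Note that the bound only ever touches the observed coordinates of `x`, which is why, as the abstract states, training on incomplete data adds no overhead relative to the original IWAE objective.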

Cite this Paper

BibTeX
@InProceedings{pmlr-v97-mattei19a,
  title     = {{MIWAE}: Deep Generative Modelling and Imputation of Incomplete Data Sets},
  author    = {Mattei, Pierre-Alexandre and Frellsen, Jes},
  booktitle = {Proceedings of the 36th International Conference on Machine Learning},
  pages     = {4413--4423},
  year      = {2019},
  editor    = {Chaudhuri, Kamalika and Salakhutdinov, Ruslan},
  volume    = {97},
  series    = {Proceedings of Machine Learning Research},
  month     = {09--15 Jun},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v97/mattei19a/mattei19a.pdf},
  url       = {https://proceedings.mlr.press/v97/mattei19a.html},
  abstract  = {We consider the problem of handling missing data with deep latent variable models (DLVMs). First, we present a simple technique to train DLVMs when the training set contains missing-at-random data. Our approach, called MIWAE, is based on the importance-weighted autoencoder (IWAE), and maximises a potentially tight lower bound of the log-likelihood of the observed data. Compared to the original IWAE, our algorithm does not induce any additional computational overhead due to the missing data. We also develop Monte Carlo techniques for single and multiple imputation using a DLVM trained on an incomplete data set. We illustrate our approach by training a convolutional DLVM on incomplete static binarisations of MNIST. Moreover, on various continuous data sets, we show that MIWAE provides extremely accurate single imputations, and is highly competitive with state-of-the-art methods.}
}
Endnote
%0 Conference Paper
%T MIWAE: Deep Generative Modelling and Imputation of Incomplete Data Sets
%A Pierre-Alexandre Mattei
%A Jes Frellsen
%B Proceedings of the 36th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2019
%E Kamalika Chaudhuri
%E Ruslan Salakhutdinov
%F pmlr-v97-mattei19a
%I PMLR
%P 4413--4423
%U https://proceedings.mlr.press/v97/mattei19a.html
%V 97
%X We consider the problem of handling missing data with deep latent variable models (DLVMs). First, we present a simple technique to train DLVMs when the training set contains missing-at-random data. Our approach, called MIWAE, is based on the importance-weighted autoencoder (IWAE), and maximises a potentially tight lower bound of the log-likelihood of the observed data. Compared to the original IWAE, our algorithm does not induce any additional computational overhead due to the missing data. We also develop Monte Carlo techniques for single and multiple imputation using a DLVM trained on an incomplete data set. We illustrate our approach by training a convolutional DLVM on incomplete static binarisations of MNIST. Moreover, on various continuous data sets, we show that MIWAE provides extremely accurate single imputations, and is highly competitive with state-of-the-art methods.
APA
Mattei, P.-A., & Frellsen, J. (2019). MIWAE: Deep Generative Modelling and Imputation of Incomplete Data Sets. Proceedings of the 36th International Conference on Machine Learning, in Proceedings of Machine Learning Research 97:4413-4423. Available from https://proceedings.mlr.press/v97/mattei19a.html.