GAIN: Missing Data Imputation using Generative Adversarial Nets

Jinsung Yoon, James Jordon, Mihaela Schaar
Proceedings of the 35th International Conference on Machine Learning, PMLR 80:5689-5698, 2018.

Abstract

We propose a novel method for imputing missing data by adapting the well-known Generative Adversarial Nets (GAN) framework. Accordingly, we call our method Generative Adversarial Imputation Nets (GAIN). The generator (G) observes some components of a real data vector, imputes the missing components conditioned on what is actually observed, and outputs a completed vector. The discriminator (D) then takes a completed vector and attempts to determine which components were actually observed and which were imputed. To ensure that D forces G to learn the desired distribution, we provide D with some additional information in the form of a hint vector. The hint reveals to D partial information about the missingness of the original sample, which is used by D to focus its attention on the imputation quality of particular components. This hint ensures that G does in fact learn to generate according to the true data distribution. We tested our method on various datasets and found that GAIN significantly outperforms state-of-the-art imputation methods.

Cite this Paper


BibTeX
@InProceedings{pmlr-v80-yoon18a, title = {{GAIN}: Missing Data Imputation using Generative Adversarial Nets}, author = {Yoon, Jinsung and Jordon, James and van der Schaar, Mihaela}, booktitle = {Proceedings of the 35th International Conference on Machine Learning}, pages = {5689--5698}, year = {2018}, editor = {Dy, Jennifer and Krause, Andreas}, volume = {80}, series = {Proceedings of Machine Learning Research}, month = {10--15 Jul}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/v80/yoon18a/yoon18a.pdf}, url = {https://proceedings.mlr.press/v80/yoon18a.html}, abstract = {We propose a novel method for imputing missing data by adapting the well-known Generative Adversarial Nets (GAN) framework. Accordingly, we call our method Generative Adversarial Imputation Nets (GAIN). The generator (G) observes some components of a real data vector, imputes the missing components conditioned on what is actually observed, and outputs a completed vector. The discriminator (D) then takes a completed vector and attempts to determine which components were actually observed and which were imputed. To ensure that D forces G to learn the desired distribution, we provide D with some additional information in the form of a hint vector. The hint reveals to D partial information about the missingness of the original sample, which is used by D to focus its attention on the imputation quality of particular components. This hint ensures that G does in fact learn to generate according to the true data distribution. We tested our method on various datasets and found that GAIN significantly outperforms state-of-the-art imputation methods.} }
Endnote
%0 Conference Paper %T GAIN: Missing Data Imputation using Generative Adversarial Nets %A Jinsung Yoon %A James Jordon %A Mihaela Schaar %B Proceedings of the 35th International Conference on Machine Learning %C Proceedings of Machine Learning Research %D 2018 %E Jennifer Dy %E Andreas Krause %F pmlr-v80-yoon18a %I PMLR %P 5689--5698 %U https://proceedings.mlr.press/v80/yoon18a.html %V 80 %X We propose a novel method for imputing missing data by adapting the well-known Generative Adversarial Nets (GAN) framework. Accordingly, we call our method Generative Adversarial Imputation Nets (GAIN). The generator (G) observes some components of a real data vector, imputes the missing components conditioned on what is actually observed, and outputs a completed vector. The discriminator (D) then takes a completed vector and attempts to determine which components were actually observed and which were imputed. To ensure that D forces G to learn the desired distribution, we provide D with some additional information in the form of a hint vector. The hint reveals to D partial information about the missingness of the original sample, which is used by D to focus its attention on the imputation quality of particular components. This hint ensures that G does in fact learn to generate according to the true data distribution. We tested our method on various datasets and found that GAIN significantly outperforms state-of-the-art imputation methods.
APA
Yoon, J., Jordon, J. & Schaar, M.. (2018). GAIN: Missing Data Imputation using Generative Adversarial Nets. Proceedings of the 35th International Conference on Machine Learning, in Proceedings of Machine Learning Research 80:5689-5698 Available from https://proceedings.mlr.press/v80/yoon18a.html.

Related Material