Transformed Distribution Matching for Missing Value Imputation

He Zhao, Ke Sun, Amir Dezfouli, Edwin V. Bonilla
Proceedings of the 40th International Conference on Machine Learning, PMLR 202:42159-42186, 2023.

Abstract

We study the problem of imputing missing values in a dataset, which has important applications in many domains. The key to missing value imputation is to capture the data distribution with incomplete samples and impute the missing values accordingly. In this paper, by leveraging the fact that any two batches of data with missing values come from the same data distribution, we propose to impute the missing values of two batches of samples by transforming them into a latent space through deep invertible functions and matching them distributionally. To learn the transformations and impute the missing values simultaneously, a simple and well-motivated algorithm is proposed. Our algorithm has fewer hyperparameters to fine-tune and generates high-quality imputations regardless of how missing values are generated. Extensive experiments over a large number of datasets and competing benchmark algorithms show that our method achieves state-of-the-art performance.
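The abstract describes imputation as repeatedly drawing two batches, mapping both into a latent space with invertible functions, and matching the two latent batches distributionally while updating the missing entries. The NumPy sketch below shows that loop under simplifying assumptions that are ours, not the paper's. A fixed elementwise invertible map `f(x) = x + 0.5*tanh(x)` stands in for the learned deep invertible network, so no transformation parameters are trained. An entropic optimal-transport (Sinkhorn) cost is assumed as the distribution-matching loss, and the transport plan is held fixed when differentiating. The names `f`, `f_prime`, `sinkhorn_plan` and `impute`, and all hyperparameter values, are hypothetical.

```python
import numpy as np


def f(x):
    # Hypothetical fixed invertible map (strictly increasing elementwise),
    # standing in for the learned deep invertible network.
    return x + 0.5 * np.tanh(x)


def f_prime(x):
    # Elementwise derivative of f; always >= 1, so f is invertible.
    return 1.0 + 0.5 * (1.0 - np.tanh(x) ** 2)


def sinkhorn_plan(za, zb, eps=0.1, n_iter=100):
    # Entropic OT plan between two uniform empirical distributions
    # under squared Euclidean cost (assumed matching loss).
    cost = ((za[:, None, :] - zb[None, :, :]) ** 2).sum(-1)
    scale = eps * max(cost.max(), 1e-12)  # keeps exp(-cost/scale) >= exp(-1/eps)
    K = np.exp(-cost / scale)
    a = np.full(len(za), 1.0 / len(za))
    b = np.full(len(zb), 1.0 / len(zb))
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]


def impute(X, n_epochs=300, batch_size=16, lr=1.0, seed=0):
    rng = np.random.default_rng(seed)
    X = np.array(X, dtype=float, copy=True)
    mask = np.isnan(X)  # True where a value is missing
    # Initialise missing entries with column means (assumes no all-NaN column).
    col_means = np.nanmean(X, axis=0)
    X[mask] = np.take(col_means, np.nonzero(mask)[1])
    n = len(X)
    batch_size = min(batch_size, n)
    for _ in range(n_epochs):
        # Draw two batches; both come from the same data distribution.
        ia = rng.choice(n, batch_size, replace=False)
        ib = rng.choice(n, batch_size, replace=False)
        xa, xb = X[ia], X[ib]
        za, zb = f(xa), f(xb)  # map both batches into the latent space
        P = sinkhorn_plan(za, zb)
        # Gradient of sum_ij P_ij * ||za_i - zb_j||^2 with P held fixed.
        ga = 2 * (P.sum(1)[:, None] * za - P @ zb)
        gb = 2 * (P.sum(0)[:, None] * zb - P.T @ za)
        # Chain rule back through the elementwise map to data space.
        ga *= f_prime(xa)
        gb *= f_prime(xb)
        # Gradient step on missing entries only; observed values stay fixed.
        X[ia] -= lr * ga * mask[ia]
        X[ib] -= lr * gb * mask[ib]
    return X


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    t = rng.normal(size=64)
    data = np.stack([t, 2 * t + 0.1 * rng.normal(size=64)], axis=1)
    data[::4, 1] = np.nan  # knock out every fourth value in column 1
    print(impute(data)[:4])
```

Only entries flagged as missing are updated, so observed values pass through unchanged. Learning the transformation jointly, as the abstract describes, would additionally require gradients with respect to the network's parameters, which in practice usually means an autodiff framework rather than hand-derived gradients.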

Cite this Paper


BibTeX
@InProceedings{pmlr-v202-zhao23h,
  title = {Transformed Distribution Matching for Missing Value Imputation},
  author = {Zhao, He and Sun, Ke and Dezfouli, Amir and Bonilla, Edwin V.},
  booktitle = {Proceedings of the 40th International Conference on Machine Learning},
  pages = {42159--42186},
  year = {2023},
  editor = {Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan},
  volume = {202},
  series = {Proceedings of Machine Learning Research},
  month = {23--29 Jul},
  publisher = {PMLR},
  pdf = {https://proceedings.mlr.press/v202/zhao23h/zhao23h.pdf},
  url = {https://proceedings.mlr.press/v202/zhao23h.html},
  abstract = {We study the problem of imputing missing values in a dataset, which has important applications in many domains. The key to missing value imputation is to capture the data distribution with incomplete samples and impute the missing values accordingly. In this paper, by leveraging the fact that any two batches of data with missing values come from the same data distribution, we propose to impute the missing values of two batches of samples by transforming them into a latent space through deep invertible functions and matching them distributionally. To learn the transformations and impute the missing values simultaneously, a simple and well-motivated algorithm is proposed. Our algorithm has fewer hyperparameters to fine-tune and generates high-quality imputations regardless of how missing values are generated. Extensive experiments over a large number of datasets and competing benchmark algorithms show that our method achieves state-of-the-art performance.}
}
Endnote
%0 Conference Paper
%T Transformed Distribution Matching for Missing Value Imputation
%A He Zhao
%A Ke Sun
%A Amir Dezfouli
%A Edwin V. Bonilla
%B Proceedings of the 40th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Andreas Krause
%E Emma Brunskill
%E Kyunghyun Cho
%E Barbara Engelhardt
%E Sivan Sabato
%E Jonathan Scarlett
%F pmlr-v202-zhao23h
%I PMLR
%P 42159--42186
%U https://proceedings.mlr.press/v202/zhao23h.html
%V 202
%X We study the problem of imputing missing values in a dataset, which has important applications in many domains. The key to missing value imputation is to capture the data distribution with incomplete samples and impute the missing values accordingly. In this paper, by leveraging the fact that any two batches of data with missing values come from the same data distribution, we propose to impute the missing values of two batches of samples by transforming them into a latent space through deep invertible functions and matching them distributionally. To learn the transformations and impute the missing values simultaneously, a simple and well-motivated algorithm is proposed. Our algorithm has fewer hyperparameters to fine-tune and generates high-quality imputations regardless of how missing values are generated. Extensive experiments over a large number of datasets and competing benchmark algorithms show that our method achieves state-of-the-art performance.
APA
Zhao, H., Sun, K., Dezfouli, A. & Bonilla, E.V. (2023). Transformed Distribution Matching for Missing Value Imputation. Proceedings of the 40th International Conference on Machine Learning, in Proceedings of Machine Learning Research 202:42159-42186. Available from https://proceedings.mlr.press/v202/zhao23h.html.
