Identification and Estimation for Nonignorable Missing Data: A Data Fusion Approach

Zixiao Wang, Amiremad Ghassami, Ilya Shpitser
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:50467-50488, 2024.

Abstract

We consider the task of identifying and estimating a parameter of interest in settings where data is missing not at random (MNAR). In general, such parameters are not identified without strong assumptions on the missing data model. In this paper, we take an alternative approach and introduce a method inspired by data fusion, where information in the MNAR dataset is augmented by information in an auxiliary dataset subject to missingness at random (MAR). We show that even if the parameter of interest cannot be identified given either dataset alone, it can be identified given pooled data, under two complementary sets of assumptions. We derive inverse probability weighted (IPW) estimators for identified parameters under both sets of assumptions, and evaluate the performance of our estimation strategies via simulation studies and a data application.
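For readers unfamiliar with inverse probability weighting, the following is a minimal sketch of a generic textbook IPW mean estimator under MAR. It is not the data fusion estimator derived in the paper. The simulated data-generating process, the assumption that the true observation probability is known, and the names `ipw_mean` and `simulate_mar` are all illustrative assumptions and do not come from the paper.

```python
import numpy as np


def ipw_mean(y, r, propensity):
    """Normalized (Hajek) IPW estimate of E[Y].

    y          : outcomes, possibly NaN where unobserved
    r          : 1 if y is observed, 0 otherwise
    propensity : P(R = 1 | X) for each unit (assumed known here)
    """
    y = np.asarray(y, dtype=float)
    r = np.asarray(r, dtype=float)
    weights = r / np.asarray(propensity, dtype=float)
    # Replace unobserved outcomes by 0 so that NaN * 0 does not propagate.
    y_filled = np.where(r == 1, y, 0.0)
    return float(np.sum(weights * y_filled) / np.sum(weights))


def simulate_mar(n, seed=0):
    """Hypothetical MAR example: missingness of Y depends only on observed X."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    y = 2.0 + x + rng.normal(size=n)                # true E[Y] = 2
    propensity = 1.0 / (1.0 + np.exp(-(0.5 + x)))   # P(R = 1 | X)
    r = rng.binomial(1, propensity)
    y_obs = np.where(r == 1, y, np.nan)
    return x, y_obs, r, propensity


x, y_obs, r, propensity = simulate_mar(100_000)
naive = float(np.nanmean(y_obs))        # complete-case mean, biased under MAR
ipw = ipw_mean(y_obs, r, propensity)    # reweighted mean, targets E[Y]
print(f"complete-case mean: {naive:.3f}, IPW mean: {ipw:.3f}")
```

In this sketch, units with larger X are more likely to be observed and also have larger Y, so the complete-case mean overstates E[Y]; weighting each observed unit by the inverse of its observation probability corrects for that. Under MNAR the observation probability depends on the missing value itself, so it cannot be estimated this way from a single dataset, which is the gap the abstract describes addressing with an auxiliary MAR dataset.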

Cite this Paper


BibTeX
@InProceedings{pmlr-v235-wang24t,
  title = {Identification and Estimation for Nonignorable Missing Data: A Data Fusion Approach},
  author = {Wang, Zixiao and Ghassami, Amiremad and Shpitser, Ilya},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages = {50467--50488},
  year = {2024},
  editor = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume = {235},
  series = {Proceedings of Machine Learning Research},
  month = {21--27 Jul},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/wang24t/wang24t.pdf},
  url = {https://proceedings.mlr.press/v235/wang24t.html},
  abstract = {We consider the task of identifying and estimating a parameter of interest in settings where data is missing not at random (MNAR). In general, such parameters are not identified without strong assumptions on the missing data model. In this paper, we take an alternative approach and introduce a method inspired by data fusion, where information in the MNAR dataset is augmented by information in an auxiliary dataset subject to missingness at random (MAR). We show that even if the parameter of interest cannot be identified given either dataset alone, it can be identified given pooled data, under two complementary sets of assumptions. We derive inverse probability weighted (IPW) estimators for identified parameters under both sets of assumptions, and evaluate the performance of our estimation strategies via simulation studies, and a data application.}
}
Endnote
%0 Conference Paper
%T Identification and Estimation for Nonignorable Missing Data: A Data Fusion Approach
%A Zixiao Wang
%A Amiremad Ghassami
%A Ilya Shpitser
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-wang24t
%I PMLR
%P 50467--50488
%U https://proceedings.mlr.press/v235/wang24t.html
%V 235
%X We consider the task of identifying and estimating a parameter of interest in settings where data is missing not at random (MNAR). In general, such parameters are not identified without strong assumptions on the missing data model. In this paper, we take an alternative approach and introduce a method inspired by data fusion, where information in the MNAR dataset is augmented by information in an auxiliary dataset subject to missingness at random (MAR). We show that even if the parameter of interest cannot be identified given either dataset alone, it can be identified given pooled data, under two complementary sets of assumptions. We derive inverse probability weighted (IPW) estimators for identified parameters under both sets of assumptions, and evaluate the performance of our estimation strategies via simulation studies, and a data application.
APA
Wang, Z., Ghassami, A., & Shpitser, I. (2024). Identification and Estimation for Nonignorable Missing Data: A Data Fusion Approach. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:50467-50488. Available from https://proceedings.mlr.press/v235/wang24t.html.

Related Material