Structure Learning Under Missing Data

Alexander Gain, Ilya Shpitser
Proceedings of the Ninth International Conference on Probabilistic Graphical Models, PMLR 72:121-132, 2018.

Abstract

Causal discovery is the problem of learning the structure of a graphical causal model that approximates the true generating process that gave rise to observed data. In practical problems, including in causal discovery problems, missing data is a very common issue. In such cases, learning the true causal graph entails estimating the full data distribution, samples from which are not directly available. Attempting to instead apply existing structure learning algorithms to samples drawn from the observed data distribution, containing systematically missing entries, may well result in incorrect inferences due to selection bias. Inthis paperwe discuss adjustmentsthat mustbemade toexistingstructure learningalgorithms to properly account for missing data. We first give an algorithm for the simpler setting where the underlying graph is unknown, but the missing data model is known. We then discuss approaches to the much more difficult case where only the observed data is given with no other additional information on the missingness model known. We validate our approach by simulations, showing that it outperforms standard structure learning algorithms in all of these settings.

Cite this Paper


BibTeX
@InProceedings{pmlr-v72-gain18a, title = {Structure Learning Under Missing Data}, author = {Gain, Alexander and Shpitser, Ilya}, booktitle = {Proceedings of the Ninth International Conference on Probabilistic Graphical Models}, pages = {121--132}, year = {2018}, editor = {Kratochvíl, Václav and Studený, Milan}, volume = {72}, series = {Proceedings of Machine Learning Research}, month = {11--14 Sep}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/v72/gain18a/gain18a.pdf}, url = {https://proceedings.mlr.press/v72/gain18a.html}, abstract = {Causal discovery is the problem of learning the structure of a graphical causal model that approximates the true generating process that gave rise to observed data. In practical problems, including in causal discovery problems, missing data is a very common issue. In such cases, learning the true causal graph entails estimating the full data distribution, samples from which are not directly available. Attempting to instead apply existing structure learning algorithms to samples drawn from the observed data distribution, containing systematically missing entries, may well result in incorrect inferences due to selection bias. Inthis paperwe discuss adjustmentsthat mustbemade toexistingstructure learningalgorithms to properly account for missing data. We first give an algorithm for the simpler setting where the underlying graph is unknown, but the missing data model is known. We then discuss approaches to the much more difficult case where only the observed data is given with no other additional information on the missingness model known. We validate our approach by simulations, showing that it outperforms standard structure learning algorithms in all of these settings.} }
Endnote
%0 Conference Paper %T Structure Learning Under Missing Data %A Alexander Gain %A Ilya Shpitser %B Proceedings of the Ninth International Conference on Probabilistic Graphical Models %C Proceedings of Machine Learning Research %D 2018 %E Václav Kratochvíl %E Milan Studený %F pmlr-v72-gain18a %I PMLR %P 121--132 %U https://proceedings.mlr.press/v72/gain18a.html %V 72 %X Causal discovery is the problem of learning the structure of a graphical causal model that approximates the true generating process that gave rise to observed data. In practical problems, including in causal discovery problems, missing data is a very common issue. In such cases, learning the true causal graph entails estimating the full data distribution, samples from which are not directly available. Attempting to instead apply existing structure learning algorithms to samples drawn from the observed data distribution, containing systematically missing entries, may well result in incorrect inferences due to selection bias. Inthis paperwe discuss adjustmentsthat mustbemade toexistingstructure learningalgorithms to properly account for missing data. We first give an algorithm for the simpler setting where the underlying graph is unknown, but the missing data model is known. We then discuss approaches to the much more difficult case where only the observed data is given with no other additional information on the missingness model known. We validate our approach by simulations, showing that it outperforms standard structure learning algorithms in all of these settings.
APA
Gain, A. & Shpitser, I.. (2018). Structure Learning Under Missing Data. Proceedings of the Ninth International Conference on Probabilistic Graphical Models, in Proceedings of Machine Learning Research 72:121-132 Available from https://proceedings.mlr.press/v72/gain18a.html.

Related Material