Cross-validating causal discovery via Leave-One-Variable-Out

Daniela Schkoda, Philipp Michael Faller, Dominik Janzing, Patrick Blöbaum
Proceedings of the Fourth Conference on Causal Learning and Reasoning, PMLR 275:659-692, 2025.

Abstract

We propose a new approach to falsify causal discovery algorithms without ground truth, based on testing the causal model on a variable pair that is excluded while learning the causal model. Specifically, given data on $X, Y, \boldsymbol{Z}$ with $\boldsymbol{Z} = (Z_1, \dots, Z_k)$, we apply the causal discovery algorithm separately to the 'leave-one-out' data sets $(X, \boldsymbol{Z})$ and $(Y, \boldsymbol{Z})$. We demonstrate that the two resulting causal models, in the form of DAGs, ADMGs, CPDAGs, or PAGs, often entail conclusions about the dependencies between $X$ and $Y$ and make it possible to estimate $\mathbb{E}(Y \mid X = x)$ without any joint observations of $X$ and $Y$, given only the leave-one-out data sets. This estimation is called "Leave-One-Variable-Out (LOVO)" prediction. Its error can be estimated because the joint distribution $P(X, Y)$ is in fact available, with $X$ and $Y$ having been omitted only for the purpose of falsification. We present two variants of LOVO prediction: a graphical method applicable to general causal discovery algorithms, and a version tailored to algorithms that rely on specific a priori assumptions, such as linear additive noise models. Simulations indicate that the LOVO prediction error is indeed correlated with the accuracy of the causal outputs, affirming the method's effectiveness.
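To make the idea concrete, here is a minimal, hypothetical sketch of LOVO prediction in the simplest setting the abstract mentions: a linear additive noise model where the discovery step on the two leave-one-out data sets is assumed to return the chain edges $X \to Z$ and $Z \to Y$, so that $\mathbb{E}(Y \mid X = x)$ can be composed from two regressions fitted on data sets that never observe $X$ and $Y$ jointly. This is not the authors' implementation; the single covariate, the chain structure, and all names below are assumptions made purely for illustration.

# Hedged illustrative sketch of LOVO prediction (not the paper's code).
# Assumptions: a linear-Gaussian chain X -> Z -> Y with a single covariate Z,
# and that causal discovery on the leave-one-out data sets recovers X -> Z
# and Z -> Y. All names here are hypothetical.
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Ground-truth linear additive noise model: X -> Z -> Y.
X = rng.normal(size=n)
Z = 0.8 * X + 0.5 * rng.normal(size=n)
Y = -1.2 * Z + 0.5 * rng.normal(size=n)

# "Leave one variable out": the first data set never records Y,
# the second never records X.
X_a, Z_a = X[: n // 2], Z[: n // 2]          # data set {X, Z}
Z_b, Y_b = Z[n // 2:], Y[n // 2:]            # data set {Y, Z}

# Fit the two edges of the (assumed) discovered chain by least squares.
b_zx = np.dot(Z_a, X_a) / np.dot(X_a, X_a)   # Z = b_zx * X + noise
b_yz = np.dot(Y_b, Z_b) / np.dot(Z_b, Z_b)   # Y = b_yz * Z + noise

# LOVO prediction of E[Y | X = x]: compose the two regressions along the chain,
# which is valid here because Y is independent of X given Z.
def lovo_predict(x):
    return b_yz * b_zx * x

# The joint distribution of (X, Y) was only hidden for falsification, so the
# LOVO prediction error can be estimated on the full data.
lovo_mse = np.mean((Y - lovo_predict(X)) ** 2)
baseline_mse = np.mean((Y - Y.mean()) ** 2)  # trivial predictor: mean of Y
print(f"LOVO MSE: {lovo_mse:.3f}  vs. trivial baseline: {baseline_mse:.3f}")

A large LOVO error relative to such a baseline would count as evidence against the discovered causal models, which is the falsification signal the paper studies.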

Cite this Paper


BibTeX
@InProceedings{pmlr-v275-schkoda25a,
  title     = {Cross-validating causal discovery via Leave-One-Variable-Out},
  author    = {Schkoda, Daniela and Faller, Philipp Michael and Janzing, Dominik and Bl\"{o}baum, Patrick},
  booktitle = {Proceedings of the Fourth Conference on Causal Learning and Reasoning},
  pages     = {659--692},
  year      = {2025},
  editor    = {Huang, Biwei and Drton, Mathias},
  volume    = {275},
  series    = {Proceedings of Machine Learning Research},
  month     = {07--09 May},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v275/main/assets/schkoda25a/schkoda25a.pdf},
  url       = {https://proceedings.mlr.press/v275/schkoda25a.html},
  abstract  = {We propose a new approach to falsify causal discovery algorithms without ground truth, based on testing the causal model on a variable pair that is excluded while learning the causal model. Specifically, given data on $X, Y, \boldsymbol{Z}$ with $\boldsymbol{Z} = (Z_1, \dots, Z_k)$, we apply the causal discovery algorithm separately to the 'leave-one-out' data sets $(X, \boldsymbol{Z})$ and $(Y, \boldsymbol{Z})$. We demonstrate that the two resulting causal models, in the form of DAGs, ADMGs, CPDAGs, or PAGs, often entail conclusions about the dependencies between $X$ and $Y$ and make it possible to estimate $\mathbb{E}(Y \mid X = x)$ without any joint observations of $X$ and $Y$, given only the leave-one-out data sets. This estimation is called "Leave-One-Variable-Out (LOVO)" prediction. Its error can be estimated because the joint distribution $P(X, Y)$ is in fact available, with $X$ and $Y$ having been omitted only for the purpose of falsification. We present two variants of LOVO prediction: a graphical method applicable to general causal discovery algorithms, and a version tailored to algorithms that rely on specific a priori assumptions, such as linear additive noise models. Simulations indicate that the LOVO prediction error is indeed correlated with the accuracy of the causal outputs, affirming the method's effectiveness.}
}
Endnote
%0 Conference Paper
%T Cross-validating causal discovery via Leave-One-Variable-Out
%A Daniela Schkoda
%A Philipp Michael Faller
%A Dominik Janzing
%A Patrick Blöbaum
%B Proceedings of the Fourth Conference on Causal Learning and Reasoning
%C Proceedings of Machine Learning Research
%D 2025
%E Biwei Huang
%E Mathias Drton
%F pmlr-v275-schkoda25a
%I PMLR
%P 659--692
%U https://proceedings.mlr.press/v275/schkoda25a.html
%V 275
%X We propose a new approach to falsify causal discovery algorithms without ground truth, based on testing the causal model on a variable pair that is excluded while learning the causal model. Specifically, given data on $X, Y, \boldsymbol{Z}$ with $\boldsymbol{Z} = (Z_1, \dots, Z_k)$, we apply the causal discovery algorithm separately to the 'leave-one-out' data sets $(X, \boldsymbol{Z})$ and $(Y, \boldsymbol{Z})$. We demonstrate that the two resulting causal models, in the form of DAGs, ADMGs, CPDAGs, or PAGs, often entail conclusions about the dependencies between $X$ and $Y$ and make it possible to estimate $\mathbb{E}(Y \mid X = x)$ without any joint observations of $X$ and $Y$, given only the leave-one-out data sets. This estimation is called "Leave-One-Variable-Out (LOVO)" prediction. Its error can be estimated because the joint distribution $P(X, Y)$ is in fact available, with $X$ and $Y$ having been omitted only for the purpose of falsification. We present two variants of LOVO prediction: a graphical method applicable to general causal discovery algorithms, and a version tailored to algorithms that rely on specific a priori assumptions, such as linear additive noise models. Simulations indicate that the LOVO prediction error is indeed correlated with the accuracy of the causal outputs, affirming the method's effectiveness.
APA
Schkoda, D., Faller, P.M., Janzing, D. & Blöbaum, P. (2025). Cross-validating causal discovery via Leave-One-Variable-Out. Proceedings of the Fourth Conference on Causal Learning and Reasoning, in Proceedings of Machine Learning Research 275:659-692. Available from https://proceedings.mlr.press/v275/schkoda25a.html.