Testing Generalizability in Causal Inference
Proceedings of the Forty-first Conference on Uncertainty in Artificial Intelligence, PMLR 286:2906-2927, 2025.
Abstract
Ensuring robust model performance in diverse real-world scenarios requires addressing generalizability across domains with covariate shifts. However, no formal procedure exists for statistically evaluating the generalizability of machine learning algorithms. Existing methods often rely on arbitrary proxy metrics of predictive performance, such as mean squared error, but do not directly answer whether a model can or cannot generalize. To address this gap in the domain of causal inference, we propose a systematic framework for statistically evaluating the generalizability of high-dimensional causal inference models. Our approach uses the frugal parameterization to flexibly simulate from fully and semi-synthetic causal benchmarks, offering a comprehensive evaluation of both mean and distributional regression methods. Grounded in real-world data, our method ensures more realistic evaluations, a property often missing in current work that relies on simplified datasets. Furthermore, by combining simulation with statistical testing, our framework is robust, avoids over-reliance on conventional metrics, and provides statistical safeguards for decision making.
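To illustrate the general idea described in the abstract, the sketch below simulates a frugal-style causal benchmark (covariate distribution, propensity model, and a directly specified causal margin linked by a Gaussian copula), introduces a covariate shift between a source and a target domain, and runs a simple statistical test of whether an estimator's error remains centered at zero under the shift. All names (`simulate_frugal`, `ate_error_under_shift`, `TRUE_ATE`, `COPULA_RHO`), the linear outcome model, and the choice of a one-sample t-test are illustrative assumptions, not the paper's actual procedure.

```python
# Hypothetical sketch: simulate a frugal-style benchmark with covariate shift,
# then statistically test whether a simple estimator generalizes.
# This is NOT the paper's implementation; all modeling choices are assumptions.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
TRUE_ATE = 2.0      # causal margin: Y | do(T=t) ~ Normal(TRUE_ATE * t, 1)
COPULA_RHO = 0.6    # Gaussian-copula dependence between Y and the covariate Z


def simulate_frugal(n, z_mean):
    """Simulate (Z, T, Y) with the causal margin specified directly.

    Three ingredients in the spirit of the frugal parameterization:
    p(Z) (shifted via z_mean), p(T | Z), and the marginal causal
    distribution p*(Y | do(T)), tied to Z through a Gaussian copula.
    """
    z = rng.normal(z_mean, 1.0, n)
    t = rng.binomial(1, 1.0 / (1.0 + np.exp(-z)))            # propensity model
    u_z = stats.norm.cdf(z, loc=z_mean, scale=1.0)            # uniform margin of Z
    eps = rng.normal(size=n)
    latent = COPULA_RHO * stats.norm.ppf(u_z) + np.sqrt(1 - COPULA_RHO**2) * eps
    y = TRUE_ATE * t + latent                                 # causal margin holds marginally
    return z, t, y


def ate_error_under_shift():
    """Fit an outcome model on the source domain, estimate the ATE on a
    covariate-shifted target domain via the g-formula, return the error."""
    z_s, t_s, y_s = simulate_frugal(2000, z_mean=0.0)         # source domain
    z_t, _, _ = simulate_frugal(2000, z_mean=1.5)             # shifted target covariates
    X = np.column_stack([np.ones_like(z_s), t_s, z_s])
    beta = np.linalg.lstsq(X, y_s, rcond=None)[0]             # linear outcome model
    mu1 = beta[0] + beta[1] + beta[2] * z_t                   # predicted Y | do(T=1)
    mu0 = beta[0] + beta[2] * z_t                             # predicted Y | do(T=0)
    return np.mean(mu1 - mu0) - TRUE_ATE


# Statistical safeguard: over repeated simulations, test whether the
# estimator's error under covariate shift is centered at zero.
errors = np.array([ate_error_under_shift() for _ in range(200)])
t_stat, p_value = stats.ttest_1samp(errors, popmean=0.0)
print(f"mean error = {errors.mean():.3f}, p-value = {p_value:.3f}")
```

Because the outcome model in this toy setup is correctly specified, the test should typically fail to reject, whereas a misspecified estimator would yield systematically biased errors under the shift and a small p-value; the point is only to show how a hypothesis test, rather than a raw predictive metric, can serve as the generalizability criterion.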