Meta-Analysis with Untrusted Data

Shiva Kaul; Geoffrey Gordon

Proceedings of the 4th Machine Learning for Health Symposium, PMLR 259:563-593, 2025.

Abstract

Meta-analyses are usually conducted on small amounts of “trusted” data, ideally from randomized, controlled trials. Excluding untrusted (observational) data — such as medical records and related scientific literature — avoids potential confounding and ensures unbiased conclusions. Unfortunately, this exclusion can reduce predictive accuracy to the point of clinical irrelevance, especially when trials are heterogeneous. This paper shows how untrusted data can be safely incorporated into meta-analysis, improving predictions without sacrificing rigor or introducing unproven assumptions. Our approach, called conformal meta-analysis, consists of (1) learning a (potentially flawed) prior distribution from the untrusted data, (2) using the prior and trusted data to derive a simple, fully-conformal prediction interval for the observed trial effect, and (3) analytically extracting an interval for the true (unobserved) effect. In multiple experiments on healthcare datasets, our algorithms deliver tighter, sounder intervals than traditional ones. This paper conceptually realigns meta-analysis as a foundation for evidence-based medicine, embracing heterogeneity and untrusted data for more nuanced, precise predictions.
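The abstract only outlines step (2). The sketch below is a generic illustration under our own assumptions and is not the authors' construction. It computes a standard conformal prediction interval whose nonconformity score is the distance from a prior mean, which is assumed to have been fit beforehand on untrusted data. The names `conformal_interval` and `prior_mean` are hypothetical. The sketch does not attempt step (3), extracting an interval for the true (unobserved) effect.

```python
import math
from typing import Callable, Sequence, Tuple


def conformal_interval(
    trusted_x: Sequence[float],
    trusted_y: Sequence[float],
    prior_mean: Callable[[float], float],
    x_new: float,
    alpha: float = 0.1,
) -> Tuple[float, float]:
    """Hypothetical sketch: conformal interval for y at x_new, centered on a prior mean.

    Assumption: prior_mean is a fixed function (e.g. fit earlier on untrusted data),
    so each score |y_i - prior_mean(x_i)| does not depend on the candidate label and
    the conformal interval has a simple closed form.
    """
    n = len(trusted_y)
    if n == 0 or n != len(trusted_x):
        raise ValueError("trusted_x and trusted_y must be non-empty and equal length")

    # Nonconformity scores on the trusted data, sorted ascending.
    scores = sorted(abs(y - prior_mean(x)) for x, y in zip(trusted_x, trusted_y))

    # Rank of the conformal quantile; needs ceil((n + 1)(1 - alpha)) <= n.
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few trusted points for this alpha: only the trivial interval is valid.
        return (-math.inf, math.inf)

    q = scores[k - 1]
    center = prior_mean(x_new)
    return (center - q, center + q)
```

Because the prior enters only through the score, a badly wrong prior tends to widen the interval rather than break coverage, provided the trusted points and the new point are exchangeable.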

Cite this Paper

BibTeX

@InProceedings{pmlr-v259-kaul25a,
  title = 	 {Meta-Analysis with Untrusted Data},
  author =       {Kaul, Shiva and Gordon, Geoffrey},
  booktitle = 	 {Proceedings of the 4th Machine Learning for Health Symposium},
  pages = 	 {563--593},
  year = 	 {2025},
  editor = 	 {Hegselmann, Stefan and Zhou, Helen and Healey, Elizabeth and Chang, Trenton and Ellington, Caleb and Mhasawade, Vishwali and Tonekaboni, Sana and Argaw, Peniel and Zhang, Haoran},
  volume = 	 {259},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {15--16 Dec},
  publisher =    {PMLR},
  pdf = 	 {https://raw.githubusercontent.com/mlresearch/v259/main/assets/kaul25a/kaul25a.pdf},
  url = 	 {https://proceedings.mlr.press/v259/kaul25a.html},
  abstract = 	 {Meta-analyses are usually conducted on small amounts of “trusted” data, ideally from randomized, controlled trials. Excluding untrusted (observational) data — such as medical records and related scientific literature — avoids potential confounding and ensures unbiased conclusions. Unfortunately, this exclusion can reduce predictive accuracy to the point of clinical irrelevance, especially when trials are heterogeneous. This paper shows how untrusted data can be safely incorporated into meta-analysis, improving predictions without sacrificing rigor or introducing unproven assumptions. Our approach, called conformal meta-analysis, consists of (1) learning a (potentially flawed) prior distribution from the untrusted data, (2) using the prior and trusted data to derive a simple, fully-conformal prediction interval for the observed trial effect, and (3) analytically extracting an interval for the true (unobserved) effect. In multiple experiments on healthcare datasets, our algorithms deliver tighter, sounder intervals than traditional ones. This paper conceptually realigns meta-analysis as a foundation for evidence-based medicine, embracing heterogeneity and untrusted data for more nuanced, precise predictions.}
}

Endnote

%0 Conference Paper
%T Meta-Analysis with Untrusted Data
%A Shiva Kaul
%A Geoffrey Gordon
%B Proceedings of the 4th Machine Learning for Health Symposium
%C Proceedings of Machine Learning Research
%D 2025
%E Stefan Hegselmann
%E Helen Zhou
%E Elizabeth Healey
%E Trenton Chang
%E Caleb Ellington
%E Vishwali Mhasawade
%E Sana Tonekaboni
%E Peniel Argaw
%E Haoran Zhang
%F pmlr-v259-kaul25a
%I PMLR
%P 563--593
%U https://proceedings.mlr.press/v259/kaul25a.html
%V 259
%X Meta-analyses are usually conducted on small amounts of “trusted” data, ideally from randomized, controlled trials. Excluding untrusted (observational) data — such as medical records and related scientific literature — avoids potential confounding and ensures unbiased conclusions. Unfortunately, this exclusion can reduce predictive accuracy to the point of clinical irrelevance, especially when trials are heterogeneous. This paper shows how untrusted data can be safely incorporated into meta-analysis, improving predictions without sacrificing rigor or introducing unproven assumptions. Our approach, called conformal meta-analysis, consists of (1) learning a (potentially flawed) prior distribution from the untrusted data, (2) using the prior and trusted data to derive a simple, fully-conformal prediction interval for the observed trial effect, and (3) analytically extracting an interval for the true (unobserved) effect. In multiple experiments on healthcare datasets, our algorithms deliver tighter, sounder intervals than traditional ones. This paper conceptually realigns meta-analysis as a foundation for evidence-based medicine, embracing heterogeneity and untrusted data for more nuanced, precise predictions.

APA

Kaul, S. & Gordon, G. (2025). Meta-Analysis with Untrusted Data. Proceedings of the 4th Machine Learning for Health Symposium, in Proceedings of Machine Learning Research 259:563-593. Available from https://proceedings.mlr.press/v259/kaul25a.html.
