Detecting critical treatment effect bias in small subgroups

Piersilvio De Bartolomeis, Javier Abad, Konstantin Donhauser, Fanny Yang
Proceedings of the Fortieth Conference on Uncertainty in Artificial Intelligence, PMLR 244:943-965, 2024.

Abstract

Randomized trials are considered the gold standard for making informed decisions in medicine. However, they are often not representative of the patient population in clinical practice. Observational studies, on the other hand, cover a broader patient population but are prone to various biases. Thus, before using observational data for any downstream task, it is crucial to benchmark its treatment effect estimates against a randomized trial. We propose a novel strategy to benchmark observational studies on a subgroup level. First, we design a statistical test for the null hypothesis that the treatment effects – conditioned on a subset of relevant features – differ up to some tolerance value. Our test allows us to estimate an asymptotically valid lower bound on the maximum bias strength for any subgroup. We validate our lower bound in a real-world setting and show that it leads to conclusions that align with established medical knowledge.
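Below is a minimal, illustrative Python sketch of the benchmarking idea described in the abstract: it compares subgroup-level difference-in-means estimates from a simulated randomized trial and a confounded observational study, tests whether their gap exceeds a tolerance, and reports a crude asymptotic lower confidence bound on the bias magnitude. The data-generating process, the estimators, and names such as tolerance and diff_in_means are assumptions made here for illustration; they are not the test statistic or lower-bound construction developed in the paper.

# Illustrative sketch only: a toy benchmark of an observational subgroup
# treatment-effect estimate against a randomized trial, with a tolerance.
# Uses difference-in-means estimators and a normal approximation; the
# paper's actual procedure differs.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def diff_in_means(y, t):
    """Difference-in-means treatment effect estimate and its standard error."""
    y1, y0 = y[t == 1], y[t == 0]
    tau = y1.mean() - y0.mean()
    se = np.sqrt(y1.var(ddof=1) / len(y1) + y0.var(ddof=1) / len(y0))
    return tau, se

# Toy data: a randomized trial and a (confounded) observational study.
n_rct, n_obs = 500, 2000
x_rct = rng.binomial(1, 0.5, n_rct)          # subgroup indicator, e.g. sex
t_rct = rng.binomial(1, 0.5, n_rct)          # randomized treatment
y_rct = 1.0 * t_rct + 0.5 * x_rct + rng.normal(size=n_rct)

x_obs = rng.binomial(1, 0.5, n_obs)
t_obs = rng.binomial(1, 0.3 + 0.4 * x_obs, n_obs)   # confounded assignment
y_obs = 1.0 * t_obs + 0.5 * x_obs + 0.8 * t_obs * x_obs + rng.normal(size=n_obs)

tolerance = 0.2  # maximum bias we are willing to tolerate

for subgroup in (0, 1):
    tau_r, se_r = diff_in_means(y_rct[x_rct == subgroup], t_rct[x_rct == subgroup])
    tau_o, se_o = diff_in_means(y_obs[x_obs == subgroup], t_obs[x_obs == subgroup])
    gap = abs(tau_o - tau_r)
    se_gap = np.sqrt(se_r**2 + se_o**2)
    # One-sided z-test of H0: |bias| <= tolerance vs. H1: |bias| > tolerance.
    z = (gap - tolerance) / se_gap
    p_value = 1 - stats.norm.cdf(z)
    # Crude one-sided 95% lower confidence bound on the bias magnitude.
    lower_bound = max(gap - 1.645 * se_gap, 0.0)
    print(f"subgroup={subgroup}: gap={gap:.2f}, p={p_value:.3f}, "
          f"bias lower bound >= {lower_bound:.2f}")

In this toy example the treatment effect in the observational study is biased only for the subgroup with x = 1, so the test should reject the tolerance-based null hypothesis for that subgroup and retain it for the other.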

Cite this Paper


BibTeX
@InProceedings{pmlr-v244-de-bartolomeis24a,
  title     = {Detecting critical treatment effect bias in small subgroups},
  author    = {De Bartolomeis, Piersilvio and Abad, Javier and Donhauser, Konstantin and Yang, Fanny},
  booktitle = {Proceedings of the Fortieth Conference on Uncertainty in Artificial Intelligence},
  pages     = {943--965},
  year      = {2024},
  editor    = {Kiyavash, Negar and Mooij, Joris M.},
  volume    = {244},
  series    = {Proceedings of Machine Learning Research},
  month     = {15--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v244/main/assets/de-bartolomeis24a/de-bartolomeis24a.pdf},
  url       = {https://proceedings.mlr.press/v244/de-bartolomeis24a.html},
  abstract  = {Randomized trials are considered the gold standard for making informed decisions in medicine. However, they are often not representative of the patient population in clinical practice. Observational studies, on the other hand, cover a broader patient population but are prone to various biases. Thus, before using observational data for any downstream task, it is crucial to benchmark its treatment effect estimates against a randomized trial. We propose a novel strategy to benchmark observational studies on a subgroup level. First, we design a statistical test for the null hypothesis that the treatment effects – conditioned on a subset of relevant features – differ up to some tolerance value. Our test allows us to estimate an asymptotically valid lower bound on the maximum bias strength for any subgroup. We validate our lower bound in a real-world setting and show that it leads to conclusions that align with established medical knowledge.}
}
Endnote
%0 Conference Paper
%T Detecting critical treatment effect bias in small subgroups
%A Piersilvio De Bartolomeis
%A Javier Abad
%A Konstantin Donhauser
%A Fanny Yang
%B Proceedings of the Fortieth Conference on Uncertainty in Artificial Intelligence
%C Proceedings of Machine Learning Research
%D 2024
%E Negar Kiyavash
%E Joris M. Mooij
%F pmlr-v244-de-bartolomeis24a
%I PMLR
%P 943--965
%U https://proceedings.mlr.press/v244/de-bartolomeis24a.html
%V 244
%X Randomized trials are considered the gold standard for making informed decisions in medicine. However, they are often not representative of the patient population in clinical practice. Observational studies, on the other hand, cover a broader patient population but are prone to various biases. Thus, before using observational data for any downstream task, it is crucial to benchmark its treatment effect estimates against a randomized trial. We propose a novel strategy to benchmark observational studies on a subgroup level. First, we design a statistical test for the null hypothesis that the treatment effects – conditioned on a subset of relevant features – differ up to some tolerance value. Our test allows us to estimate an asymptotically valid lower bound on the maximum bias strength for any subgroup. We validate our lower bound in a real-world setting and show that it leads to conclusions that align with established medical knowledge.
APA
De Bartolomeis, P., Abad, J., Donhauser, K. & Yang, F. (2024). Detecting critical treatment effect bias in small subgroups. Proceedings of the Fortieth Conference on Uncertainty in Artificial Intelligence, in Proceedings of Machine Learning Research 244:943-965. Available from https://proceedings.mlr.press/v244/de-bartolomeis24a.html.