Just Trial Once: Ongoing Causal Validation of Machine Learning Models

Jacob M. Chen, Michael Oberst
Proceedings of the Forty-first Conference on Uncertainty in Artificial Intelligence, PMLR 286:586-611, 2025.

Abstract

Machine learning (ML) models are increasingly used as decision-support tools in high-risk domains. Evaluating the causal impact of deploying such models can be done with a randomized controlled trial (RCT) that randomizes users to ML vs. control groups and assesses the effect on relevant outcomes. However, ML models are inevitably updated over time, and we often lack evidence for the causal impact of these updates. While the causal effect could be repeatedly validated with ongoing RCTs, such experiments are expensive and time-consuming to run. In this work, we present an alternative solution: using only data from a prior RCT, we give conditions under which the causal impact of a new ML model can be precisely bounded or estimated, even if it was not included in the RCT. Our assumptions incorporate two realistic constraints: ML predictions are often deterministic, and their impacts depend on user trust in the model. Based on our analysis, we give recommendations for trial designs that maximize our ability to assess future versions of an ML model. Our hope is that our trial design recommendations will save practitioners time and resources while allowing for quicker deployments of updates to ML models.
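To make the core idea concrete, here is a minimal, hypothetical sketch (not the paper's estimator) of how data from a prior RCT could bound the average outcome under a new ML model. It assumes, as a simplification of the paper's conditions, that predictions are deterministic and that treated-arm outcomes depend on the model only through the prediction shown, so treated units where the new and old models agree identify outcomes under the new model, while units where they disagree are bounded by the known outcome range. All names here (f_old, f_new, bound_new_model_mean) are illustrative, not from the paper.

# Hypothetical sketch: worst-case (Manski-style) bounds on the mean
# outcome if a *new* model were deployed, using only data from a prior
# RCT that evaluated an *old* model.
import numpy as np

def bound_new_model_mean(X, Y, T, f_old, f_new, y_lo=0.0, y_hi=1.0):
    """Bounds on E[Y] under deployment of f_new.

    X: covariates; Y: outcomes, assumed to lie in [y_lo, y_hi];
    T: 1 if the unit was shown the old model's prediction, 0 for control.
    f_old, f_new: deterministic, vectorized prediction functions.
    """
    treated = T == 1
    agree = f_old(X) == f_new(X)      # region where the two models coincide
    p_agree = agree.mean()            # P(f_new(X) == f_old(X)); same in both
                                      # arms under randomization
    # On the agreement region, treated-arm outcomes identify outcomes
    # under the new model (the shown prediction would be identical).
    mu_agree = Y[treated & agree].mean()
    # On the disagreement region, outcomes under f_new are unobserved,
    # so bound them by the outcome range.
    lower = p_agree * mu_agree + (1 - p_agree) * y_lo
    upper = p_agree * mu_agree + (1 - p_agree) * y_hi
    return lower, upper

Comparing these bounds against the control-arm mean, Y[T == 0].mean(), then bounds the causal effect of deploying the new model. The bounds collapse to a point estimate when the two models agree everywhere and widen with the disagreement rate, which is consistent with the paper's motivation for trial designs that maximize the ability to assess future versions of a model.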

Cite this Paper


BibTeX
@InProceedings{pmlr-v286-chen25b,
  title     = {Just Trial Once: Ongoing Causal Validation of Machine Learning Models},
  author    = {Chen, Jacob M. and Oberst, Michael},
  booktitle = {Proceedings of the Forty-first Conference on Uncertainty in Artificial Intelligence},
  pages     = {586--611},
  year      = {2025},
  editor    = {Chiappa, Silvia and Magliacane, Sara},
  volume    = {286},
  series    = {Proceedings of Machine Learning Research},
  month     = {21--25 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v286/main/assets/chen25b/chen25b.pdf},
  url       = {https://proceedings.mlr.press/v286/chen25b.html}
}
Endnote
%0 Conference Paper
%T Just Trial Once: Ongoing Causal Validation of Machine Learning Models
%A Jacob M. Chen
%A Michael Oberst
%B Proceedings of the Forty-first Conference on Uncertainty in Artificial Intelligence
%C Proceedings of Machine Learning Research
%D 2025
%E Silvia Chiappa
%E Sara Magliacane
%F pmlr-v286-chen25b
%I PMLR
%P 586--611
%U https://proceedings.mlr.press/v286/chen25b.html
%V 286
APA
Chen, J. M., & Oberst, M. (2025). Just Trial Once: Ongoing Causal Validation of Machine Learning Models. Proceedings of the Forty-first Conference on Uncertainty in Artificial Intelligence, in Proceedings of Machine Learning Research 286:586-611. Available from https://proceedings.mlr.press/v286/chen25b.html.
