Data-driven Design of Randomized Control Trials with Guaranteed Treatment Effects

Santiago Cortes-Gomez, Naveen Janaki Raman, Aarti Singh, Bryan Wilder
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:11313-11327, 2025.

Abstract

Randomized controlled trials (RCTs) generate guarantees for treatment effects. However, RCTs often spend unnecessary resources exploring sub-optimal treatments, which can reduce the power of treatment guarantees. To address this, we propose a two-stage RCT design. In the first stage, a data-driven screening procedure prunes low-impact treatments; the second stage then focuses on developing high-probability lower bounds for the best-performing treatment effect. Unlike existing adaptive RCT frameworks, our method is simple enough to be implemented in scenarios with limited adaptivity. We derive optimal designs for two-stage RCTs and demonstrate how such designs can be implemented through sample splitting. Empirically, we show that two-stage designs improve upon single-stage approaches, especially in scenarios where domain knowledge is available through a prior. Our work thus provides a simple yet effective design for RCTs, optimizing for the ability to certify with high probability the largest possible treatment effect for at least one of the arms studied.
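The screen-then-certify idea can be illustrated with a small Python sketch. This is not the authors' optimized design; it is a minimal example under our own assumptions. Outcomes are taken to be bounded in [0, 1], and each arm's mean outcome stands in for its treatment effect, so there is no control-arm comparison. Screening keeps a fixed number of top arms, the budget split is fixed at one half rather than optimized, and the certificate is a one-sided Hoeffding bound with a union bound over the surviving arms. The names `hoeffding_radius` and `two_stage_lower_bound` are hypothetical.

```python
import math
import random
from typing import Callable, Tuple


def hoeffding_radius(n: int, delta: float) -> float:
    """One-sided Hoeffding radius for the mean of n i.i.d. outcomes in [0, 1]."""
    return math.sqrt(math.log(1.0 / delta) / (2.0 * n))


def two_stage_lower_bound(
    sample_arm: Callable[[int], float],
    num_arms: int,
    budget: int,
    keep: int,
    split: float = 0.5,
    delta: float = 0.05,
) -> Tuple[int, float]:
    """Hypothetical two-stage design: screen arms on one split, certify on the other.

    Assumption (illustrative only): outcomes lie in [0, 1] and an arm's mean
    outcome is treated as its effect.
    """
    # Stage 1 (screening): spend a fraction `split` of the budget uniformly
    # and keep the `keep` arms with the highest sample means.
    n1 = max(1, int(split * budget) // num_arms)
    stage1_means = [
        sum(sample_arm(a) for _ in range(n1)) / n1 for a in range(num_arms)
    ]
    survivors = sorted(
        range(num_arms), key=lambda a: stage1_means[a], reverse=True
    )[:keep]

    # Stage 2 (certification): draw fresh samples only, so the data-dependent
    # screening step does not bias the stage-2 estimates (sample splitting).
    n2 = max(1, (budget - n1 * num_arms) // keep)
    best_arm, best_lb = -1, -math.inf
    for a in survivors:
        mean = sum(sample_arm(a) for _ in range(n2)) / n2
        # Union bound over the `keep` survivors keeps the overall failure
        # probability of all stage-2 bounds at most delta.
        lb = mean - hoeffding_radius(n2, delta / keep)
        if lb > best_lb:
            best_arm, best_lb = a, lb
    return best_arm, best_lb


# Usage: five hypothetical Bernoulli arms; arm 3 has the largest mean.
rng = random.Random(0)
true_means = [0.2, 0.3, 0.35, 0.6, 0.4]


def sample_arm(a: int) -> float:
    return 1.0 if rng.random() < true_means[a] else 0.0


arm, lb = two_stage_lower_bound(sample_arm, len(true_means), budget=5000, keep=2)
print(f"certified arm {arm} with mean >= {lb:.3f} (w.p. >= 0.95)")
```

Because the union bound in stage 2 runs over only the `keep` survivors rather than all arms, and each survivor receives more samples than a uniform single-stage allocation would give it, pruning can tighten the certified bound. The abstract describes deriving optimal two-stage designs; this sketch instead fixes `split` and `keep` by hand.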

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-cortes-gomez25a,
  title = {Data-driven Design of Randomized Control Trials with Guaranteed Treatment Effects},
  author = {Cortes-Gomez, Santiago and Raman, Naveen Janaki and Singh, Aarti and Wilder, Bryan},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages = {11313--11327},
  year = {2025},
  editor = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume = {267},
  series = {Proceedings of Machine Learning Research},
  month = {13--19 Jul},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/cortes-gomez25a/cortes-gomez25a.pdf},
  url = {https://proceedings.mlr.press/v267/cortes-gomez25a.html},
  abstract = {Randomized controlled trials (RCTs) generate guarantees for treatment effects. However, RCTs often spend unnecessary resources exploring sub-optimal treatments, which can reduce the power of treatment guarantees. To address this, we propose a two-stage RCT design. In the first stage, a data-driven screening procedure prunes low-impact treatments, while the second stage focuses on developing high-probability lower bounds for the best-performing treatment effect. Unlike existing adaptive RCT frameworks, our method is simple enough to be implemented in scenarios with limited adaptivity. We derive optimal designs for two-stage RCTs and demonstrate how such designs can be implemented through sample splitting. Empirically, we demonstrate that two-stage designs improve upon single-stage approaches, especially for scenarios where domain knowledge is available through a prior. Our work is thus, a simple yet effective design for RCTs, optimizing for the ability to certify with high probability the largest possible treatment effect for at least one of the arms studied.}
}
Endnote
%0 Conference Paper
%T Data-driven Design of Randomized Control Trials with Guaranteed Treatment Effects
%A Santiago Cortes-Gomez
%A Naveen Janaki Raman
%A Aarti Singh
%A Bryan Wilder
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-cortes-gomez25a
%I PMLR
%P 11313--11327
%U https://proceedings.mlr.press/v267/cortes-gomez25a.html
%V 267
%X Randomized controlled trials (RCTs) generate guarantees for treatment effects. However, RCTs often spend unnecessary resources exploring sub-optimal treatments, which can reduce the power of treatment guarantees. To address this, we propose a two-stage RCT design. In the first stage, a data-driven screening procedure prunes low-impact treatments, while the second stage focuses on developing high-probability lower bounds for the best-performing treatment effect. Unlike existing adaptive RCT frameworks, our method is simple enough to be implemented in scenarios with limited adaptivity. We derive optimal designs for two-stage RCTs and demonstrate how such designs can be implemented through sample splitting. Empirically, we demonstrate that two-stage designs improve upon single-stage approaches, especially for scenarios where domain knowledge is available through a prior. Our work is thus, a simple yet effective design for RCTs, optimizing for the ability to certify with high probability the largest possible treatment effect for at least one of the arms studied.
APA
Cortes-Gomez, S., Raman, N.J., Singh, A. & Wilder, B. (2025). Data-driven Design of Randomized Control Trials with Guaranteed Treatment Effects. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:11313-11327. Available from https://proceedings.mlr.press/v267/cortes-gomez25a.html.
