Enhancing Treatment Effect Estimation via Active Learning: A Counterfactual Covering Perspective

Hechuan Wen, Tong Chen, Mingming Gong, Li Kheng Chai, Shazia Sadiq, Hongzhi Yin
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:66437-66466, 2025.

Abstract

Although numerous complex algorithms for treatment effect estimation have been developed in recent years, their effectiveness remains limited when handling insufficiently labeled training sets due to the high cost of labeling the post-treatment effect, e.g., the expensive tumor imaging or biopsy procedures needed to evaluate treatment effects. Therefore, it becomes essential to actively incorporate more high-quality labeled data, all while adhering to a constrained labeling budget. To enable data-efficient treatment effect estimation, we formalize the problem through rigorous theoretical analysis within the active learning context, where the derived key measures – factual and counterfactual covering radii determine the risk upper bound. To reduce the bound, we propose a greedy radius reduction algorithm, which excels under an idealized, balanced data distribution. To generalize to more realistic data distributions, we further propose FCCM, which transforms the optimization objective into the Factual and Counterfactual Coverage Maximization to ensure effective radius reduction during data acquisition. Furthermore, benchmarking FCCM against other baselines demonstrates its superiority across both fully synthetic and semi-synthetic datasets. Code: https://github.com/uqhwen2/FCCM.

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-wen25b, title = {Enhancing Treatment Effect Estimation via Active Learning: A Counterfactual Covering Perspective}, author = {Wen, Hechuan and Chen, Tong and Gong, Mingming and Chai, Li Kheng and Sadiq, Shazia and Yin, Hongzhi}, booktitle = {Proceedings of the 42nd International Conference on Machine Learning}, pages = {66437--66466}, year = {2025}, editor = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry}, volume = {267}, series = {Proceedings of Machine Learning Research}, month = {13--19 Jul}, publisher = {PMLR}, pdf = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/wen25b/wen25b.pdf}, url = {https://proceedings.mlr.press/v267/wen25b.html}, abstract = {Although numerous complex algorithms for treatment effect estimation have been developed in recent years, their effectiveness remains limited when handling insufficiently labeled training sets due to the high cost of labeling the post-treatment effect, e.g., the expensive tumor imaging or biopsy procedures needed to evaluate treatment effects. Therefore, it becomes essential to actively incorporate more high-quality labeled data, all while adhering to a constrained labeling budget. To enable data-efficient treatment effect estimation, we formalize the problem through rigorous theoretical analysis within the active learning context, where the derived key measures – factual and counterfactual covering radii determine the risk upper bound. To reduce the bound, we propose a greedy radius reduction algorithm, which excels under an idealized, balanced data distribution. To generalize to more realistic data distributions, we further propose FCCM, which transforms the optimization objective into the Factual and Counterfactual Coverage Maximization to ensure effective radius reduction during data acquisition. Furthermore, benchmarking FCCM against other baselines demonstrates its superiority across both fully synthetic and semi-synthetic datasets. Code: https://github.com/uqhwen2/FCCM.} }
Endnote
%0 Conference Paper %T Enhancing Treatment Effect Estimation via Active Learning: A Counterfactual Covering Perspective %A Hechuan Wen %A Tong Chen %A Mingming Gong %A Li Kheng Chai %A Shazia Sadiq %A Hongzhi Yin %B Proceedings of the 42nd International Conference on Machine Learning %C Proceedings of Machine Learning Research %D 2025 %E Aarti Singh %E Maryam Fazel %E Daniel Hsu %E Simon Lacoste-Julien %E Felix Berkenkamp %E Tegan Maharaj %E Kiri Wagstaff %E Jerry Zhu %F pmlr-v267-wen25b %I PMLR %P 66437--66466 %U https://proceedings.mlr.press/v267/wen25b.html %V 267 %X Although numerous complex algorithms for treatment effect estimation have been developed in recent years, their effectiveness remains limited when handling insufficiently labeled training sets due to the high cost of labeling the post-treatment effect, e.g., the expensive tumor imaging or biopsy procedures needed to evaluate treatment effects. Therefore, it becomes essential to actively incorporate more high-quality labeled data, all while adhering to a constrained labeling budget. To enable data-efficient treatment effect estimation, we formalize the problem through rigorous theoretical analysis within the active learning context, where the derived key measures – factual and counterfactual covering radii determine the risk upper bound. To reduce the bound, we propose a greedy radius reduction algorithm, which excels under an idealized, balanced data distribution. To generalize to more realistic data distributions, we further propose FCCM, which transforms the optimization objective into the Factual and Counterfactual Coverage Maximization to ensure effective radius reduction during data acquisition. Furthermore, benchmarking FCCM against other baselines demonstrates its superiority across both fully synthetic and semi-synthetic datasets. Code: https://github.com/uqhwen2/FCCM.
APA
Wen, H., Chen, T., Gong, M., Chai, L.K., Sadiq, S. & Yin, H.. (2025). Enhancing Treatment Effect Estimation via Active Learning: A Counterfactual Covering Perspective. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:66437-66466 Available from https://proceedings.mlr.press/v267/wen25b.html.

Related Material