Combining Experimental and Historical Data for Policy Evaluation

Ting Li, Chengchun Shi, Qianglin Wen, Yang Sui, Yongli Qin, Chunbo Lai, Hongtu Zhu
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:28630-28656, 2024.

Abstract

This paper studies policy evaluation with multiple data sources, especially in scenarios that involve one experimental dataset with two arms, complemented by a historical dataset generated under a single control arm. We propose novel data integration methods that linearly integrate base policy value estimators constructed based on the experimental and historical data, with weights optimized to minimize the mean square error (MSE) of the resulting combined estimator. We further apply the pessimistic principle to obtain more robust estimators, and extend these developments to sequential decision making. Theoretically, we establish non-asymptotic error bounds for the MSEs of our proposed estimators, and derive their oracle, efficiency and robustness properties across a broad spectrum of reward shift scenarios. Numerical experiments and real-data-based analyses from a ridesharing company demonstrate the superior performance of the proposed estimators.
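The abstract describes linearly combining a base estimator built from the experimental data with one built from the historical control-only data, using weights chosen to minimize the MSE of the combination. The Python sketch below illustrates that idea in its simplest form: estimating the control-arm value and then the treatment effect. It is a generic illustration, not the authors' estimator. It assumes the two control-arm estimates are independent, the experimental one is unbiased, and the historical one carries an unknown reward shift b. With weight w on the experimental estimate, MSE(w) = w^2 σ_e^2 + (1 − w)^2 (σ_h^2 + b^2), which is minimized at w* = (σ_h^2 + b^2) / (σ_e^2 + σ_h^2 + b^2). The "pessimistic" branch, which plugs in an upper confidence bound on b^2, is one hypothetical way to apply a pessimistic principle. The names `mse_optimal_weight`, `combine`, `mean_and_var_of_mean` and the parameter `z` are illustrative.

```python
import math
import random
import statistics


def mse_optimal_weight(var_exp, var_hist, bias_sq):
    """Weight w on the experimental estimator that minimizes
    MSE(w) = w^2 * var_exp + (1 - w)^2 * (var_hist + bias_sq).

    Assumes (hypothetically) that the two estimators are independent and
    that only the historical estimator is biased.
    """
    hist_err = var_hist + bias_sq
    return hist_err / (var_exp + hist_err)


def combine(exp_est, var_exp, hist_est, var_hist, pessimistic=False, z=1.96):
    """Linearly combine an experimental and a historical estimate of the same
    control-arm value. Returns (combined estimate, weight on experimental)."""
    diff = hist_est - exp_est            # noisy estimate of the reward shift
    var_diff = var_exp + var_hist        # Var(diff) under independence
    if pessimistic:
        # Hypothetical pessimistic choice: plug in an upper confidence bound
        # on the squared shift, which moves weight toward the experimental data.
        bias_sq = (abs(diff) + z * math.sqrt(var_diff)) ** 2
    else:
        # E[diff^2] = bias^2 + var_diff, so subtract var_diff and truncate at 0.
        bias_sq = max(diff ** 2 - var_diff, 0.0)
    w = mse_optimal_weight(var_exp, var_hist, bias_sq)
    return w * exp_est + (1 - w) * hist_est, w


def mean_and_var_of_mean(xs):
    """Sample mean and the estimated variance of that mean."""
    return statistics.fmean(xs), statistics.variance(xs) / len(xs)


if __name__ == "__main__":
    rng = random.Random(0)
    # Simulated (not real) data: a small two-arm experiment plus a large
    # control-only historical sample whose mean reward is shifted by 0.1.
    treat = [rng.gauss(1.5, 1.0) for _ in range(100)]
    ctrl_exp = [rng.gauss(1.0, 1.0) for _ in range(100)]
    ctrl_hist = [rng.gauss(1.1, 1.0) for _ in range(5000)]

    m_t, _ = mean_and_var_of_mean(treat)
    m_e, v_e = mean_and_var_of_mean(ctrl_exp)
    m_h, v_h = mean_and_var_of_mean(ctrl_hist)

    for pess in (False, True):
        ctrl, w = combine(m_e, v_e, m_h, v_h, pessimistic=pess)
        print(f"pessimistic={pess}: w={w:.3f}, effect estimate={m_t - ctrl:.3f}")
```

Because w* increases with the plugged-in squared shift, setting `pessimistic=True` never puts less weight on the experimental estimate than the plug-in version. This trades some efficiency when the shift is small for robustness when it is large.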

Cite this Paper


BibTeX
@InProceedings{pmlr-v235-li24bh,
  title = {Combining Experimental and Historical Data for Policy Evaluation},
  author = {Li, Ting and Shi, Chengchun and Wen, Qianglin and Sui, Yang and Qin, Yongli and Lai, Chunbo and Zhu, Hongtu},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages = {28630--28656},
  year = {2024},
  editor = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume = {235},
  series = {Proceedings of Machine Learning Research},
  month = {21--27 Jul},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/li24bh/li24bh.pdf},
  url = {https://proceedings.mlr.press/v235/li24bh.html},
  abstract = {This paper studies policy evaluation with multiple data sources, especially in scenarios that involve one experimental dataset with two arms, complemented by a historical dataset generated under a single control arm. We propose novel data integration methods that linearly integrate base policy value estimators constructed based on the experimental and historical data, with weights optimized to minimize the mean square error (MSE) of the resulting combined estimator. We further apply the pessimistic principle to obtain more robust estimators, and extend these developments to sequential decision making. Theoretically, we establish non-asymptotic error bounds for the MSEs of our proposed estimators, and derive their oracle, efficiency and robustness properties across a broad spectrum of reward shift scenarios. Numerical experiments and real-data-based analyses from a ridesharing company demonstrate the superior performance of the proposed estimators.}
}
Endnote
%0 Conference Paper
%T Combining Experimental and Historical Data for Policy Evaluation
%A Ting Li
%A Chengchun Shi
%A Qianglin Wen
%A Yang Sui
%A Yongli Qin
%A Chunbo Lai
%A Hongtu Zhu
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-li24bh
%I PMLR
%P 28630--28656
%U https://proceedings.mlr.press/v235/li24bh.html
%V 235
%X This paper studies policy evaluation with multiple data sources, especially in scenarios that involve one experimental dataset with two arms, complemented by a historical dataset generated under a single control arm. We propose novel data integration methods that linearly integrate base policy value estimators constructed based on the experimental and historical data, with weights optimized to minimize the mean square error (MSE) of the resulting combined estimator. We further apply the pessimistic principle to obtain more robust estimators, and extend these developments to sequential decision making. Theoretically, we establish non-asymptotic error bounds for the MSEs of our proposed estimators, and derive their oracle, efficiency and robustness properties across a broad spectrum of reward shift scenarios. Numerical experiments and real-data-based analyses from a ridesharing company demonstrate the superior performance of the proposed estimators.
APA
Li, T., Shi, C., Wen, Q., Sui, Y., Qin, Y., Lai, C. & Zhu, H. (2024). Combining Experimental and Historical Data for Policy Evaluation. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:28630-28656. Available from https://proceedings.mlr.press/v235/li24bh.html.
