An Instrumental Variable Approach to Confounded Off-Policy Evaluation

Yang Xu, Jin Zhu, Chengchun Shi, Shikai Luo, Rui Song
Proceedings of the 40th International Conference on Machine Learning, PMLR 202:38848-38880, 2023.

Abstract

Off-policy evaluation (OPE) aims to estimate the return of a target policy using some pre-collected observational data generated by a potentially different behavior policy. In many cases, there exist unmeasured variables that confound the action-reward or action-next-state relationships, rendering many existing OPE approaches ineffective. This paper develops an instrumental variable (IV)-based method for consistent OPE in confounded sequential decision making. Similar to single-stage decision making, we show that IV enables us to correctly identify the target policy’s value in infinite horizon settings as well. Furthermore, we propose a number of policy value estimators and illustrate their effectiveness through extensive simulations and real data analysis from a world-leading short-video platform.
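The abstract's central point — that an instrument restores identification when an unmeasured variable confounds the action-reward relationship — can be illustrated with a minimal single-stage sketch. This is the generic Wald/two-stage-least-squares idea the paper builds on, not the paper's infinite-horizon estimator; all variable names, coefficients, and the data-generating process below are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Unmeasured confounder U affects both the action A and the reward R,
# while the instrument Z affects R only through A (exclusion restriction).
U = rng.normal(size=n)                       # unmeasured confounder
Z = rng.normal(size=n)                       # instrument
A = 0.8 * Z + U + rng.normal(size=n)         # action, confounded by U
R = 2.0 * A - 1.5 * U + rng.normal(size=n)   # reward; true effect of A is 2.0

# Naive regression of R on A absorbs the confounding through U and is biased.
ols = (A @ R) / (A @ A)

# Single-instrument IV (Wald ratio): Cov(Z, R) / Cov(Z, A).
iv = (Z @ R) / (Z @ A)

print(f"OLS estimate: {ols:.2f}")   # biased away from the true value 2.0
print(f"IV estimate:  {iv:.2f}")    # consistent for 2.0
```

The paper's contribution is showing that this kind of identification argument carries over to the sequential, infinite-horizon setting and constructing policy value estimators on top of it.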

Cite this Paper


BibTeX
@InProceedings{pmlr-v202-xu23x,
  title = {An Instrumental Variable Approach to Confounded Off-Policy Evaluation},
  author = {Xu, Yang and Zhu, Jin and Shi, Chengchun and Luo, Shikai and Song, Rui},
  booktitle = {Proceedings of the 40th International Conference on Machine Learning},
  pages = {38848--38880},
  year = {2023},
  editor = {Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan},
  volume = {202},
  series = {Proceedings of Machine Learning Research},
  month = {23--29 Jul},
  publisher = {PMLR},
  pdf = {https://proceedings.mlr.press/v202/xu23x/xu23x.pdf},
  url = {https://proceedings.mlr.press/v202/xu23x.html},
  abstract = {Off-policy evaluation (OPE) aims to estimate the return of a target policy using some pre-collected observational data generated by a potentially different behavior policy. In many cases, there exist unmeasured variables that confound the action-reward or action-next-state relationships, rendering many existing OPE approaches ineffective. This paper develops an instrumental variable (IV)-based method for consistent OPE in confounded sequential decision making. Similar to single-stage decision making, we show that IV enables us to correctly identify the target policy’s value in infinite horizon settings as well. Furthermore, we propose a number of policy value estimators and illustrate their effectiveness through extensive simulations and real data analysis from a world-leading short-video platform.}
}
Endnote
%0 Conference Paper
%T An Instrumental Variable Approach to Confounded Off-Policy Evaluation
%A Yang Xu
%A Jin Zhu
%A Chengchun Shi
%A Shikai Luo
%A Rui Song
%B Proceedings of the 40th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Andreas Krause
%E Emma Brunskill
%E Kyunghyun Cho
%E Barbara Engelhardt
%E Sivan Sabato
%E Jonathan Scarlett
%F pmlr-v202-xu23x
%I PMLR
%P 38848--38880
%U https://proceedings.mlr.press/v202/xu23x.html
%V 202
%X Off-policy evaluation (OPE) aims to estimate the return of a target policy using some pre-collected observational data generated by a potentially different behavior policy. In many cases, there exist unmeasured variables that confound the action-reward or action-next-state relationships, rendering many existing OPE approaches ineffective. This paper develops an instrumental variable (IV)-based method for consistent OPE in confounded sequential decision making. Similar to single-stage decision making, we show that IV enables us to correctly identify the target policy’s value in infinite horizon settings as well. Furthermore, we propose a number of policy value estimators and illustrate their effectiveness through extensive simulations and real data analysis from a world-leading short-video platform.
APA
Xu, Y., Zhu, J., Shi, C., Luo, S. &amp; Song, R. (2023). An Instrumental Variable Approach to Confounded Off-Policy Evaluation. Proceedings of the 40th International Conference on Machine Learning, in Proceedings of Machine Learning Research 202:38848-38880. Available from https://proceedings.mlr.press/v202/xu23x.html.

Related Material