An Instrumental Variable Approach to Confounded Off-Policy Evaluation

Yang Xu; Jin Zhu; Chengchun Shi; Shikai Luo; Rui Song

Proceedings of the 40th International Conference on Machine Learning, PMLR 202:38848-38880, 2023.

Abstract

Off-policy evaluation (OPE) aims to estimate the return of a target policy using some pre-collected observational data generated by a potentially different behavior policy. In many cases, there exist unmeasured variables that confound the action-reward or action-next-state relationships, rendering many existing OPE approaches ineffective. This paper develops an instrumental variable (IV)-based method for consistent OPE in confounded sequential decision making. Similar to single-stage decision making, we show that IV enables us to correctly identify the target policy’s value in infinite horizon settings as well. Furthermore, we propose a number of policy value estimators and illustrate their effectiveness through extensive simulations and real data analysis from a world-leading short-video platform.

Cite this Paper

BibTeX


@InProceedings{pmlr-v202-xu23x,
  title = 	 {An Instrumental Variable Approach to Confounded Off-Policy Evaluation},
  author =       {Xu, Yang and Zhu, Jin and Shi, Chengchun and Luo, Shikai and Song, Rui},
  booktitle = 	 {Proceedings of the 40th International Conference on Machine Learning},
  pages = 	 {38848--38880},
  year = 	 {2023},
  editor = 	 {Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan},
  volume = 	 {202},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {23--29 Jul},
  publisher =    {PMLR},
  pdf = 	 {https://proceedings.mlr.press/v202/xu23x/xu23x.pdf},
  url = 	 {https://proceedings.mlr.press/v202/xu23x.html},
  abstract = 	 {Off-policy evaluation (OPE) aims to estimate the return of a target policy using some pre-collected observational data generated by a potentially different behavior policy. In many cases, there exist unmeasured variables that confound the action-reward or action-next-state relationships, rendering many existing OPE approaches ineffective. This paper develops an instrumental variable (IV)-based method for consistent OPE in confounded sequential decision making. Similar to single-stage decision making, we show that IV enables us to correctly identify the target policy’s value in infinite horizon settings as well. Furthermore, we propose a number of policy value estimators and illustrate their effectiveness through extensive simulations and real data analysis from a world-leading short-video platform.}
}

Endnote

%0 Conference Paper
%T An Instrumental Variable Approach to Confounded Off-Policy Evaluation
%A Yang Xu
%A Jin Zhu
%A Chengchun Shi
%A Shikai Luo
%A Rui Song
%B Proceedings of the 40th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Andreas Krause
%E Emma Brunskill
%E Kyunghyun Cho
%E Barbara Engelhardt
%E Sivan Sabato
%E Jonathan Scarlett
%F pmlr-v202-xu23x
%I PMLR
%P 38848--38880
%U https://proceedings.mlr.press/v202/xu23x.html
%V 202
%X Off-policy evaluation (OPE) aims to estimate the return of a target policy using some pre-collected observational data generated by a potentially different behavior policy. In many cases, there exist unmeasured variables that confound the action-reward or action-next-state relationships, rendering many existing OPE approaches ineffective. This paper develops an instrumental variable (IV)-based method for consistent OPE in confounded sequential decision making. Similar to single-stage decision making, we show that IV enables us to correctly identify the target policy’s value in infinite horizon settings as well. Furthermore, we propose a number of policy value estimators and illustrate their effectiveness through extensive simulations and real data analysis from a world-leading short-video platform.

APA


Xu, Y., Zhu, J., Shi, C., Luo, S. & Song, R. (2023). An Instrumental Variable Approach to Confounded Off-Policy Evaluation. Proceedings of the 40th International Conference on Machine Learning, in Proceedings of Machine Learning Research 202:38848-38880. Available from https://proceedings.mlr.press/v202/xu23x.html.
