Deeply-Debiased Off-Policy Interval Estimation

Chengchun Shi, Runzhe Wan, Victor Chernozhukov, Rui Song
Proceedings of the 38th International Conference on Machine Learning, PMLR 139:9580-9591, 2021.

Abstract

Off-policy evaluation learns a target policy’s value with a historical dataset generated by a different behavior policy. In addition to a point estimate, many applications would benefit significantly from having a confidence interval (CI) that quantifies the uncertainty of the point estimate. In this paper, we propose a novel procedure to construct an efficient, robust, and flexible CI on a target policy’s value. Our method is justified by theoretical results and numerical experiments. A Python implementation of the proposed procedure is available at https://github.com/RunzheStat/D2OPE.
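
For readers new to the setting, the sketch below illustrates the problem the paper addresses using the simplest baseline: an ordinary importance-sampling point estimate with a normal-approximation CI on a toy two-armed bandit. This is only a minimal illustration, not the paper's deeply-debiased procedure (see the D2OPE repository for that); the function name, toy environment, and all numbers below are hypothetical.

    # Minimal sketch: ordinary importance-sampling OPE with a normal CI.
    # NOT the paper's deeply-debiased estimator; all names here are
    # illustrative only.
    from statistics import NormalDist

    import numpy as np


    def ois_confidence_interval(rewards, behavior_probs, target_probs, alpha=0.05):
        """Point estimate and (1 - alpha) CI for the target policy's value."""
        # Importance ratios re-weight rewards logged under the behavior
        # policy so their mean estimates the target policy's value.
        weights = target_probs / behavior_probs
        values = weights * rewards                    # per-sample value estimates
        n = len(values)
        point = values.mean()
        se = values.std(ddof=1) / np.sqrt(n)          # standard error of the mean
        z = NormalDist().inv_cdf(1 - alpha / 2)       # normal critical value
        return point, (point - z * se, point + z * se)


    # Hypothetical logged data: the behavior policy pulls arm 1 with
    # probability 0.3, while the target policy would pull it with 0.8.
    rng = np.random.default_rng(0)
    n = 5000
    actions = rng.binomial(1, 0.3, size=n)                      # logged actions
    rewards = rng.normal(loc=np.where(actions == 1, 1.0, 0.2))  # arm 1 pays more
    behavior_probs = np.where(actions == 1, 0.3, 0.7)
    target_probs = np.where(actions == 1, 0.8, 0.2)

    point, (low, high) = ois_confidence_interval(rewards, behavior_probs, target_probs)
    print(f"estimated value {point:.3f}, 95% CI ({low:.3f}, {high:.3f})")
    # True target-policy value in this toy setup: 0.8 * 1.0 + 0.2 * 0.2 = 0.84.

In sequential settings where the importance weights or value functions must themselves be learned, such naive plug-in intervals can be miscalibrated; correcting for that estimation bias is the kind of gap the paper's debiasing procedure targets.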

Cite this Paper


BibTeX
@InProceedings{pmlr-v139-shi21d,
  title     = {Deeply-Debiased Off-Policy Interval Estimation},
  author    = {Shi, Chengchun and Wan, Runzhe and Chernozhukov, Victor and Song, Rui},
  booktitle = {Proceedings of the 38th International Conference on Machine Learning},
  pages     = {9580--9591},
  year      = {2021},
  editor    = {Meila, Marina and Zhang, Tong},
  volume    = {139},
  series    = {Proceedings of Machine Learning Research},
  month     = {18--24 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v139/shi21d/shi21d.pdf},
  url       = {https://proceedings.mlr.press/v139/shi21d.html},
  abstract  = {Off-policy evaluation learns a target policy’s value with a historical dataset generated by a different behavior policy. In addition to a point estimate, many applications would benefit significantly from having a confidence interval (CI) that quantifies the uncertainty of the point estimate. In this paper, we propose a novel procedure to construct an efficient, robust, and flexible CI on a target policy’s value. Our method is justified by theoretical results and numerical experiments. A Python implementation of the proposed procedure is available at https://github.com/RunzheStat/D2OPE.}
}
Endnote
%0 Conference Paper
%T Deeply-Debiased Off-Policy Interval Estimation
%A Chengchun Shi
%A Runzhe Wan
%A Victor Chernozhukov
%A Rui Song
%B Proceedings of the 38th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2021
%E Marina Meila
%E Tong Zhang
%F pmlr-v139-shi21d
%I PMLR
%P 9580--9591
%U https://proceedings.mlr.press/v139/shi21d.html
%V 139
%X Off-policy evaluation learns a target policy’s value with a historical dataset generated by a different behavior policy. In addition to a point estimate, many applications would benefit significantly from having a confidence interval (CI) that quantifies the uncertainty of the point estimate. In this paper, we propose a novel procedure to construct an efficient, robust, and flexible CI on a target policy’s value. Our method is justified by theoretical results and numerical experiments. A Python implementation of the proposed procedure is available at https://github.com/RunzheStat/D2OPE.
APA
Shi, C., Wan, R., Chernozhukov, V. & Song, R. (2021). Deeply-Debiased Off-Policy Interval Estimation. Proceedings of the 38th International Conference on Machine Learning, in Proceedings of Machine Learning Research 139:9580-9591. Available from https://proceedings.mlr.press/v139/shi21d.html.
