Deeply-Debiased Off-Policy Interval Estimation

Chengchun Shi; Runzhe Wan; Victor Chernozhukov; Rui Song

Proceedings of the 38th International Conference on Machine Learning, PMLR 139:9580-9591, 2021.

Abstract

Off-policy evaluation learns a target policy’s value with a historical dataset generated by a different behavior policy. In addition to a point estimate, many applications would benefit significantly from having a confidence interval (CI) that quantifies the uncertainty of the point estimate. In this paper, we propose a novel procedure to construct an efficient, robust, and flexible CI on a target policy’s value. Our method is justified by theoretical results and numerical experiments. A Python implementation of the proposed procedure is available at https://github.com/RunzheStat/D2OPE.

Cite this Paper

BibTeX

@InProceedings{pmlr-v139-shi21d,
  title = 	 {Deeply-Debiased Off-Policy Interval Estimation},
  author =       {Shi, Chengchun and Wan, Runzhe and Chernozhukov, Victor and Song, Rui},
  booktitle = 	 {Proceedings of the 38th International Conference on Machine Learning},
  pages = 	 {9580--9591},
  year = 	 {2021},
  editor = 	 {Meila, Marina and Zhang, Tong},
  volume = 	 {139},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {18--24 Jul},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v139/shi21d/shi21d.pdf},
  url = 	 {https://proceedings.mlr.press/v139/shi21d.html},
  abstract = 	 {Off-policy evaluation learns a target policy’s value with a historical dataset generated by a different behavior policy. In addition to a point estimate, many applications would benefit significantly from having a confidence interval (CI) that quantifies the uncertainty of the point estimate. In this paper, we propose a novel procedure to construct an efficient, robust, and flexible CI on a target policy’s value. Our method is justified by theoretical results and numerical experiments. A Python implementation of the proposed procedure is available at https://github.com/RunzheStat/D2OPE.}
}

Endnote

%0 Conference Paper
%T Deeply-Debiased Off-Policy Interval Estimation
%A Chengchun Shi
%A Runzhe Wan
%A Victor Chernozhukov
%A Rui Song
%B Proceedings of the 38th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2021
%E Marina Meila
%E Tong Zhang
%F pmlr-v139-shi21d
%I PMLR
%P 9580--9591
%U https://proceedings.mlr.press/v139/shi21d.html
%V 139
%X Off-policy evaluation learns a target policy’s value with a historical dataset generated by a different behavior policy. In addition to a point estimate, many applications would benefit significantly from having a confidence interval (CI) that quantifies the uncertainty of the point estimate. In this paper, we propose a novel procedure to construct an efficient, robust, and flexible CI on a target policy’s value. Our method is justified by theoretical results and numerical experiments. A Python implementation of the proposed procedure is available at https://github.com/RunzheStat/D2OPE.

APA

Shi, C., Wan, R., Chernozhukov, V. & Song, R. (2021). Deeply-Debiased Off-Policy Interval Estimation. Proceedings of the 38th International Conference on Machine Learning, in Proceedings of Machine Learning Research 139:9580-9591. Available from https://proceedings.mlr.press/v139/shi21d.html.
