Accountable Off-Policy Evaluation With Kernel Bellman Statistics

Yihao Feng, Tongzheng Ren, Ziyang Tang, Qiang Liu
Proceedings of the 37th International Conference on Machine Learning, PMLR 119:3102-3111, 2020.

Abstract

We consider off-policy evaluation (OPE), which evaluates the performance of a new policy from observed data collected from previous experiments, without requiring the execution of the new policy. This finds important applications in areas with high execution cost or safety concerns, such as medical diagnosis, recommendation systems and robotics. In practice, due to the limited information from off-policy data, it is highly desirable to construct rigorous confidence intervals, not just point estimation, for the policy performance. In this work, we propose a new variational framework which reduces the problem of calculating tight confidence bounds in OPE into an optimization problem on a feasible set that catches the true state-action value function with high probability. The feasible set is constructed by leveraging statistical properties of a recently proposed kernel Bellman loss (Feng et al., 2019). We design an efficient computational approach for calculating our bounds, and extend it to perform post-hoc diagnosis and correction for existing estimators. Empirical results show that our method yields tight confidence intervals in different settings.

Cite this Paper


BibTeX
@InProceedings{pmlr-v119-feng20d,
  title     = {Accountable Off-Policy Evaluation With Kernel {B}ellman Statistics},
  author    = {Feng, Yihao and Ren, Tongzheng and Tang, Ziyang and Liu, Qiang},
  booktitle = {Proceedings of the 37th International Conference on Machine Learning},
  pages     = {3102--3111},
  year      = {2020},
  editor    = {Hal Daumé III and Aarti Singh},
  volume    = {119},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--18 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v119/feng20d/feng20d.pdf},
  url       = {http://proceedings.mlr.press/v119/feng20d.html},
  abstract  = {We consider off-policy evaluation (OPE), which evaluates the performance of a new policy from observed data collected from previous experiments, without requiring the execution of the new policy. This finds important applications in areas with high execution cost or safety concerns, such as medical diagnosis, recommendation systems and robotics. In practice, due to the limited information from off-policy data, it is highly desirable to construct rigorous confidence intervals, not just point estimation, for the policy performance. In this work, we propose a new variational framework which reduces the problem of calculating tight confidence bounds in OPE into an optimization problem on a feasible set that catches the true state-action value function with high probability. The feasible set is constructed by leveraging statistical properties of a recently proposed kernel Bellman loss (Feng et al., 2019). We design an efficient computational approach for calculating our bounds, and extend it to perform post-hoc diagnosis and correction for existing estimators. Empirical results show that our method yields tight confidence intervals in different settings.}
}
Endnote
%0 Conference Paper
%T Accountable Off-Policy Evaluation With Kernel Bellman Statistics
%A Yihao Feng
%A Tongzheng Ren
%A Ziyang Tang
%A Qiang Liu
%B Proceedings of the 37th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2020
%E Hal Daumé III
%E Aarti Singh
%F pmlr-v119-feng20d
%I PMLR
%P 3102--3111
%U http://proceedings.mlr.press/v119/feng20d.html
%V 119
%X We consider off-policy evaluation (OPE), which evaluates the performance of a new policy from observed data collected from previous experiments, without requiring the execution of the new policy. This finds important applications in areas with high execution cost or safety concerns, such as medical diagnosis, recommendation systems and robotics. In practice, due to the limited information from off-policy data, it is highly desirable to construct rigorous confidence intervals, not just point estimation, for the policy performance. In this work, we propose a new variational framework which reduces the problem of calculating tight confidence bounds in OPE into an optimization problem on a feasible set that catches the true state-action value function with high probability. The feasible set is constructed by leveraging statistical properties of a recently proposed kernel Bellman loss (Feng et al., 2019). We design an efficient computational approach for calculating our bounds, and extend it to perform post-hoc diagnosis and correction for existing estimators. Empirical results show that our method yields tight confidence intervals in different settings.
APA
Feng, Y., Ren, T., Tang, Z., & Liu, Q. (2020). Accountable Off-Policy Evaluation With Kernel Bellman Statistics. Proceedings of the 37th International Conference on Machine Learning, in Proceedings of Machine Learning Research 119:3102-3111. Available from http://proceedings.mlr.press/v119/feng20d.html

Related Material