State Relevance for Off-Policy Evaluation

Simon P Shen, Yecheng Ma, Omer Gottesman, Finale Doshi-Velez
Proceedings of the 38th International Conference on Machine Learning, PMLR 139:9537-9546, 2021.

Abstract

Importance sampling-based estimators for off-policy evaluation (OPE) are valued for their simplicity, unbiasedness, and reliance on relatively few assumptions. However, the variance of these estimators is often high, especially when trajectories are of different lengths. In this work, we introduce Omitting-States-Irrelevant-to-Return Importance Sampling (OSIRIS), an estimator which reduces variance by strategically omitting likelihood ratios associated with certain states. We formalize the conditions under which OSIRIS is unbiased and has lower variance than ordinary importance sampling, and we demonstrate these properties empirically.
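
As a concrete illustration of the idea in the abstract, the sketch below contrasts ordinary trajectory-wise importance sampling with an OSIRIS-style variant that simply omits the likelihood ratios at states flagged as irrelevant. This is a minimal Python sketch, not the paper's implementation: the is_relevant predicate is a hypothetical placeholder for the paper's actual criterion for deciding which states' ratios to omit.

import numpy as np

def importance_sampling_ope(trajectories, gamma=1.0, is_relevant=None):
    """Trajectory-wise importance sampling estimate of the evaluation policy's value.

    Each trajectory is a list of (pi_e_prob, pi_b_prob, reward, state) tuples,
    where the probabilities are those of the action actually taken under the
    evaluation and behavior policies. If is_relevant is provided, likelihood
    ratios at states it flags as irrelevant are omitted (an OSIRIS-style mask);
    otherwise this reduces to ordinary importance sampling.
    """
    estimates = []
    for traj in trajectories:
        weight, ret = 1.0, 0.0
        for t, (pi_e, pi_b, reward, state) in enumerate(traj):
            if is_relevant is None or is_relevant(state):
                weight *= pi_e / pi_b          # keep this step's likelihood ratio
            ret += (gamma ** t) * reward       # discounted return of the trajectory
        estimates.append(weight * ret)
    return float(np.mean(estimates))

# Hypothetical usage:
#   v_is     = importance_sampling_ope(trajs)
#   v_osiris = importance_sampling_ope(trajs, is_relevant=lambda s: s in relevant_states)

Omitting ratios shortens the products of likelihood ratios, which is why the variance can drop when trajectories are long or of different lengths; the paper formalizes the conditions under which this masking also preserves unbiasedness.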

Cite this Paper


BibTeX
@InProceedings{pmlr-v139-shen21d,
  title     = {State Relevance for Off-Policy Evaluation},
  author    = {Shen, Simon P and Ma, Yecheng and Gottesman, Omer and Doshi-Velez, Finale},
  booktitle = {Proceedings of the 38th International Conference on Machine Learning},
  pages     = {9537--9546},
  year      = {2021},
  editor    = {Meila, Marina and Zhang, Tong},
  volume    = {139},
  series    = {Proceedings of Machine Learning Research},
  month     = {18--24 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v139/shen21d/shen21d.pdf},
  url       = {https://proceedings.mlr.press/v139/shen21d.html}
}
APA
Shen, S.P., Ma, Y., Gottesman, O. & Doshi-Velez, F. (2021). State Relevance for Off-Policy Evaluation. Proceedings of the 38th International Conference on Machine Learning, in Proceedings of Machine Learning Research 139:9537-9546. Available from https://proceedings.mlr.press/v139/shen21d.html.
