Interpretable Off-Policy Evaluation in Reinforcement Learning by Highlighting Influential Transitions

Omer Gottesman, Joseph Futoma, Yao Liu, Sonali Parbhoo, Leo Celi, Emma Brunskill, Finale Doshi-Velez
Proceedings of the 37th International Conference on Machine Learning, PMLR 119:3658-3667, 2020.

Abstract

Off-policy evaluation in reinforcement learning offers the chance of using observational data to improve future outcomes in domains such as healthcare and education, but safe deployment in high stakes settings requires ways of assessing its validity. Traditional measures such as confidence intervals may be insufficient due to noise, limited data and confounding. In this paper we develop a method that could serve as a hybrid human-AI system, to enable human experts to analyze the validity of policy evaluation estimates. This is accomplished by highlighting observations in the data whose removal will have a large effect on the OPE estimate, and formulating a set of rules for choosing which ones to present to domain experts for validation. We develop methods to compute exactly the influence functions for fitted Q-evaluation with two different function classes: kernel-based and linear least squares, as well as importance sampling methods. Experiments on medical simulations and real-world intensive care unit data demonstrate that our method can be used to identify limitations in the evaluation process and make evaluation more robust.
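To make the paper's core idea concrete, below is a minimal sketch (not from the paper) of the leave-one-out notion of influence it builds on: the change in an off-policy estimate when a single observation is removed. The sketch uses a simple per-trajectory importance sampling estimator with toy data; all names and values are illustrative assumptions. The paper itself derives influences in closed form for fitted Q-evaluation (kernel-based and linear least squares) and importance sampling, rather than recomputing the estimate once per removed observation as done here.

import numpy as np

def is_ope_estimate(weights, returns):
    # Simple (unnormalized) importance sampling OPE estimate:
    # mean of per-trajectory importance weight times observed return.
    return np.mean(weights * returns)

def leave_one_out_influences(weights, returns):
    # Influence of each trajectory = change in the OPE estimate
    # when that trajectory is removed from the data set.
    n = len(returns)
    full = is_ope_estimate(weights, returns)
    influences = np.empty(n)
    for i in range(n):
        mask = np.arange(n) != i
        influences[i] = is_ope_estimate(weights[mask], returns[mask]) - full
    return influences

# Toy example: 5 trajectories with importance weights and returns.
rng = np.random.default_rng(0)
weights = rng.lognormal(mean=0.0, sigma=1.0, size=5)
returns = rng.normal(loc=1.0, scale=0.5, size=5)

infl = leave_one_out_influences(weights, returns)
# Trajectories whose removal shifts the estimate the most would be
# presented to a domain expert for validation.
most_influential = np.argsort(-np.abs(infl))
print("OPE estimate:", is_ope_estimate(weights, returns))
print("Review order (most influential first):", most_influential)

In the proposed workflow, the most influential observations surfaced this way are the ones shown to domain experts to check whether the evaluation rests on questionable data.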

Cite this Paper


BibTeX
@InProceedings{pmlr-v119-gottesman20a,
  title     = {Interpretable Off-Policy Evaluation in Reinforcement Learning by Highlighting Influential Transitions},
  author    = {Gottesman, Omer and Futoma, Joseph and Liu, Yao and Parbhoo, Sonali and Celi, Leo and Brunskill, Emma and Doshi-Velez, Finale},
  booktitle = {Proceedings of the 37th International Conference on Machine Learning},
  pages     = {3658--3667},
  year      = {2020},
  editor    = {III, Hal Daumé and Singh, Aarti},
  volume    = {119},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--18 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v119/gottesman20a/gottesman20a.pdf},
  url       = {https://proceedings.mlr.press/v119/gottesman20a.html},
  abstract  = {Off-policy evaluation in reinforcement learning offers the chance of using observational data to improve future outcomes in domains such as healthcare and education, but safe deployment in high stakes settings requires ways of assessing its validity. Traditional measures such as confidence intervals may be insufficient due to noise, limited data and confounding. In this paper we develop a method that could serve as a hybrid human-AI system, to enable human experts to analyze the validity of policy evaluation estimates. This is accomplished by highlighting observations in the data whose removal will have a large effect on the OPE estimate, and formulating a set of rules for choosing which ones to present to domain experts for validation. We develop methods to compute exactly the influence functions for fitted Q-evaluation with two different function classes: kernel-based and linear least squares, as well as importance sampling methods. Experiments on medical simulations and real-world intensive care unit data demonstrate that our method can be used to identify limitations in the evaluation process and make evaluation more robust.}
}
Endnote
%0 Conference Paper
%T Interpretable Off-Policy Evaluation in Reinforcement Learning by Highlighting Influential Transitions
%A Omer Gottesman
%A Joseph Futoma
%A Yao Liu
%A Sonali Parbhoo
%A Leo Celi
%A Emma Brunskill
%A Finale Doshi-Velez
%B Proceedings of the 37th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2020
%E Hal Daumé III
%E Aarti Singh
%F pmlr-v119-gottesman20a
%I PMLR
%P 3658--3667
%U https://proceedings.mlr.press/v119/gottesman20a.html
%V 119
%X Off-policy evaluation in reinforcement learning offers the chance of using observational data to improve future outcomes in domains such as healthcare and education, but safe deployment in high stakes settings requires ways of assessing its validity. Traditional measures such as confidence intervals may be insufficient due to noise, limited data and confounding. In this paper we develop a method that could serve as a hybrid human-AI system, to enable human experts to analyze the validity of policy evaluation estimates. This is accomplished by highlighting observations in the data whose removal will have a large effect on the OPE estimate, and formulating a set of rules for choosing which ones to present to domain experts for validation. We develop methods to compute exactly the influence functions for fitted Q-evaluation with two different function classes: kernel-based and linear least squares, as well as importance sampling methods. Experiments on medical simulations and real-world intensive care unit data demonstrate that our method can be used to identify limitations in the evaluation process and make evaluation more robust.
APA
Gottesman, O., Futoma, J., Liu, Y., Parbhoo, S., Celi, L., Brunskill, E. & Doshi-Velez, F. (2020). Interpretable Off-Policy Evaluation in Reinforcement Learning by Highlighting Influential Transitions. Proceedings of the 37th International Conference on Machine Learning, in Proceedings of Machine Learning Research 119:3658-3667. Available from https://proceedings.mlr.press/v119/gottesman20a.html.