Adaptive Estimator Selection for Off-Policy Evaluation

Yi Su, Pavithra Srinath, Akshay Krishnamurthy
Proceedings of the 37th International Conference on Machine Learning, PMLR 119:9196-9205, 2020.

Abstract

We develop a generic data-driven method for estimator selection in off-policy policy evaluation settings. We establish a strong performance guarantee for the method, showing that it is competitive with the oracle estimator, up to a constant factor. Via in-depth case studies in contextual bandits and reinforcement learning, we demonstrate the generality and applicability of the method. We also perform comprehensive experiments, demonstrating the empirical efficacy of our approach and comparing with related approaches. In both case studies, our method compares favorably with existing methods.
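The selection procedure in the paper is in the spirit of Lepski's method: given a family of off-policy estimators ordered from high-variance/low-bias to low-variance/high-bias, together with data-driven confidence widths, it picks the most aggressive estimator whose confidence interval remains consistent with those of its predecessors. Below is a minimal illustrative sketch of such an interval-intersection selector in Python. It is not the paper's exact algorithm: the factor-2 interval inflation, the pairwise-intersection test, the ordering assumption (non-increasing widths), and the function name select_estimator are assumptions made here for illustration, and the constants and precise rule in the paper may differ.

import numpy as np

def select_estimator(estimates, widths):
    """Lepski-style selection over a family of estimators (illustrative sketch).

    estimates[i] is the value of the i-th off-policy estimator and widths[i]
    is a high-probability confidence width for it, with estimators ordered so
    that widths are non-increasing (and bias roughly non-decreasing).
    Returns the index of the selected estimator.
    """
    estimates = np.asarray(estimates, dtype=float)
    widths = np.asarray(widths, dtype=float)
    selected = 0
    for i in range(len(estimates)):
        # Interval for estimator i (the factor 2 is one common choice;
        # the constant used in the paper may differ).
        lo_i, hi_i = estimates[i] - 2 * widths[i], estimates[i] + 2 * widths[i]
        # Keep index i only if its interval intersects every earlier interval.
        consistent = all(
            max(lo_i, estimates[j] - 2 * widths[j])
            <= min(hi_i, estimates[j] + 2 * widths[j])
            for j in range(i)
        )
        if consistent:
            selected = i
        else:
            break
    return selected

# Hypothetical usage: estimates and widths could come from, e.g., a family of
# importance-weighting or doubly robust OPE estimators at different levels of
# shrinkage or horizon truncation (values below are made up for illustration).
est = [0.42, 0.45, 0.47, 0.90]
cnf = [0.30, 0.20, 0.10, 0.05]
print(select_estimator(est, cnf))  # -> 2 in this toy example

The appeal of this style of rule is that it needs only valid confidence widths, not bias estimates, which is what allows an oracle-competitive guarantee of the kind stated in the abstract.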

Cite this Paper


BibTeX
@InProceedings{pmlr-v119-su20d,
  title     = {Adaptive Estimator Selection for Off-Policy Evaluation},
  author    = {Su, Yi and Srinath, Pavithra and Krishnamurthy, Akshay},
  booktitle = {Proceedings of the 37th International Conference on Machine Learning},
  pages     = {9196--9205},
  year      = {2020},
  editor    = {III, Hal Daumé and Singh, Aarti},
  volume    = {119},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--18 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v119/su20d/su20d.pdf},
  url       = {https://proceedings.mlr.press/v119/su20d.html},
  abstract  = {We develop a generic data-driven method for estimator selection in off-policy policy evaluation settings. We establish a strong performance guarantee for the method, showing that it is competitive with the oracle estimator, up to a constant factor. Via in-depth case studies in contextual bandits and reinforcement learning, we demonstrate the generality and applicability of the method. We also perform comprehensive experiments, demonstrating the empirical efficacy of our approach and comparing with related approaches. In both case studies, our method compares favorably with existing methods.}
}
Endnote
%0 Conference Paper
%T Adaptive Estimator Selection for Off-Policy Evaluation
%A Yi Su
%A Pavithra Srinath
%A Akshay Krishnamurthy
%B Proceedings of the 37th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2020
%E Hal Daumé III
%E Aarti Singh
%F pmlr-v119-su20d
%I PMLR
%P 9196--9205
%U https://proceedings.mlr.press/v119/su20d.html
%V 119
%X We develop a generic data-driven method for estimator selection in off-policy policy evaluation settings. We establish a strong performance guarantee for the method, showing that it is competitive with the oracle estimator, up to a constant factor. Via in-depth case studies in contextual bandits and reinforcement learning, we demonstrate the generality and applicability of the method. We also perform comprehensive experiments, demonstrating the empirical efficacy of our approach and comparing with related approaches. In both case studies, our method compares favorably with existing methods.
APA
Su, Y., Srinath, P. & Krishnamurthy, A.. (2020). Adaptive Estimator Selection for Off-Policy Evaluation. Proceedings of the 37th International Conference on Machine Learning, in Proceedings of Machine Learning Research 119:9196-9205 Available from https://proceedings.mlr.press/v119/su20d.html.