Conditional Importance Sampling for Off-Policy Learning

Mark Rowland, Anna Harutyunyan, Hado van Hasselt, Diana Borsa, Tom Schaul, Remi Munos, Will Dabney
Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, PMLR 108:45-55, 2020.

Abstract

The principal contribution of this paper is a conceptual framework for off-policy reinforcement learning, based on conditional expectations of importance sampling ratios. This framework yields new perspectives and understanding of existing off-policy algorithms, and reveals a broad space of unexplored algorithms. We theoretically analyse this space, and concretely investigate several algorithms that arise from this framework.
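To make the central idea concrete, here is a minimal sketch (not from the paper) of ordinary versus conditional importance sampling in a one-state bandit. All numbers, policies, and the conditioning statistic (a coarse grouping of actions) are hypothetical; the point is that replacing each importance ratio by its conditional expectation given a statistic that determines the reward preserves unbiasedness, by the tower property of conditional expectation.

```python
import numpy as np

rng = np.random.default_rng(0)

# One-state bandit with 4 actions; illustrative policies (not from the paper).
mu = np.array([0.25, 0.25, 0.25, 0.25])  # behaviour policy
pi = np.array([0.10, 0.20, 0.30, 0.40])  # target policy

# Reward depends on the action only through a coarse "group" statistic
# (actions {0,1} -> group 0, actions {2,3} -> group 1).
group_reward = np.array([1.0, 2.0])

n = 100_000
a = rng.choice(4, size=n, p=mu)
g = a // 2                                # conditioning statistic
rewards = group_reward[g]

# Ordinary importance sampling: E_mu[rho * R] estimates E_pi[R].
rho = pi[a] / mu[a]
ordinary = np.mean(rho * rewards)

# Conditional importance sampling: replace rho by E_mu[rho | group],
# which here equals P_pi(group) / P_mu(group).
p_pi_group = np.array([pi[:2].sum(), pi[2:].sum()])
p_mu_group = np.array([mu[:2].sum(), mu[2:].sum()])
cond_rho = (p_pi_group / p_mu_group)[g]
conditional = np.mean(cond_rho * rewards)

true_value = float(p_pi_group @ group_reward)  # = 1.7 for these numbers
print(ordinary, conditional, true_value)
```

Both estimators target the same quantity; the conditional estimator's per-sample weights vary less than the raw ratios, which is the variance-reduction mechanism the framework generalises.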

Cite this Paper


BibTeX
@InProceedings{pmlr-v108-rowland20b,
  title     = {Conditional Importance Sampling for Off-Policy Learning},
  author    = {Rowland, Mark and Harutyunyan, Anna and van Hasselt, Hado and Borsa, Diana and Schaul, Tom and Munos, Remi and Dabney, Will},
  booktitle = {Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics},
  pages     = {45--55},
  year      = {2020},
  editor    = {Silvia Chiappa and Roberto Calandra},
  volume    = {108},
  series    = {Proceedings of Machine Learning Research},
  month     = {26--28 Aug},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v108/rowland20b/rowland20b.pdf},
  url       = {http://proceedings.mlr.press/v108/rowland20b.html},
  abstract  = {The principal contribution of this paper is a conceptual framework for off-policy reinforcement learning, based on conditional expectations of importance sampling ratios. This framework yields new perspectives and understanding of existing off-policy algorithms, and reveals a broad space of unexplored algorithms. We theoretically analyse this space, and concretely investigate several algorithms that arise from this framework.}
}
Endnote
%0 Conference Paper
%T Conditional Importance Sampling for Off-Policy Learning
%A Mark Rowland
%A Anna Harutyunyan
%A Hado van Hasselt
%A Diana Borsa
%A Tom Schaul
%A Remi Munos
%A Will Dabney
%B Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2020
%E Silvia Chiappa
%E Roberto Calandra
%F pmlr-v108-rowland20b
%I PMLR
%P 45--55
%U http://proceedings.mlr.press/v108/rowland20b.html
%V 108
%X The principal contribution of this paper is a conceptual framework for off-policy reinforcement learning, based on conditional expectations of importance sampling ratios. This framework yields new perspectives and understanding of existing off-policy algorithms, and reveals a broad space of unexplored algorithms. We theoretically analyse this space, and concretely investigate several algorithms that arise from this framework.
APA
Rowland, M., Harutyunyan, A., van Hasselt, H., Borsa, D., Schaul, T., Munos, R. & Dabney, W. (2020). Conditional Importance Sampling for Off-Policy Learning. Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 108:45-55. Available from http://proceedings.mlr.press/v108/rowland20b.html.