Value-aware Importance Weighting for Off-policy Reinforcement Learning

Kristopher De Asis, Eric Graves, Richard S. Sutton
Proceedings of The 2nd Conference on Lifelong Learning Agents, PMLR 232:745-763, 2023.

Abstract

Importance sampling is a central idea underlying off-policy prediction in reinforcement learning. It provides a strategy for re-weighting samples from a distribution to represent unbiased estimates of another distribution. However, importance sampling weights tend to be of high variance, often leading to stability issues in practice. In this work, we consider a broader class of importance weights to correct samples in off-policy learning. We propose the use of value-aware importance weights which take into account the sample space to provide lower variance, but still unbiased, estimates under a target distribution. We derive how such weights can be computed, and detail key properties of the resulting importance weights. We then extend several reinforcement learning prediction algorithms to the off-policy setting with these weights, and evaluate them empirically.
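
As a point of reference for the standard correction the abstract alludes to (and not the paper's value-aware method), the following minimal Python sketch shows how samples drawn under a behavior distribution b are re-weighted by pi(x)/b(x) to give an unbiased estimate of an expectation under a target distribution pi. The two-action distributions and payoff values are hypothetical, chosen only for illustration.

    # Minimal sketch of ordinary importance sampling for off-policy estimation.
    # Not the paper's value-aware weighting; distributions and payoffs are made up.
    import numpy as np

    rng = np.random.default_rng(0)

    pi = np.array([0.9, 0.1])   # target probabilities over actions {0, 1}
    b = np.array([0.5, 0.5])    # behavior probabilities over actions {0, 1}
    f = np.array([1.0, 10.0])   # payoff of each action

    actions = rng.choice(2, size=100_000, p=b)  # samples drawn under b
    rho = pi[actions] / b[actions]              # importance sampling weights
    estimate = np.mean(rho * f[actions])        # unbiased estimate of E_pi[f]

    print(estimate, pi @ f)  # both are approximately 1.9

The spread of the weights here (1.8 versus 0.2) hints at the variance issue the abstract describes; the paper's value-aware weights aim to keep the estimate unbiased while reducing that variance.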

Cite this Paper


BibTeX
@InProceedings{pmlr-v232-de-asis23a,
  title = {Value-aware Importance Weighting for Off-policy Reinforcement Learning},
  author = {De Asis, Kristopher and Graves, Eric and Sutton, Richard S.},
  booktitle = {Proceedings of The 2nd Conference on Lifelong Learning Agents},
  pages = {745--763},
  year = {2023},
  editor = {Chandar, Sarath and Pascanu, Razvan and Sedghi, Hanie and Precup, Doina},
  volume = {232},
  series = {Proceedings of Machine Learning Research},
  month = {22--25 Aug},
  publisher = {PMLR},
  pdf = {https://proceedings.mlr.press/v232/de-asis23a/de-asis23a.pdf},
  url = {https://proceedings.mlr.press/v232/de-asis23a.html},
  abstract = {Importance sampling is a central idea underlying off-policy prediction in reinforcement learning. It provides a strategy for re-weighting samples from a distribution to represent unbiased estimates of another distribution. However, importance sampling weights tend to be of high variance, often leading to stability issues in practice. In this work, we consider a broader class of importance weights to correct samples in off-policy learning. We propose the use of value-aware importance weights which take into account the sample space to provide lower variance, but still unbiased, estimates under a target distribution. We derive how such weights can be computed, and detail key properties of the resulting importance weights. We then extend several reinforcement learning prediction algorithms to the off-policy setting with these weights, and evaluate them empirically.}
}
Endnote
%0 Conference Paper
%T Value-aware Importance Weighting for Off-policy Reinforcement Learning
%A Kristopher De Asis
%A Eric Graves
%A Richard S. Sutton
%B Proceedings of The 2nd Conference on Lifelong Learning Agents
%C Proceedings of Machine Learning Research
%D 2023
%E Sarath Chandar
%E Razvan Pascanu
%E Hanie Sedghi
%E Doina Precup
%F pmlr-v232-de-asis23a
%I PMLR
%P 745--763
%U https://proceedings.mlr.press/v232/de-asis23a.html
%V 232
%X Importance sampling is a central idea underlying off-policy prediction in reinforcement learning. It provides a strategy for re-weighting samples from a distribution to represent unbiased estimates of another distribution. However, importance sampling weights tend to be of high variance, often leading to stability issues in practice. In this work, we consider a broader class of importance weights to correct samples in off-policy learning. We propose the use of value-aware importance weights which take into account the sample space to provide lower variance, but still unbiased, estimates under a target distribution. We derive how such weights can be computed, and detail key properties of the resulting importance weights. We then extend several reinforcement learning prediction algorithms to the off-policy setting with these weights, and evaluate them empirically.
APA
De Asis, K., Graves, E. & Sutton, R.S. (2023). Value-aware Importance Weighting for Off-policy Reinforcement Learning. Proceedings of The 2nd Conference on Lifelong Learning Agents, in Proceedings of Machine Learning Research 232:745-763. Available from https://proceedings.mlr.press/v232/de-asis23a.html.