Value-aware Importance Weighting for Off-policy Reinforcement Learning
Proceedings of The 2nd Conference on Lifelong Learning Agents, PMLR 232:745-763, 2023.
Abstract
Importance sampling is a central idea underlying off-policy prediction in reinforcement learning. It provides a strategy for re-weighting samples from one distribution so that they yield unbiased estimates under another distribution. However, importance sampling weights tend to have high variance, often leading to stability issues in practice. In this work, we consider a broader class of importance weights to correct samples in off-policy learning. We propose the use of value-aware importance weights, which take into account the sample space to provide lower-variance, but still unbiased, estimates under a target distribution. We derive how such weights can be computed, and detail key properties of the resulting importance weights. We then extend several reinforcement learning prediction algorithms to the off-policy setting with these weights, and evaluate them empirically.
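For readers unfamiliar with the baseline the paper generalizes, the sketch below illustrates ordinary per-step importance weighting in off-policy TD(0) prediction. The chain environment, behavior and target policies, and step size are illustrative assumptions rather than the paper's experimental setup, and the proposed value-aware weights are not reproduced here; they would replace the ratio `rho` with a lower-variance, still-unbiased correction.

```python
import numpy as np

# Minimal sketch: ordinary (per-step) importance sampling in off-policy TD(0).
# The chain MDP, policies, and step size are illustrative assumptions only.

rng = np.random.default_rng(0)

n_states = 5                      # chain of states 0..4; state 4 is terminal
actions = [0, 1]                  # 0 = left, 1 = right
gamma = 0.99
alpha = 0.05

behavior = np.array([0.5, 0.5])   # mu(a|s): uniform behavior policy
target = np.array([0.1, 0.9])     # pi(a|s): target policy prefers "right"

def step(s, a):
    """Deterministic chain transition; reward 1 on reaching the terminal state."""
    s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
    r = 1.0 if s_next == n_states - 1 else 0.0
    return s_next, r, s_next == n_states - 1

V = np.zeros(n_states)            # value estimate for the target policy

for episode in range(5000):
    s, done = 0, False
    while not done:
        a = rng.choice(actions, p=behavior)      # act with the behavior policy
        s_next, r, done = step(s, a)
        rho = target[a] / behavior[a]            # importance sampling ratio
        td_error = r + gamma * (0.0 if done else V[s_next]) - V[s]
        V[s] += alpha * rho * td_error           # IS-corrected TD(0) update
        s = s_next

print("Estimated V under the target policy:", np.round(V, 3))
```

Because `rho` multiplies every update, its variance directly affects the stability of learning, which is the issue the value-aware weights in the paper are designed to mitigate.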