Taylor Expansion of Discount Factors

Yunhao Tang, Mark Rowland, Remi Munos, Michal Valko
Proceedings of the 38th International Conference on Machine Learning, PMLR 139:10130-10140, 2021.

Abstract

In practical reinforcement learning (RL), the discount factor used for estimating value functions often differs from that used for defining the evaluation objective. In this work, we study the effect that this discrepancy of discount factors has during learning, and discover a family of objectives that interpolate value functions of two distinct discount factors. Our analysis suggests new ways for estimating value functions and performing policy optimization updates, which demonstrate empirical performance gains. This framework also leads to new insights on commonly-used deep RL heuristic modifications to policy optimization algorithms.
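As a sketch of the idea (our own notation, not copied from the paper's statement): in a tabular MDP with policy-induced transition matrix P^π and expected reward vector r^π, the value function under discount γ is V_γ^π = (I − γP^π)^{−1} r^π. For a larger evaluation discount γ' ∈ [γ, 1), one can expand the value function at γ' around γ:

    V^{\pi}_{\gamma'} = \sum_{k=0}^{\infty} \left( (\gamma' - \gamma) (I - \gamma P^{\pi})^{-1} P^{\pi} \right)^{k} V^{\pi}_{\gamma},

which converges because the expansion operator has spectral radius at most (γ' − γ)/(1 − γ) < 1. Truncating the series at order K yields a family of objectives interpolating between V_γ^π (K = 0) and V_{γ'}^π (K → ∞), which is the interpolation the abstract refers to.

Below is a minimal numerical check of this truncation, assuming only NumPy; the MDP is random and purely illustrative, not from the paper:

import numpy as np

rng = np.random.default_rng(0)
n = 5                                    # number of states
P = rng.random((n, n))
P /= P.sum(axis=1, keepdims=True)        # row-stochastic transitions under a fixed policy
r = rng.random(n)                        # expected rewards
gamma, gamma_p = 0.9, 0.99               # estimation and evaluation discounts

V_gamma = np.linalg.solve(np.eye(n) - gamma * P, r)      # V under gamma
V_target = np.linalg.solve(np.eye(n) - gamma_p * P, r)   # V under gamma'

# K-th order truncation of sum_k ((gamma' - gamma) (I - gamma P)^{-1} P)^k V_gamma
M = (gamma_p - gamma) * np.linalg.solve(np.eye(n) - gamma * P, P)
approx = np.zeros(n)
term = V_gamma.copy()
for k in range(60):
    approx += term                       # add the k-th order term
    term = M @ term

print(np.max(np.abs(approx - V_target)))  # error shrinks geometrically in K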

Cite this Paper

BibTeX
@InProceedings{pmlr-v139-tang21b,
  title     = {Taylor Expansion of Discount Factors},
  author    = {Tang, Yunhao and Rowland, Mark and Munos, Remi and Valko, Michal},
  booktitle = {Proceedings of the 38th International Conference on Machine Learning},
  pages     = {10130--10140},
  year      = {2021},
  editor    = {Meila, Marina and Zhang, Tong},
  volume    = {139},
  series    = {Proceedings of Machine Learning Research},
  month     = {18--24 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v139/tang21b/tang21b.pdf},
  url       = {https://proceedings.mlr.press/v139/tang21b.html}
}
APA
Tang, Y., Rowland, M., Munos, R. & Valko, M. (2021). Taylor Expansion of Discount Factors. Proceedings of the 38th International Conference on Machine Learning, in Proceedings of Machine Learning Research 139:10130-10140. Available from https://proceedings.mlr.press/v139/tang21b.html.