Risk-Sensitive Reinforcement Learning with Function Approximation: A Debiasing Approach

Yingjie Fei, Zhuoran Yang, Zhaoran Wang
Proceedings of the 38th International Conference on Machine Learning, PMLR 139:3198-3207, 2021.

Abstract

We study function approximation for episodic reinforcement learning with entropic risk measure. We first propose an algorithm with linear function approximation. Compared to existing algorithms, which suffer from improper regularization and regression biases, this algorithm features debiasing transformations in backward induction and regression procedures. We further propose an algorithm with general function approximation, which features implicit debiasing transformations. We prove that both algorithms achieve a sublinear regret and demonstrate a trade-off between generality and efficiency. Our analysis provides a unified framework for function approximation in risk-sensitive reinforcement learning, which leads to the first sublinear regret bounds in the setting.
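For readers unfamiliar with the risk measure named in the abstract: the entropic risk measure has a standard textbook definition (this formula is a general reference, not a reproduction of the paper's specific setup). For a random return \(X\) and risk parameter \(\beta \neq 0\),

```latex
\[
  \mathrm{ERM}_{\beta}(X) \;=\; \frac{1}{\beta}\,\log \mathbb{E}\!\left[\,e^{\beta X}\,\right],
\]
```

where \(\beta > 0\) corresponds to risk-seeking behavior, \(\beta < 0\) to risk-averse behavior, and the limit \(\beta \to 0\) recovers the risk-neutral objective \(\mathbb{E}[X]\).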

Cite this Paper


BibTeX
@InProceedings{pmlr-v139-fei21a,
  title     = {Risk-Sensitive Reinforcement Learning with Function Approximation: A Debiasing Approach},
  author    = {Fei, Yingjie and Yang, Zhuoran and Wang, Zhaoran},
  booktitle = {Proceedings of the 38th International Conference on Machine Learning},
  pages     = {3198--3207},
  year      = {2021},
  editor    = {Meila, Marina and Zhang, Tong},
  volume    = {139},
  series    = {Proceedings of Machine Learning Research},
  month     = {18--24 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v139/fei21a/fei21a.pdf},
  url       = {https://proceedings.mlr.press/v139/fei21a.html},
  abstract  = {We study function approximation for episodic reinforcement learning with entropic risk measure. We first propose an algorithm with linear function approximation. Compared to existing algorithms, which suffer from improper regularization and regression biases, this algorithm features debiasing transformations in backward induction and regression procedures. We further propose an algorithm with general function approximation, which features implicit debiasing transformations. We prove that both algorithms achieve a sublinear regret and demonstrate a trade-off between generality and efficiency. Our analysis provides a unified framework for function approximation in risk-sensitive reinforcement learning, which leads to the first sublinear regret bounds in the setting.}
}
Endnote
%0 Conference Paper
%T Risk-Sensitive Reinforcement Learning with Function Approximation: A Debiasing Approach
%A Yingjie Fei
%A Zhuoran Yang
%A Zhaoran Wang
%B Proceedings of the 38th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2021
%E Marina Meila
%E Tong Zhang
%F pmlr-v139-fei21a
%I PMLR
%P 3198--3207
%U https://proceedings.mlr.press/v139/fei21a.html
%V 139
%X We study function approximation for episodic reinforcement learning with entropic risk measure. We first propose an algorithm with linear function approximation. Compared to existing algorithms, which suffer from improper regularization and regression biases, this algorithm features debiasing transformations in backward induction and regression procedures. We further propose an algorithm with general function approximation, which features implicit debiasing transformations. We prove that both algorithms achieve a sublinear regret and demonstrate a trade-off between generality and efficiency. Our analysis provides a unified framework for function approximation in risk-sensitive reinforcement learning, which leads to the first sublinear regret bounds in the setting.
APA
Fei, Y., Yang, Z. & Wang, Z. (2021). Risk-Sensitive Reinforcement Learning with Function Approximation: A Debiasing Approach. Proceedings of the 38th International Conference on Machine Learning, in Proceedings of Machine Learning Research 139:3198-3207. Available from https://proceedings.mlr.press/v139/fei21a.html.
