Cascaded Gaps: Towards Logarithmic Regret for Risk-Sensitive Reinforcement Learning

Yingjie Fei, Ruitu Xu
Proceedings of the 39th International Conference on Machine Learning, PMLR 162:6392-6417, 2022.

Abstract

In this paper, we study gap-dependent regret guarantees for risk-sensitive reinforcement learning based on the entropic risk measure. We propose a novel definition of sub-optimality gaps, which we call cascaded gaps, and we discuss their key components that adapt to the underlying structure of the problem. Based on the cascaded gaps, we derive non-asymptotic and logarithmic regret bounds for two model-free algorithms under episodic Markov decision processes. We show that, in appropriate settings, these bounds feature exponential improvements over existing gap-independent bounds. We also prove gap-dependent lower bounds, which certify the near optimality of the upper bounds.
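For readers unfamiliar with the entropic risk measure mentioned above, here is a minimal sketch that computes it for a discrete random variable. It assumes the standard definition (1/β) log E[exp(βX)] with a nonzero risk parameter β. Under this convention, with X a reward, β > 0 is risk-seeking, β < 0 is risk-averse, and β → 0 recovers the expectation. This is only an illustration of the measure. It is not the paper's algorithms or its cascaded-gap construction, and the function name `entropic_risk` is hypothetical.

```python
import math


def entropic_risk(values, probs, beta):
    """Entropic risk (1/beta) * log E[exp(beta * X)] of a discrete random variable X.

    Hypothetical helper for illustration; assumes the standard definition.
    `values` are the outcomes of X and `probs` their probabilities (summing to 1).
    """
    if beta == 0:
        # Limit beta -> 0: the entropic risk reduces to the plain expectation.
        return sum(p * v for v, p in zip(values, probs))
    # log E[exp(beta X)] = log sum_i exp(beta * v_i + log p_i); use log-sum-exp for stability.
    terms = [beta * v + math.log(p) for v, p in zip(values, probs) if p > 0]
    m = max(terms)
    log_mgf = m + math.log(sum(math.exp(t - m) for t in terms))
    return log_mgf / beta


# A fair coin paying reward 0 or 1 has mean 0.5.
coin_values, coin_probs = [0.0, 1.0], [0.5, 0.5]
print(entropic_risk(coin_values, coin_probs, 1.0))   # risk-seeking: above the mean
print(entropic_risk(coin_values, coin_probs, -1.0))  # risk-averse: below the mean
```

The log-sum-exp form avoids overflow when |β| times the rewards is large. This matters because the exponential utility can otherwise dominate floating-point range.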

Cite this Paper


BibTeX
@InProceedings{pmlr-v162-fei22b,
  title = {Cascaded Gaps: Towards Logarithmic Regret for Risk-Sensitive Reinforcement Learning},
  author = {Fei, Yingjie and Xu, Ruitu},
  booktitle = {Proceedings of the 39th International Conference on Machine Learning},
  pages = {6392--6417},
  year = {2022},
  editor = {Chaudhuri, Kamalika and Jegelka, Stefanie and Song, Le and Szepesvari, Csaba and Niu, Gang and Sabato, Sivan},
  volume = {162},
  series = {Proceedings of Machine Learning Research},
  month = {17--23 Jul},
  publisher = {PMLR},
  pdf = {https://proceedings.mlr.press/v162/fei22b/fei22b.pdf},
  url = {https://proceedings.mlr.press/v162/fei22b.html},
  abstract = {In this paper, we study gap-dependent regret guarantees for risk-sensitive reinforcement learning based on the entropic risk measure. We propose a novel definition of sub-optimality gaps, which we call cascaded gaps, and we discuss their key components that adapt to underlying structures of the problem. Based on the cascaded gaps, we derive non-asymptotic and logarithmic regret bounds for two model-free algorithms under episodic Markov decision processes. We show that, in appropriate settings, these bounds feature exponential improvement over existing ones that are independent of gaps. We also prove gap-dependent lower bounds, which certify the near optimality of the upper bounds.}
}
Endnote
%0 Conference Paper
%T Cascaded Gaps: Towards Logarithmic Regret for Risk-Sensitive Reinforcement Learning
%A Yingjie Fei
%A Ruitu Xu
%B Proceedings of the 39th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2022
%E Kamalika Chaudhuri
%E Stefanie Jegelka
%E Le Song
%E Csaba Szepesvari
%E Gang Niu
%E Sivan Sabato
%F pmlr-v162-fei22b
%I PMLR
%P 6392--6417
%U https://proceedings.mlr.press/v162/fei22b.html
%V 162
%X In this paper, we study gap-dependent regret guarantees for risk-sensitive reinforcement learning based on the entropic risk measure. We propose a novel definition of sub-optimality gaps, which we call cascaded gaps, and we discuss their key components that adapt to underlying structures of the problem. Based on the cascaded gaps, we derive non-asymptotic and logarithmic regret bounds for two model-free algorithms under episodic Markov decision processes. We show that, in appropriate settings, these bounds feature exponential improvement over existing ones that are independent of gaps. We also prove gap-dependent lower bounds, which certify the near optimality of the upper bounds.
APA
Fei, Y. & Xu, R. (2022). Cascaded Gaps: Towards Logarithmic Regret for Risk-Sensitive Reinforcement Learning. Proceedings of the 39th International Conference on Machine Learning, in Proceedings of Machine Learning Research 162:6392-6417. Available from https://proceedings.mlr.press/v162/fei22b.html.
