Cascaded Gaps: Towards Logarithmic Regret for Risk-Sensitive Reinforcement Learning

Yingjie Fei; Ruitu Xu

Proceedings of the 39th International Conference on Machine Learning, PMLR 162:6392-6417, 2022.

Abstract

In this paper, we study gap-dependent regret guarantees for risk-sensitive reinforcement learning based on the entropic risk measure. We propose a novel definition of sub-optimality gaps, which we call cascaded gaps, and we discuss their key components that adapt to underlying structures of the problem. Based on the cascaded gaps, we derive non-asymptotic and logarithmic regret bounds for two model-free algorithms under episodic Markov decision processes. We show that, in appropriate settings, these bounds feature exponential improvement over existing ones that are independent of gaps. We also prove gap-dependent lower bounds, which certify the near optimality of the upper bounds.
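The abstract builds on the entropic risk measure. Its standard form for a random return X and a risk parameter β ≠ 0 is (1/β) log E[exp(βX)], and it recovers E[X] as β → 0. The sketch below is a minimal illustration of that textbook definition for a discrete return distribution. The function name `entropic_risk` is hypothetical, and the sign convention is an assumption borrowed from the entropic-risk RL literature (β > 0 risk-seeking, β < 0 risk-averse, for rewards). It is not the paper's algorithm and does not compute cascaded gaps or regret.

```python
import math

def entropic_risk(outcomes, probs, beta):
    """Hypothetical helper: (1/beta) * log E[exp(beta * X)] for a discrete X.

    Assumed convention (rewards): beta > 0 is risk-seeking, beta < 0 is
    risk-averse, and beta == 0 is treated as the risk-neutral mean E[X].
    """
    if len(outcomes) != len(probs):
        raise ValueError("outcomes and probs must have the same length")
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probs must sum to 1")
    if beta == 0:
        # Limit of the entropic risk as beta -> 0 is the plain expectation.
        return sum(p * x for x, p in zip(outcomes, probs))
    # log E[exp(beta X)] = log sum_i exp(log p_i + beta x_i), computed with
    # the log-sum-exp trick to avoid overflow for large |beta * x|.
    terms = [math.log(p) + beta * x for x, p in zip(outcomes, probs) if p > 0]
    m = max(terms)
    log_mgf = m + math.log(sum(math.exp(t - m) for t in terms))
    return log_mgf / beta

# A fair coin paying 0 or 1: the mean is 0.5, and the entropic risk moves
# below or above it depending on the sign of beta.
coin = ([0.0, 1.0], [0.5, 0.5])
for beta in (-1.0, 0.0, 1.0):
    print(beta, entropic_risk(*coin, beta))
```

For β = 1 the coin example evaluates to log((1 + e)/2), which exceeds the mean of 0.5, while β = −1 gives a value below it. This ordering is what makes the parameter a knob on risk sensitivity.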

Cite this Paper

BibTeX

@InProceedings{pmlr-v162-fei22b,
  title     = {Cascaded Gaps: Towards Logarithmic Regret for Risk-Sensitive Reinforcement Learning},
  author    = {Fei, Yingjie and Xu, Ruitu},
  booktitle = {Proceedings of the 39th International Conference on Machine Learning},
  pages     = {6392--6417},
  year      = {2022},
  editor    = {Chaudhuri, Kamalika and Jegelka, Stefanie and Song, Le and Szepesvari, Csaba and Niu, Gang and Sabato, Sivan},
  volume    = {162},
  series    = {Proceedings of Machine Learning Research},
  month     = {17--23 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v162/fei22b/fei22b.pdf},
  url       = {https://proceedings.mlr.press/v162/fei22b.html},
  abstract  = {In this paper, we study gap-dependent regret guarantees for risk-sensitive reinforcement learning based on the entropic risk measure. We propose a novel definition of sub-optimality gaps, which we call cascaded gaps, and we discuss their key components that adapt to underlying structures of the problem. Based on the cascaded gaps, we derive non-asymptotic and logarithmic regret bounds for two model-free algorithms under episodic Markov decision processes. We show that, in appropriate settings, these bounds feature exponential improvement over existing ones that are independent of gaps. We also prove gap-dependent lower bounds, which certify the near optimality of the upper bounds.}
}

Endnote

%0 Conference Paper
%T Cascaded Gaps: Towards Logarithmic Regret for Risk-Sensitive Reinforcement Learning
%A Yingjie Fei
%A Ruitu Xu
%B Proceedings of the 39th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2022
%E Kamalika Chaudhuri
%E Stefanie Jegelka
%E Le Song
%E Csaba Szepesvari
%E Gang Niu
%E Sivan Sabato
%F pmlr-v162-fei22b
%I PMLR
%P 6392--6417
%U https://proceedings.mlr.press/v162/fei22b.html
%V 162
%X In this paper, we study gap-dependent regret guarantees for risk-sensitive reinforcement learning based on the entropic risk measure. We propose a novel definition of sub-optimality gaps, which we call cascaded gaps, and we discuss their key components that adapt to underlying structures of the problem. Based on the cascaded gaps, we derive non-asymptotic and logarithmic regret bounds for two model-free algorithms under episodic Markov decision processes. We show that, in appropriate settings, these bounds feature exponential improvement over existing ones that are independent of gaps. We also prove gap-dependent lower bounds, which certify the near optimality of the upper bounds.

APA

Fei, Y., & Xu, R. (2022). Cascaded Gaps: Towards Logarithmic Regret for Risk-Sensitive Reinforcement Learning. Proceedings of the 39th International Conference on Machine Learning, in Proceedings of Machine Learning Research 162:6392-6417. Available from https://proceedings.mlr.press/v162/fei22b.html.
