Risk-Sensitive Reward-Free Reinforcement Learning with CVaR

Xinyi Ni, Guanlin Liu, Lifeng Lai
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:37999-38017, 2024.

Abstract

Exploration is a crucial phase in reinforcement learning (RL). The reward-free RL paradigm, as explored by Jin et al. (2020), offers an efficient way to design exploration algorithms for risk-neutral RL that handle arbitrary reward functions with a single exploration phase. However, as RL is increasingly applied in safety-critical settings, there is a growing need for risk-sensitive RL, which accounts for potential risks in decision-making. Yet efficient exploration strategies for risk-sensitive RL remain underdeveloped. This study presents a novel risk-sensitive reward-free framework based on Conditional Value-at-Risk (CVaR), designed to address CVaR RL for any given reward function through a single exploration phase. We introduce the CVaR-RF-UCRL algorithm, which is shown to be $(\epsilon,p)$-PAC with sample complexity upper bounded by $\tilde{\mathcal{O}}\left(\frac{S^2AH^4}{\epsilon^2\tau^2}\right)$, where $\tau$ is the risk tolerance parameter. We also prove an $\Omega\left(\frac{S^2AH^2}{\epsilon^2\tau}\right)$ lower bound for any CVaR-RF exploration algorithm, demonstrating the near-optimality of our algorithm. Additionally, we propose two planning algorithms: CVaR-VI and its more practical variant, CVaR-VI-DISC. The effectiveness and practicality of our CVaR reward-free approach are further validated through numerical experiments.
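
For context, the CVaR objective referenced in the abstract is standardly defined via the Rockafellar–Uryasev formulation; the expression below is general background rather than notation drawn from the paper itself. For a bounded return $X$ and risk tolerance $\tau \in (0,1]$, $\mathrm{CVaR}_{\tau}(X) = \sup_{b \in \mathbb{R}} \left\{ b - \frac{1}{\tau}\,\mathbb{E}\left[(b - X)^{+}\right] \right\}$, which averages over the worst $\tau$-fraction of outcomes: as $\tau \to 0$ it approaches the worst-case return, while $\tau = 1$ recovers the risk-neutral expectation $\mathbb{E}[X]$.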

Cite this Paper


BibTeX
@InProceedings{pmlr-v235-ni24c,
  title     = {Risk-Sensitive Reward-Free Reinforcement Learning with {CV}a{R}},
  author    = {Ni, Xinyi and Liu, Guanlin and Lai, Lifeng},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages     = {37999--38017},
  year      = {2024},
  editor    = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume    = {235},
  series    = {Proceedings of Machine Learning Research},
  month     = {21--27 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/ni24c/ni24c.pdf},
  url       = {https://proceedings.mlr.press/v235/ni24c.html}
}
EndNote
%0 Conference Paper
%T Risk-Sensitive Reward-Free Reinforcement Learning with CVaR
%A Xinyi Ni
%A Guanlin Liu
%A Lifeng Lai
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-ni24c
%I PMLR
%P 37999--38017
%U https://proceedings.mlr.press/v235/ni24c.html
%V 235
APA
Ni, X., Liu, G. & Lai, L. (2024). Risk-Sensitive Reward-Free Reinforcement Learning with CVaR. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:37999-38017. Available from https://proceedings.mlr.press/v235/ni24c.html.
