Entropic Risk Optimization in Discounted MDPs

Jia Lin Hau, Marek Petrik, Mohammad Ghavamzadeh
Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, PMLR 206:47-76, 2023.

Abstract

Risk-averse Markov Decision Processes (MDPs) have optimal policies that achieve high returns with low variability, but these MDPs are often difficult to solve. Only a few practical risk-averse objectives admit a dynamic programming (DP) formulation, which is the mainstay of most MDP and RL algorithms. We derive a new DP formulation for discounted risk-averse MDPs with Entropic Risk Measure (ERM) and Entropic Value at Risk (EVaR) objectives. Our DP formulation for ERM, which is possible because of our novel definition of value function with time-dependent risk levels, can approximate optimal policies in a time that is polynomial in the approximation error. We then use the ERM algorithm to optimize the EVaR objective in polynomial time using an optimized discretization scheme. Our numerical results show the viability of our formulations and algorithms in discounted MDPs.
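The two objectives the abstract refers to can be illustrated numerically. Below is a minimal sketch, assuming the reward-based definitions ERM_β(X) = -(1/β) log E[exp(-βX)] and the dual representation EVaR_α(X) = sup_{β>0} { ERM_β(X) + log(α)/β }; the fixed grid of β values here is a crude illustrative stand-in for the paper's optimized discretization scheme, not the authors' algorithm.

```python
import math

def erm(returns, probs, beta):
    """Entropic risk measure of a discrete reward distribution:
    ERM_beta(X) = -(1/beta) * log E[exp(-beta * X)].
    Approaches the mean as beta -> 0 and the worst case as beta -> inf.
    (Large beta with very negative returns can overflow exp.)"""
    assert beta > 0
    return -math.log(sum(p * math.exp(-beta * x)
                         for x, p in zip(returns, probs))) / beta

def evar(returns, probs, alpha, betas=None):
    """Entropic value at risk at level alpha in (0, 1], approximated via
    EVaR_alpha(X) = sup_{beta > 0} ERM_beta(X) + log(alpha) / beta
    on a finite beta grid (an illustrative choice, not the paper's scheme)."""
    if betas is None:
        betas = [10.0 ** k for k in range(-4, 4)]
    return max(erm(returns, probs, b) + math.log(alpha) / b for b in betas)

# Two-outcome return distribution: 0 with prob 0.5, 10 with prob 0.5.
rets, ps = [0.0, 10.0], [0.5, 0.5]
mean = sum(x * p for x, p in zip(rets, ps))  # risk-neutral value, 5.0
```

For this distribution, a small β recovers roughly the mean return, while large β (or small α) pushes the value toward the worst-case outcome of 0, matching the intuition that both measures interpolate between risk-neutral and robust evaluation.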

Cite this Paper

BibTeX
@InProceedings{pmlr-v206-lin-hau23a,
  title     = {Entropic Risk Optimization in Discounted MDPs},
  author    = {Lin Hau, Jia and Petrik, Marek and Ghavamzadeh, Mohammad},
  booktitle = {Proceedings of The 26th International Conference on Artificial Intelligence and Statistics},
  pages     = {47--76},
  year      = {2023},
  editor    = {Ruiz, Francisco and Dy, Jennifer and van de Meent, Jan-Willem},
  volume    = {206},
  series    = {Proceedings of Machine Learning Research},
  month     = {25--27 Apr},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v206/lin-hau23a/lin-hau23a.pdf},
  url       = {https://proceedings.mlr.press/v206/lin-hau23a.html},
  abstract  = {Risk-averse Markov Decision Processes (MDPs) have optimal policies that achieve high returns with low variability, but these MDPs are often difficult to solve. Only a few practical risk-averse objectives admit a dynamic programming (DP) formulation, which is the mainstay of most MDP and RL algorithms. We derive a new DP formulation for discounted risk-averse MDPs with Entropic Risk Measure (ERM) and Entropic Value at Risk (EVaR) objectives. Our DP formulation for ERM, which is possible because of our novel definition of value function with time-dependent risk levels, can approximate optimal policies in a time that is polynomial in the approximation error. We then use the ERM algorithm to optimize the EVaR objective in polynomial time using an optimized discretization scheme. Our numerical results show the viability of our formulations and algorithms in discounted MDPs.}
}
Endnote
%0 Conference Paper
%T Entropic Risk Optimization in Discounted MDPs
%A Jia Lin Hau
%A Marek Petrik
%A Mohammad Ghavamzadeh
%B Proceedings of The 26th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2023
%E Francisco Ruiz
%E Jennifer Dy
%E Jan-Willem van de Meent
%F pmlr-v206-lin-hau23a
%I PMLR
%P 47--76
%U https://proceedings.mlr.press/v206/lin-hau23a.html
%V 206
%X Risk-averse Markov Decision Processes (MDPs) have optimal policies that achieve high returns with low variability, but these MDPs are often difficult to solve. Only a few practical risk-averse objectives admit a dynamic programming (DP) formulation, which is the mainstay of most MDP and RL algorithms. We derive a new DP formulation for discounted risk-averse MDPs with Entropic Risk Measure (ERM) and Entropic Value at Risk (EVaR) objectives. Our DP formulation for ERM, which is possible because of our novel definition of value function with time-dependent risk levels, can approximate optimal policies in a time that is polynomial in the approximation error. We then use the ERM algorithm to optimize the EVaR objective in polynomial time using an optimized discretization scheme. Our numerical results show the viability of our formulations and algorithms in discounted MDPs.
APA
Lin Hau, J., Petrik, M. &amp; Ghavamzadeh, M. (2023). Entropic Risk Optimization in Discounted MDPs. Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 206:47-76. Available from https://proceedings.mlr.press/v206/lin-hau23a.html.
