Entropic Risk Optimization in Discounted MDPs
Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, PMLR 206:47-76, 2023.
Abstract
Risk-averse Markov Decision Processes (MDPs) have optimal policies that achieve high returns with low variability, but these MDPs are often difficult to solve. Only a few practical risk-averse objectives admit a dynamic programming (DP) formulation, which is the mainstay of most MDP and RL algorithms. We derive a new DP formulation for discounted risk-averse MDPs with Entropic Risk Measure (ERM) and Entropic Value at Risk (EVaR) objectives. Our DP formulation for ERM, made possible by a novel definition of the value function with time-dependent risk levels, can approximate optimal policies in time polynomial in the inverse of the approximation error. We then use the ERM algorithm to optimize the EVaR objective in polynomial time via an optimized discretization scheme. Our numerical results demonstrate the viability of our formulations and algorithms in discounted MDPs.
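To make the two objectives concrete: ERM_beta[X] = -(1/beta) log E[exp(-beta X)], and EVaR_alpha[X] = sup_{beta>0} (ERM_beta[X] + log(alpha)/beta). The sketch below is a minimal NumPy illustration of the resulting backward induction, not the paper's exact algorithm: it assumes a tabular MDP with transitions P of shape (actions, states, states) and rewards r of shape (actions, states), approximates the infinite horizon by truncation, and replaces the paper's optimized discretization of the risk level with a plain grid over beta. All names, shapes, and numbers are illustrative.

```python
import numpy as np

def erm(probs, values, beta):
    """Entropic risk measure of a discrete random variable:
    ERM_beta[X] = -(1/beta) * log E[exp(-beta * X)], with ERM_0 = E[X].
    Computed with a log-sum-exp shift for numerical stability."""
    if beta == 0.0:
        return probs @ values
    x = -beta * values
    m = x.max()
    return -(m + np.log(probs @ np.exp(x - m))) / beta

def erm_value_iteration(P, r, gamma, beta, horizon):
    """Backward induction for the ERM objective, using the identity
    ERM_beta[c + gamma * Z] = c + gamma * ERM_{gamma * beta}[Z],
    so the risk level at step t is the time-dependent beta_t = gamma**t * beta."""
    n_actions, n_states, _ = P.shape
    v = np.zeros(n_states)  # terminal value (truncation of the infinite horizon)
    for t in reversed(range(horizon)):
        beta_next = beta * gamma ** (t + 1)  # risk level for the continuation value
        q = np.array([[r[a, s] + gamma * erm(P[a, s], v, beta_next)
                       for s in range(n_states)]
                      for a in range(n_actions)])
        v = q.max(axis=0)
    return v  # v[s] approximates the optimal ERM_beta of the return from state s

def evar_via_erm(P, r, gamma, alpha, horizon, betas):
    """EVaR_alpha[X] = sup_{beta > 0} (ERM_beta[X] + log(alpha) / beta);
    maximizing over a finite grid `betas` yields a lower bound and stands in
    for the paper's optimized discretization scheme."""
    values = [erm_value_iteration(P, r, gamma, b, horizon) + np.log(alpha) / b
              for b in betas]
    return np.max(np.stack(values), axis=0)  # per-state EVaR estimate

# Tiny two-state, two-action example (hypothetical numbers).
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.1, 0.9]]])   # shape (actions, states, states)
r = np.array([[1.0, 0.0], [0.5, 2.0]])     # shape (actions, states)
print(evar_via_erm(P, r, gamma=0.9, alpha=0.2, horizon=50,
                   betas=np.logspace(-3, 1, 20)))
```

The identity ERM_beta[c + gamma*Z] = c + gamma * ERM_{gamma*beta}[Z] is what forces the time-dependent risk levels beta_t = gamma^t * beta: since beta_t vanishes as t grows, the risk-adjusted values approach risk-neutral ones, which is why a finite truncation of the discounted horizon suffices for a target approximation error.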