Q-learning for Quantile MDPs: A Decomposition, Performance, and Convergence Analysis

Jia Lin Hau, Erick Delage, Esther Derman, Mohammad Ghavamzadeh, Marek Petrik
Proceedings of The 28th International Conference on Artificial Intelligence and Statistics, PMLR 258:2665-2673, 2025.

Abstract

In Markov decision processes (MDPs), quantile risk measures such as Value-at-Risk are a standard metric for modeling RL agents’ preferences for certain outcomes. This paper proposes a new Q-learning algorithm for quantile optimization in MDPs with strong convergence and performance guarantees. The algorithm leverages a new, simple dynamic program (DP) decomposition for quantile MDPs. Compared with prior work, our DP decomposition requires neither known transition probabilities nor solving complex saddle point equations and serves as a suitable foundation for other model-free RL algorithms. Our numerical results in tabular domains show that our Q-learning algorithm converges to its DP variant and outperforms earlier algorithms.

Cite this Paper


BibTeX
@InProceedings{pmlr-v258-hau25a,
  title = {Q-learning for Quantile MDPs: A Decomposition, Performance, and Convergence Analysis},
  author = {Hau, Jia Lin and Delage, Erick and Derman, Esther and Ghavamzadeh, Mohammad and Petrik, Marek},
  booktitle = {Proceedings of The 28th International Conference on Artificial Intelligence and Statistics},
  pages = {2665--2673},
  year = {2025},
  editor = {Li, Yingzhen and Mandt, Stephan and Agrawal, Shipra and Khan, Emtiyaz},
  volume = {258},
  series = {Proceedings of Machine Learning Research},
  month = {03--05 May},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v258/main/assets/hau25a/hau25a.pdf},
  url = {https://proceedings.mlr.press/v258/hau25a.html},
  abstract = {In Markov decision processes (MDPs), quantile risk measures such as Value-at-Risk are a standard metric for modeling RL agents’ preferences for certain outcomes. This paper proposes a new Q-learning algorithm for quantile optimization in MDPs with strong convergence and performance guarantees. The algorithm leverages a new, simple dynamic program (DP) decomposition for quantile MDPs. Compared with prior work, our DP decomposition requires neither known transition probabilities nor solving complex saddle point equations and serves as a suitable foundation for other model-free RL algorithms. Our numerical results in tabular domains show that our Q-learning algorithm converges to its DP variant and outperforms earlier algorithms.}
}
Endnote
%0 Conference Paper
%T Q-learning for Quantile MDPs: A Decomposition, Performance, and Convergence Analysis
%A Jia Lin Hau
%A Erick Delage
%A Esther Derman
%A Mohammad Ghavamzadeh
%A Marek Petrik
%B Proceedings of The 28th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2025
%E Yingzhen Li
%E Stephan Mandt
%E Shipra Agrawal
%E Emtiyaz Khan
%F pmlr-v258-hau25a
%I PMLR
%P 2665--2673
%U https://proceedings.mlr.press/v258/hau25a.html
%V 258
%X In Markov decision processes (MDPs), quantile risk measures such as Value-at-Risk are a standard metric for modeling RL agents’ preferences for certain outcomes. This paper proposes a new Q-learning algorithm for quantile optimization in MDPs with strong convergence and performance guarantees. The algorithm leverages a new, simple dynamic program (DP) decomposition for quantile MDPs. Compared with prior work, our DP decomposition requires neither known transition probabilities nor solving complex saddle point equations and serves as a suitable foundation for other model-free RL algorithms. Our numerical results in tabular domains show that our Q-learning algorithm converges to its DP variant and outperforms earlier algorithms.
APA
Hau, J. L., Delage, E., Derman, E., Ghavamzadeh, M., & Petrik, M. (2025). Q-learning for Quantile MDPs: A Decomposition, Performance, and Convergence Analysis. Proceedings of The 28th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 258:2665-2673. Available from https://proceedings.mlr.press/v258/hau25a.html.
