Q-learning for Quantile MDPs: A Decomposition, Performance, and Convergence Analysis

Jia Lin Hau; Erick Delage; Esther Derman; Mohammad Ghavamzadeh; Marek Petrik

Proceedings of The 28th International Conference on Artificial Intelligence and Statistics, PMLR 258:2665-2673, 2025.

Abstract

In Markov decision processes (MDPs), quantile risk measures such as Value-at-Risk are a standard metric for modeling RL agents’ preferences for certain outcomes. This paper proposes a new Q-learning algorithm for quantile optimization in MDPs with strong convergence and performance guarantees. The algorithm leverages a new, simple dynamic program (DP) decomposition for quantile MDPs. Compared with prior work, our DP decomposition requires neither known transition probabilities nor solving complex saddle point equations and serves as a suitable foundation for other model-free RL algorithms. Our numerical results in tabular domains show that our Q-learning algorithm converges to its DP variant and outperforms earlier algorithms.

Cite this Paper

BibTeX

@InProceedings{pmlr-v258-hau25a,
  title = 	 {Q-learning for Quantile MDPs: A Decomposition, Performance, and Convergence Analysis},
  author =       {Hau, Jia Lin and Delage, Erick and Derman, Esther and Ghavamzadeh, Mohammad and Petrik, Marek},
  booktitle = 	 {Proceedings of The 28th International Conference on Artificial Intelligence and Statistics},
  pages = 	 {2665--2673},
  year = 	 {2025},
  editor = 	 {Li, Yingzhen and Mandt, Stephan and Agrawal, Shipra and Khan, Emtiyaz},
  volume = 	 {258},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {03--05 May},
  publisher =    {PMLR},
  pdf = 	 {https://raw.githubusercontent.com/mlresearch/v258/main/assets/hau25a/hau25a.pdf},
  url = 	 {https://proceedings.mlr.press/v258/hau25a.html},
  abstract = 	 {In Markov decision processes (MDPs), quantile risk measures such as Value-at-Risk are a standard metric for modeling RL agents’ preferences for certain outcomes. This paper proposes a new Q-learning algorithm for quantile optimization in MDPs with strong convergence and performance guarantees. The algorithm leverages a new, simple dynamic program (DP) decomposition for quantile MDPs. Compared with prior work, our DP decomposition requires neither known transition probabilities nor solving complex saddle point equations and serves as a suitable foundation for other model-free RL algorithms. Our numerical results in tabular domains show that our Q-learning algorithm converges to its DP variant and outperforms earlier algorithms.}
}

Endnote

%0 Conference Paper
%T Q-learning for Quantile MDPs: A Decomposition, Performance, and Convergence Analysis
%A Jia Lin Hau
%A Erick Delage
%A Esther Derman
%A Mohammad Ghavamzadeh
%A Marek Petrik
%B Proceedings of The 28th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2025
%E Yingzhen Li
%E Stephan Mandt
%E Shipra Agrawal
%E Emtiyaz Khan
%F pmlr-v258-hau25a
%I PMLR
%P 2665--2673
%U https://proceedings.mlr.press/v258/hau25a.html
%V 258
%X In Markov decision processes (MDPs), quantile risk measures such as Value-at-Risk are a standard metric for modeling RL agents’ preferences for certain outcomes. This paper proposes a new Q-learning algorithm for quantile optimization in MDPs with strong convergence and performance guarantees. The algorithm leverages a new, simple dynamic program (DP) decomposition for quantile MDPs. Compared with prior work, our DP decomposition requires neither known transition probabilities nor solving complex saddle point equations and serves as a suitable foundation for other model-free RL algorithms. Our numerical results in tabular domains show that our Q-learning algorithm converges to its DP variant and outperforms earlier algorithms.

APA

Hau, J.L., Delage, E., Derman, E., Ghavamzadeh, M. & Petrik, M. (2025). Q-learning for Quantile MDPs: A Decomposition, Performance, and Convergence Analysis. Proceedings of The 28th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 258:2665-2673. Available from https://proceedings.mlr.press/v258/hau25a.html.
