A Single-Loop Robust Policy Gradient Method for Robust Markov Decision Processes

Zhenwei Lin, Chenyu Xue, Qi Deng, Yinyu Ye
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:30392-30426, 2024.

Abstract

Robust Markov Decision Processes (RMDPs) have recently been recognized as a valuable and promising approach to discovering a policy with creditable performance, particularly in the presence of a dynamic environment and estimation errors in the transition matrix due to limited data. Despite extensive exploration of dynamic programming algorithms for solving RMDPs, there has been a notable upswing of interest in developing efficient algorithms based on the policy gradient method. In this paper, we propose the first single-loop robust policy gradient (SRPG) method with a global optimality guarantee for solving RMDPs through their minimax formulation. Moreover, we complement the convergence analysis of the nonconvex-nonconcave min-max optimization problem with the objective function's gradient dominance property, which has not been explored in the prior literature. Numerical experiments validate the efficacy of SRPG, demonstrating faster and more robust convergence than its nested-loop counterpart.
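To make the single-loop idea concrete, the following is a minimal toy sketch, not the paper's SRPG algorithm or its update rules: each iteration takes one policy-gradient ascent step on a softmax policy and one projected gradient descent step on the adversarial transition kernel, rather than solving the inner worst-case problem to optimality as a nested-loop method would. The reward table, nominal kernel, step sizes, finite-difference gradients, and the projection onto the full simplex (instead of an uncertainty set around the nominal kernel) are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
S, A, gamma = 4, 2, 0.9                           # states, actions, discount
R = rng.uniform(size=(S, A))                      # reward table r(s, a)
P0 = rng.dirichlet(np.ones(S), size=(S, A))       # nominal kernel P0[s, a, s']

def value(theta, P):
    """Mean discounted value of the softmax policy `theta` under kernel `P`."""
    pi = np.exp(theta) / np.exp(theta).sum(axis=1, keepdims=True)    # S x A
    r_pi = (pi * R).sum(axis=1)                                      # S
    P_pi = np.einsum('sa,sat->st', pi, P)                            # S x S
    return np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi).mean()

def project_rows_to_simplex(P):
    """Project each last-axis slice onto the probability simplex."""
    def proj(p):
        u = np.sort(p)[::-1]
        css = np.cumsum(u) - 1.0
        rho = np.nonzero(u * np.arange(1, p.size + 1) > css)[0][-1]
        return np.maximum(p - css[rho] / (rho + 1), 0.0)
    return np.apply_along_axis(proj, -1, P)

def fd_grad(f, x, eps=1e-4):
    """Central finite-difference gradient; keeps the sketch self-contained."""
    g = np.zeros_like(x)
    for idx in np.ndindex(x.shape):
        d = np.zeros_like(x)
        d[idx] = eps
        g[idx] = (f(x + d) - f(x - d)) / (2 * eps)
    return g

theta = np.zeros((S, A))    # policy logits (maximizing player)
P = P0.copy()               # adversarial kernel (minimizing player)
eta_pi, eta_p = 0.5, 0.1    # illustrative step sizes

for t in range(200):
    g_theta = fd_grad(lambda th: value(th, P), theta)
    g_P = fd_grad(lambda Pk: value(theta, Pk), P)
    theta = theta + eta_pi * g_theta               # one ascent step on the policy
    P = project_rows_to_simplex(P - eta_p * g_P)   # one descent step for the adversary
    # A real RMDP method would project P onto an uncertainty set around P0
    # (e.g., an L1 or KL ball), not onto the whole simplex as done here.

print("value under the adversarially perturbed kernel:", value(theta, P))
```

The contrast with a nested-loop method is that the adversary's inner problem is never solved to completion before a policy update; both players take a single gradient step per iteration, which is the structure the abstract refers to as single-loop.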

Cite this Paper


BibTeX
@InProceedings{pmlr-v235-lin24u,
  title     = {A Single-Loop Robust Policy Gradient Method for Robust {M}arkov Decision Processes},
  author    = {Lin, Zhenwei and Xue, Chenyu and Deng, Qi and Ye, Yinyu},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages     = {30392--30426},
  year      = {2024},
  editor    = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume    = {235},
  series    = {Proceedings of Machine Learning Research},
  month     = {21--27 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/lin24u/lin24u.pdf},
  url       = {https://proceedings.mlr.press/v235/lin24u.html}
}
Endnote
%0 Conference Paper
%T A Single-Loop Robust Policy Gradient Method for Robust Markov Decision Processes
%A Zhenwei Lin
%A Chenyu Xue
%A Qi Deng
%A Yinyu Ye
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-lin24u
%I PMLR
%P 30392--30426
%U https://proceedings.mlr.press/v235/lin24u.html
%V 235
APA
Lin, Z., Xue, C., Deng, Q. & Ye, Y. (2024). A Single-Loop Robust Policy Gradient Method for Robust Markov Decision Processes. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:30392-30426. Available from https://proceedings.mlr.press/v235/lin24u.html.
