Accelerated Policy Gradient for s-rectangular Robust MDPs with Large State Spaces

Ziyi Chen, Heng Huang
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:6847-6880, 2024.

Abstract

The robust Markov decision process (robust MDP) is an important machine learning framework for producing a reliable policy that is robust to environmental perturbations. Despite the empirical success and popularity of policy gradient methods, existing policy gradient methods require iteration complexity of at least $\mathcal{O}(\epsilon^{-4})$ to converge to the global optimum of s-rectangular robust MDPs with $\epsilon$-accuracy, and they are limited to the deterministic setting, with access to exact gradients and a small state space, which is impractical in many applications. In this work, we propose an accelerated policy gradient algorithm with iteration complexity $\mathcal{O}(\epsilon^{-3}\ln\epsilon^{-1})$ in the deterministic setting using entropy regularization. Furthermore, we extend this algorithm to the stochastic setting, with access to only stochastic gradients and a large state space, and achieve sample complexity $\mathcal{O}(\epsilon^{-7}\ln\epsilon^{-1})$. Our algorithms are also the first scalable policy gradient methods for entropy-regularized robust MDPs, an important but underexplored machine learning framework.
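
The abstract is high-level, so the sketch below illustrates, under stated assumptions, what an entropy-regularized, Nesterov-accelerated policy gradient update on a tabular softmax policy could look like; it is not the paper's algorithm. The robust part of the objective is abstracted into a hypothetical placeholder robust_grad_fn, which would return the gradient of the worst-case value over an s-rectangular uncertainty set with respect to the policy probabilities; the function names, step sizes, and the simple momentum scheme are all assumptions and do not reflect the authors' method.

import numpy as np

def softmax(logits):
    # numerically stable per-state softmax
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def accelerated_entropy_pg(robust_grad_fn, num_states, num_actions,
                           tau=0.1, step_size=0.01, momentum=0.9, iters=1000):
    """Nesterov-accelerated gradient ascent on softmax policy logits with an
    entropy bonus of weight tau. robust_grad_fn(pi) is a user-supplied
    placeholder returning the gradient of the robust (worst-case) value
    w.r.t. the policy probabilities pi (shape: num_states x num_actions)."""
    theta = np.zeros((num_states, num_actions))   # policy logits
    theta_prev = theta.copy()
    for _ in range(iters):
        # look-ahead (extrapolation) point
        y = theta + momentum * (theta - theta_prev)
        pi = softmax(y)
        # gradient of the entropy-regularized objective w.r.t. pi:
        # d/dpi [ J_robust(pi) + tau * H(pi) ], with dH/dpi = -(log pi + 1)
        g_pi = robust_grad_fn(pi) - tau * (np.log(pi) + 1.0)
        # chain rule through the per-state softmax
        g_theta = pi * (g_pi - (pi * g_pi).sum(axis=-1, keepdims=True))
        theta_prev = theta
        theta = y + step_size * g_theta   # ascent step from the look-ahead point
    return softmax(theta)

In the paper's setting, evaluating the robust gradient itself requires solving an inner minimization over the s-rectangular uncertainty set, and the stochastic, large-state-space variant replaces exact gradients with sampled estimates; the sketch above only conveys the accelerated, entropy-regularized outer update.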

Cite this Paper


BibTeX
@InProceedings{pmlr-v235-chen24s,
  title     = {Accelerated Policy Gradient for s-rectangular Robust {MDP}s with Large State Spaces},
  author    = {Chen, Ziyi and Huang, Heng},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages     = {6847--6880},
  year      = {2024},
  editor    = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume    = {235},
  series    = {Proceedings of Machine Learning Research},
  month     = {21--27 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/chen24s/chen24s.pdf},
  url       = {https://proceedings.mlr.press/v235/chen24s.html}
}
Endnote
%0 Conference Paper
%T Accelerated Policy Gradient for s-rectangular Robust MDPs with Large State Spaces
%A Ziyi Chen
%A Heng Huang
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-chen24s
%I PMLR
%P 6847--6880
%U https://proceedings.mlr.press/v235/chen24s.html
%V 235
APA
Chen, Z. & Huang, H. (2024). Accelerated Policy Gradient for s-rectangular Robust MDPs with Large State Spaces. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:6847-6880. Available from https://proceedings.mlr.press/v235/chen24s.html.
