On the Linear Convergence of Policy Gradient Methods for Finite MDPs

Jalaj Bhandari, Daniel Russo
Proceedings of The 24th International Conference on Artificial Intelligence and Statistics, PMLR 130:2386-2394, 2021.

Abstract

We revisit the finite time analysis of policy gradient methods in one of the simplest settings: finite state and action MDPs with a policy class consisting of all stochastic policies and with exact gradient evaluations. Recent work has viewed this setting as an instance of smooth non-linear optimization and established sub-linear convergence rates under small step-sizes. Here, we take a completely different perspective, based on illuminating connections with policy iteration, to show how many variants of policy gradient algorithms succeed with large step-sizes and attain a linear rate of convergence.
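As a rough illustration of the setting described above (not the paper's algorithm or analysis), the sketch below runs exact policy gradient ascent on a small randomly generated finite MDP. The MDP, the tabular softmax parameterization, the step-size, and the iteration count are all assumptions made for this example; the paper itself studies the class of all stochastic policies and several variants of policy gradient.

import numpy as np

rng = np.random.default_rng(0)

# A tiny random finite MDP (hypothetical example, not from the paper):
# 3 states, 2 actions, discount 0.9.
S, A, gamma = 3, 2, 0.9
P = rng.dirichlet(np.ones(S), size=(S, A))   # P[s, a, s'] transition kernel
r = rng.uniform(0, 1, size=(S, A))           # r[s, a] rewards
rho = np.ones(S) / S                         # initial state distribution

def policy(theta):
    # Tabular softmax policy pi(a|s) from logits theta[s, a].
    z = theta - theta.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def value_functions(pi):
    # Exact V^pi and Q^pi by solving the Bellman equations.
    P_pi = np.einsum('sa,sat->st', pi, P)    # state-to-state kernel under pi
    r_pi = np.einsum('sa,sa->s', pi, r)
    V = np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)
    Q = r + gamma * P @ V                    # Q[s, a]
    return V, Q, P_pi

def exact_gradient(theta):
    # Exact gradient of J(theta) = rho^T V^pi for the softmax parameterization:
    # dJ/dtheta[s, a] = d_rho(s) * pi(a|s) * A^pi(s, a),
    # with d_rho the (unnormalized) discounted state-visitation distribution.
    pi = policy(theta)
    V, Q, P_pi = value_functions(pi)
    d = np.linalg.solve(np.eye(S) - gamma * P_pi.T, rho)
    adv = Q - V[:, None]
    return d[:, None] * pi * adv, rho @ V

# Plain gradient ascent with exact gradients and a large constant step-size.
theta, eta = np.zeros((S, A)), 10.0
for t in range(200):
    grad, _ = exact_gradient(theta)
    theta += eta * grad
_, J = exact_gradient(theta)
print("objective after 200 exact policy gradient steps:", J)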

Cite this Paper


BibTeX
@InProceedings{pmlr-v130-bhandari21a,
  title     = {On the Linear Convergence of Policy Gradient Methods for Finite MDPs},
  author    = {Bhandari, Jalaj and Russo, Daniel},
  booktitle = {Proceedings of The 24th International Conference on Artificial Intelligence and Statistics},
  pages     = {2386--2394},
  year      = {2021},
  editor    = {Banerjee, Arindam and Fukumizu, Kenji},
  volume    = {130},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--15 Apr},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v130/bhandari21a/bhandari21a.pdf},
  url       = {https://proceedings.mlr.press/v130/bhandari21a.html},
  abstract  = {We revisit the finite time analysis of policy gradient methods in one of the simplest settings: finite state and action MDPs with a policy class consisting of all stochastic policies and with exact gradient evaluations. Recent work has viewed this setting as an instance of smooth non-linear optimization and established sub-linear convergence rates under small step-sizes. Here, we take a completely different perspective, based on illuminating connections with policy iteration, to show how many variants of policy gradient algorithms succeed with large step-sizes and attain a linear rate of convergence.}
}
Endnote
%0 Conference Paper
%T On the Linear Convergence of Policy Gradient Methods for Finite MDPs
%A Jalaj Bhandari
%A Daniel Russo
%B Proceedings of The 24th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2021
%E Arindam Banerjee
%E Kenji Fukumizu
%F pmlr-v130-bhandari21a
%I PMLR
%P 2386--2394
%U https://proceedings.mlr.press/v130/bhandari21a.html
%V 130
%X We revisit the finite time analysis of policy gradient methods in one of the simplest settings: finite state and action MDPs with a policy class consisting of all stochastic policies and with exact gradient evaluations. Recent work has viewed this setting as an instance of smooth non-linear optimization and established sub-linear convergence rates under small step-sizes. Here, we take a completely different perspective, based on illuminating connections with policy iteration, to show how many variants of policy gradient algorithms succeed with large step-sizes and attain a linear rate of convergence.
APA
Bhandari, J. & Russo, D. (2021). On the Linear Convergence of Policy Gradient Methods for Finite MDPs. Proceedings of The 24th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 130:2386-2394. Available from https://proceedings.mlr.press/v130/bhandari21a.html.
