On the Linear Convergence of Policy Gradient Methods for Finite MDPs

Jalaj Bhandari; Daniel Russo

On the Linear Convergence of Policy Gradient Methods for Finite MDPs

Jalaj Bhandari, Daniel Russo

Proceedings of The 24th International Conference on Artificial Intelligence and Statistics, PMLR 130:2386-2394, 2021.

Abstract

We revisit the finite time analysis of policy gradient methods in the one of the simplest settings: finite state and action MDPs with a policy class consisting of all stochastic policies and with exact gradient evaluations. There has been some recent work viewing this setting as an instance of smooth non-linear optimization problems, to show sub-linear convergence rates with small step-sizes. Here, we take a completely different perspective based on illuminating connections with policy iteration, to show how many variants of policy gradient algorithms succeed with large step-sizes and attain a linear rate of convergence.

Cite this Paper

BibTeX

@InProceedings{pmlr-v130-bhandari21a,
  title = 	 { On the Linear Convergence of Policy Gradient Methods for Finite MDPs },
  author =       {Bhandari, Jalaj and Russo, Daniel},
  booktitle = 	 {Proceedings of The 24th International Conference on Artificial Intelligence and Statistics},
  pages = 	 {2386--2394},
  year = 	 {2021},
  editor = 	 {Banerjee, Arindam and Fukumizu, Kenji},
  volume = 	 {130},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {13--15 Apr},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v130/bhandari21a/bhandari21a.pdf},
  url = 	 {https://proceedings.mlr.press/v130/bhandari21a.html},
  abstract = 	 { We revisit the finite time analysis of policy gradient methods in the one of the simplest settings: finite state and action MDPs with a policy class consisting of all stochastic policies and with exact gradient evaluations. There has been some recent work viewing this setting as an instance of smooth non-linear optimization problems, to show sub-linear convergence rates with small step-sizes. Here, we take a completely different perspective based on illuminating connections with policy iteration, to show how many variants of policy gradient algorithms succeed with large step-sizes and attain a linear rate of convergence. }
}

Endnote

%0 Conference Paper
%T  On the Linear Convergence of Policy Gradient Methods for Finite MDPs 
%A Jalaj Bhandari
%A Daniel Russo
%B Proceedings of The 24th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2021
%E Arindam Banerjee
%E Kenji Fukumizu	
%F pmlr-v130-bhandari21a
%I PMLR
%P 2386--2394
%U https://proceedings.mlr.press/v130/bhandari21a.html
%V 130
%X  We revisit the finite time analysis of policy gradient methods in the one of the simplest settings: finite state and action MDPs with a policy class consisting of all stochastic policies and with exact gradient evaluations. There has been some recent work viewing this setting as an instance of smooth non-linear optimization problems, to show sub-linear convergence rates with small step-sizes. Here, we take a completely different perspective based on illuminating connections with policy iteration, to show how many variants of policy gradient algorithms succeed with large step-sizes and attain a linear rate of convergence.

APA

Bhandari, J. & Russo, D.. (2021).  On the Linear Convergence of Policy Gradient Methods for Finite MDPs . Proceedings of The 24th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 130:2386-2394 Available from https://proceedings.mlr.press/v130/bhandari21a.html.

On the Linear Convergence of Policy Gradient Methods for Finite MDPs

Abstract

Cite this Paper

Related Material