An Improved Convergence Analysis of Stochastic Variance-Reduced Policy Gradient

Pan Xu; Felicia Gao; Quanquan Gu

An Improved Convergence Analysis of Stochastic Variance-Reduced Policy Gradient

Pan Xu, Felicia Gao, Quanquan Gu

Proceedings of The 35th Uncertainty in Artificial Intelligence Conference, PMLR 115:541-551, 2020.

Abstract

We revisit the stochastic variance-reduced policy gradient (SVRPG) method proposed by \citet{papini2018stochastic} for reinforcement learning. We provide an improved convergence analysis of SVRPG and show that it can find an $\epsilon$-approximate stationary point of the performance function within $O(1/\epsilon^{5/3})$ trajectories. This sample complexity improves upon the best known result $O(1/\epsilon^2)$ by a factor of $O(1/\epsilon^{1/3})$. At the core of our analysis is (i) a tighter upper bound for the variance of importance sampling weights, where we prove that the variance can be controlled by the parameter distance between different policies; and (ii) a fine-grained analysis of the epoch length and batch size parameters such that we can significantly reduce the number of trajectories required in each iteration of SVRPG. We also empirically demonstrate the effectiveness of our theoretical claims of batch sizes on reinforcement learning benchmark tasks.

Cite this Paper

BibTeX


@InProceedings{pmlr-v115-xu20a,
  title = 	 {An Improved Convergence Analysis of Stochastic Variance-Reduced Policy Gradient},
  author =       {Xu, Pan and Gao, Felicia and Gu, Quanquan},
  booktitle = 	 {Proceedings of The 35th Uncertainty in Artificial Intelligence Conference},
  pages = 	 {541--551},
  year = 	 {2020},
  editor = 	 {Adams, Ryan P. and Gogate, Vibhav},
  volume = 	 {115},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {22--25 Jul},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v115/xu20a/xu20a.pdf},
  url = 	 {https://proceedings.mlr.press/v115/xu20a.html},
  abstract = 	 {We revisit the stochastic variance-reduced policy gradient (SVRPG) method proposed by \citet{papini2018stochastic} for reinforcement learning. We provide an improved convergence analysis of SVRPG and show that it can find an $\epsilon$-approximate stationary point of the performance function within $O(1/\epsilon^{5/3})$ trajectories. This sample complexity improves upon the best known result $O(1/\epsilon^2)$ by a factor of $O(1/\epsilon^{1/3})$. At the core of our analysis is (i) a tighter upper bound for the variance of importance sampling weights, where we prove that the variance can be controlled by the parameter distance between different policies; and (ii) a fine-grained analysis of the epoch length and batch size parameters such that we can significantly reduce the number of trajectories required in each iteration of SVRPG. We also empirically demonstrate the effectiveness of our theoretical claims of batch sizes on  reinforcement learning benchmark tasks. }
}

Endnote

%0 Conference Paper
%T An Improved Convergence Analysis of Stochastic Variance-Reduced Policy Gradient
%A Pan Xu
%A Felicia Gao
%A Quanquan Gu
%B Proceedings of The 35th Uncertainty in Artificial Intelligence Conference
%C Proceedings of Machine Learning Research
%D 2020
%E Ryan P. Adams
%E Vibhav Gogate	
%F pmlr-v115-xu20a
%I PMLR
%P 541--551
%U https://proceedings.mlr.press/v115/xu20a.html
%V 115
%X We revisit the stochastic variance-reduced policy gradient (SVRPG) method proposed by \citet{papini2018stochastic} for reinforcement learning. We provide an improved convergence analysis of SVRPG and show that it can find an $\epsilon$-approximate stationary point of the performance function within $O(1/\epsilon^{5/3})$ trajectories. This sample complexity improves upon the best known result $O(1/\epsilon^2)$ by a factor of $O(1/\epsilon^{1/3})$. At the core of our analysis is (i) a tighter upper bound for the variance of importance sampling weights, where we prove that the variance can be controlled by the parameter distance between different policies; and (ii) a fine-grained analysis of the epoch length and batch size parameters such that we can significantly reduce the number of trajectories required in each iteration of SVRPG. We also empirically demonstrate the effectiveness of our theoretical claims of batch sizes on  reinforcement learning benchmark tasks.

APA


Xu, P., Gao, F. & Gu, Q.. (2020). An Improved Convergence Analysis of Stochastic Variance-Reduced Policy Gradient. Proceedings of The 35th Uncertainty in Artificial Intelligence Conference, in Proceedings of Machine Learning Research 115:541-551 Available from https://proceedings.mlr.press/v115/xu20a.html.

Related Material

Download PDF