Stochastic Variance-Reduced Policy Gradient

Matteo Papini; Damiano Binaghi; Giuseppe Canonaco; Matteo Pirotta; Marcello Restelli

Proceedings of the 35th International Conference on Machine Learning, PMLR 80:4026-4035, 2018.

Abstract

In this paper, we propose a novel reinforcement-learning algorithm consisting of a stochastic variance-reduced version of policy gradient for solving Markov Decision Processes (MDPs). Stochastic variance-reduced gradient (SVRG) methods have proven very successful in supervised learning. However, their adaptation to policy gradient is not straightforward and needs to account for I) a non-concave objective function; II) approximations in the full gradient computation; and III) a non-stationary sampling process. The result is SVRPG, a stochastic variance-reduced policy gradient algorithm that leverages importance weights to preserve the unbiasedness of the gradient estimate. Under standard assumptions on the MDP, we provide convergence guarantees for SVRPG with a convergence rate that is linear under increasing batch sizes. Finally, we suggest practical variants of SVRPG and empirically evaluate them on continuous MDPs.
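The importance-weighted estimator summarized above can be illustrated with a minimal Python sketch on a hypothetical one-step problem. Each epoch estimates the gradient at a snapshot parameter from a large batch. Every inner step then adds a small-batch correction g(tau | theta_t) - omega * g(tau | snapshot), computed on trajectories drawn from the current policy, where omega = p(tau | snapshot) / p(tau | theta_t) keeps the snapshot term unbiased under the non-stationary sampling. The Gaussian policy, the reward -(a - 2)^2, the REINFORCE gradients, the function names and all hyperparameters are illustrative assumptions, not the authors' experimental setup. The fixed step size and the choice of returning the last iterate are also simplifications, not the paper's exact algorithm.

```python
import math
import random


def grad_log_policy(a, theta, sigma):
    """Score function d/dtheta log N(a; theta, sigma^2) of a Gaussian policy."""
    return (a - theta) / sigma ** 2


def policy_density(a, theta, sigma):
    """Density N(a; theta, sigma^2); a one-step trajectory's probability is just this."""
    return math.exp(-0.5 * ((a - theta) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def reward(a):
    """Hypothetical toy reward, maximised at a = 2."""
    return -(a - 2.0) ** 2


def reinforce_grad(a, theta, sigma):
    """Single-trajectory REINFORCE estimate g(tau | theta) for a one-step episode."""
    return grad_log_policy(a, theta, sigma) * reward(a)


def svrpg(theta0=0.0, sigma=1.0, epochs=30, inner_steps=5,
          n_snapshot=500, batch=10, step=0.02, seed=0):
    """Toy SVRPG-style loop (hypothetical setup): snapshot gradient plus
    importance-weighted mini-batch correction, returning the last iterate."""
    rng = random.Random(seed)
    theta = theta0
    for _ in range(epochs):
        snapshot = theta
        # Approximate the full gradient at the snapshot with n_snapshot trajectories.
        snap_actions = [rng.gauss(snapshot, sigma) for _ in range(n_snapshot)]
        mu = sum(reinforce_grad(a, snapshot, sigma) for a in snap_actions) / n_snapshot
        for _ in range(inner_steps):
            # Mini-batch sampled from the current policy, not from the snapshot.
            actions = [rng.gauss(theta, sigma) for _ in range(batch)]
            correction = 0.0
            for a in actions:
                # omega = p(tau | snapshot) / p(tau | theta) re-weights the snapshot term.
                omega = policy_density(a, snapshot, sigma) / policy_density(a, theta, sigma)
                correction += (reinforce_grad(a, theta, sigma)
                               - omega * reinforce_grad(a, snapshot, sigma))
            v = mu + correction / batch
            theta += step * v  # gradient ascent on the expected return
    return theta


print(round(svrpg(), 2))  # should move from 0 toward the optimum theta = 2
```

Design note: when theta_t stays close to the snapshot, the two terms inside the correction are strongly correlated and largely cancel, so the inner steps can get by with a small batch while the expensive snapshot estimate is reused across them.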

Cite this Paper

BibTeX


@InProceedings{pmlr-v80-papini18a,
  title     = {Stochastic Variance-Reduced Policy Gradient},
  author    = {Papini, Matteo and Binaghi, Damiano and Canonaco, Giuseppe and Pirotta, Matteo and Restelli, Marcello},
  booktitle = {Proceedings of the 35th International Conference on Machine Learning},
  pages     = {4026--4035},
  year      = {2018},
  editor    = {Dy, Jennifer and Krause, Andreas},
  volume    = {80},
  series    = {Proceedings of Machine Learning Research},
  month     = {10--15 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v80/papini18a/papini18a.pdf},
  url       = {https://proceedings.mlr.press/v80/papini18a.html},
  abstract  = {In this paper, we propose a novel reinforcement-learning algorithm consisting in a stochastic variance-reduced version of policy gradient for solving Markov Decision Processes (MDPs). Stochastic variance-reduced gradient (SVRG) methods have proven to be very successful in supervised learning. However, their adaptation to policy gradient is not straightforward and needs to account for I) a non-concave objective function; II) approximations in the full gradient computation; and III) a non-stationary sampling process. The result is SVRPG, a stochastic variance-reduced policy gradient algorithm that leverages on importance weights to preserve the unbiasedness of the gradient estimate. Under standard assumptions on the MDP, we provide convergence guarantees for SVRPG with a convergence rate that is linear under increasing batch sizes. Finally, we suggest practical variants of SVRPG, and we empirically evaluate them on continuous MDPs.}
}

Endnote

%0 Conference Paper
%T Stochastic Variance-Reduced Policy Gradient
%A Matteo Papini
%A Damiano Binaghi
%A Giuseppe Canonaco
%A Matteo Pirotta
%A Marcello Restelli
%B Proceedings of the 35th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2018
%E Jennifer Dy
%E Andreas Krause
%F pmlr-v80-papini18a
%I PMLR
%P 4026--4035
%U https://proceedings.mlr.press/v80/papini18a.html
%V 80
%X In this paper, we propose a novel reinforcement-learning algorithm consisting in a stochastic variance-reduced version of policy gradient for solving Markov Decision Processes (MDPs). Stochastic variance-reduced gradient (SVRG) methods have proven to be very successful in supervised learning. However, their adaptation to policy gradient is not straightforward and needs to account for I) a non-concave objective function; II) approximations in the full gradient computation; and III) a non-stationary sampling process. The result is SVRPG, a stochastic variance-reduced policy gradient algorithm that leverages on importance weights to preserve the unbiasedness of the gradient estimate. Under standard assumptions on the MDP, we provide convergence guarantees for SVRPG with a convergence rate that is linear under increasing batch sizes. Finally, we suggest practical variants of SVRPG, and we empirically evaluate them on continuous MDPs.

APA


Papini, M., Binaghi, D., Canonaco, G., Pirotta, M., & Restelli, M. (2018). Stochastic Variance-Reduced Policy Gradient. Proceedings of the 35th International Conference on Machine Learning, in Proceedings of Machine Learning Research 80:4026-4035. Available from https://proceedings.mlr.press/v80/papini18a.html.
