Faster Policy Learning with Continuous-Time Gradients

Samuel Ainsworth; Kendall Lowrey; John Thickstun; Zaid Harchaoui; Siddhartha Srinivasa

Proceedings of the 3rd Conference on Learning for Dynamics and Control, PMLR 144:1054-1067, 2021.

Abstract

We study the estimation of policy gradients for continuous-time systems with known dynamics. By reframing policy learning in continuous time, we show that it is possible to construct a more efficient and accurate gradient estimator. The standard back-propagation through time estimator (BPTT) computes exact gradients for a crude discretization of the continuous-time system. In contrast, we approximate continuous-time gradients in the original system. With the explicit goal of estimating continuous-time gradients, we are able to discretize adaptively and construct a more efficient policy gradient estimator, which we call the Continuous-Time Policy Gradient (CTPG). We show that replacing BPTT policy gradients with more efficient CTPG estimates results in faster and more robust learning in a variety of control tasks and simulators.
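
The distinction the abstract draws between "exact gradients for a crude discretization" and "continuous-time gradients in the original system" can be illustrated on a toy problem. The sketch below is not the paper's CTPG estimator and is not taken from the paper's code. It assumes a hypothetical scalar system dx/dt = a x + b u with linear policy u = -k x and running cost x^2 + u^2, chosen only because its continuous-time gradient has a closed form. All function names and parameter values are illustrative assumptions.

```python
import math


def euler_cost(k, a=1.0, b=1.0, x0=1.0, T=2.0, n_steps=10):
    """Cost of the fixed-step Euler discretization (toy system, assumed)."""
    h = T / n_steps
    c = a - b * k  # closed-loop rate under u = -k * x
    x, cost = x0, 0.0
    for _ in range(n_steps):
        cost += h * (1.0 + k * k) * x * x  # running cost x^2 + u^2
        x = x + h * c * x                  # explicit Euler step
    return cost


def bptt_gradient(k, a=1.0, b=1.0, x0=1.0, T=2.0, n_steps=10):
    """Reverse-mode gradient of euler_cost with respect to the gain k."""
    h = T / n_steps
    c = a - b * k
    xs = [x0]
    for _ in range(n_steps):               # forward pass, storing states
        xs.append(xs[-1] + h * c * xs[-1])
    lam, grad = 0.0, 0.0                   # lam holds dJ/dx_{n+1}
    for n in reversed(range(n_steps)):     # backward pass through time
        x = xs[n]
        grad += h * 2.0 * k * x * x        # direct dependence of cost on k
        grad += lam * (-h * b * x)         # dependence through x_{n+1}
        lam = h * (1.0 + k * k) * 2.0 * x + lam * (1.0 + h * c)
    return grad


def continuous_gradient(k, a=1.0, b=1.0, x0=1.0, T=2.0):
    """Closed-form gradient of the continuous-time cost (requires a != b*k)."""
    c = a - b * k
    e = math.exp(2.0 * c * T)
    integral = (e - 1.0) / (2.0 * c)                        # int_0^T exp(2ct) dt
    d_integral_dc = (2.0 * c * T * e - e + 1.0) / (2.0 * c * c)
    return x0 * x0 * (2.0 * k * integral - b * (1.0 + k * k) * d_integral_dc)


if __name__ == "__main__":
    exact = continuous_gradient(2.0)
    for n in (10, 100, 1000):
        print(n, bptt_gradient(2.0, n_steps=n), exact)
```

In this toy setting, `bptt_gradient` is exact for the discretized cost at any step count, yet with a coarse step it can sit noticeably far from `continuous_gradient`. Shrinking the fixed step closes the gap at the price of more steps, which is the trade-off that motivates discretizing adaptively.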

Cite this Paper

BibTeX


@InProceedings{pmlr-v144-ainsworth21a,
  title = 	 {Faster Policy Learning with Continuous-Time Gradients},
  author =       {Ainsworth, Samuel and Lowrey, Kendall and Thickstun, John and Harchaoui, Zaid and Srinivasa, Siddhartha},
  booktitle = 	 {Proceedings of the 3rd Conference on Learning for Dynamics and Control},
  pages = 	 {1054--1067},
  year = 	 {2021},
  editor = 	 {Jadbabaie, Ali and Lygeros, John and Pappas, George J. and Parrilo, Pablo A. and Recht, Benjamin and Tomlin, Claire J. and Zeilinger, Melanie N.},
  volume = 	 {144},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {07 -- 08 June},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v144/ainsworth21a/ainsworth21a.pdf},
  url = 	 {https://proceedings.mlr.press/v144/ainsworth21a.html},
  abstract = 	 {We study the estimation of policy gradients for continuous-time systems with known dynamics. By reframing policy learning in continuous time, we show that it is possible to construct a more efficient and accurate gradient estimator. The standard back-propagation through time estimator (BPTT) computes exact gradients for a crude discretization of the continuous-time system. In contrast, we approximate continuous-time gradients in the original system. With the explicit goal of estimating continuous-time gradients, we are able to discretize adaptively and construct a more efficient policy gradient estimator, which we call the Continuous-Time Policy Gradient (CTPG). We show that replacing BPTT policy gradients with more efficient CTPG estimates results in faster and more robust learning in a variety of control tasks and simulators.}
}

Endnote

%0 Conference Paper
%T Faster Policy Learning with Continuous-Time Gradients
%A Samuel Ainsworth
%A Kendall Lowrey
%A John Thickstun
%A Zaid Harchaoui
%A Siddhartha Srinivasa
%B Proceedings of the 3rd Conference on Learning for Dynamics and Control
%C Proceedings of Machine Learning Research
%D 2021
%E Ali Jadbabaie
%E John Lygeros
%E George J. Pappas
%E Pablo A. Parrilo
%E Benjamin Recht
%E Claire J. Tomlin
%E Melanie N. Zeilinger
%F pmlr-v144-ainsworth21a
%I PMLR
%P 1054--1067
%U https://proceedings.mlr.press/v144/ainsworth21a.html
%V 144
%X We study the estimation of policy gradients for continuous-time systems with known dynamics. By reframing policy learning in continuous time, we show that it is possible to construct a more efficient and accurate gradient estimator. The standard back-propagation through time estimator (BPTT) computes exact gradients for a crude discretization of the continuous-time system. In contrast, we approximate continuous-time gradients in the original system. With the explicit goal of estimating continuous-time gradients, we are able to discretize adaptively and construct a more efficient policy gradient estimator, which we call the Continuous-Time Policy Gradient (CTPG). We show that replacing BPTT policy gradients with more efficient CTPG estimates results in faster and more robust learning in a variety of control tasks and simulators.

APA


Ainsworth, S., Lowrey, K., Thickstun, J., Harchaoui, Z., & Srinivasa, S. (2021). Faster Policy Learning with Continuous-Time Gradients. Proceedings of the 3rd Conference on Learning for Dynamics and Control, in Proceedings of Machine Learning Research 144:1054-1067. Available from https://proceedings.mlr.press/v144/ainsworth21a.html.
