Faster Policy Learning with Continuous-Time Gradients

Samuel Ainsworth, Kendall Lowrey, John Thickstun, Zaid Harchaoui, Siddhartha Srinivasa
Proceedings of the 3rd Conference on Learning for Dynamics and Control, PMLR 144:1054-1067, 2021.

Abstract

We study the estimation of policy gradients for continuous-time systems with known dynamics. By reframing policy learning in continuous time, we show that it is possible to construct a more efficient and accurate gradient estimator. The standard back-propagation-through-time (BPTT) estimator computes exact gradients for a crude discretization of the continuous-time system. In contrast, we approximate continuous-time gradients in the original system. With the explicit goal of estimating continuous-time gradients, we are able to discretize adaptively and construct a more efficient policy gradient estimator, which we call the Continuous-Time Policy Gradient (CTPG). We show that replacing BPTT policy gradients with more efficient CTPG estimates results in faster and more robust learning in a variety of control tasks and simulators.
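
To make the contrast concrete, here is a minimal sketch (not the authors' implementation) on a toy double integrator with a hypothetical linear feedback policy: BPTT differentiates through a fixed-step Euler rollout of the discretized system, while a CTPG-style gradient differentiates through an adaptive solver (jax.experimental.ode.odeint), whose reverse pass uses the continuous adjoint. The dynamics, policy, and cost below are illustrative assumptions, not the paper's benchmarks.

# Sketch: BPTT through a fixed-step rollout vs. a CTPG-style gradient
# obtained by differentiating through an adaptive ODE solver.
import jax
import jax.numpy as jnp
from jax.experimental.ode import odeint

def policy(theta, x):
    return -theta @ x  # hypothetical linear feedback u = -theta x

def dynamics(x, u):
    return jnp.array([x[1], u[0]])  # double integrator: (position, velocity)

def closed_loop(x, t, theta):
    # Closed-loop vector field x' = f(x, pi_theta(x)).
    return dynamics(x, policy(theta, x))

x0 = jnp.array([1.0, 0.0])
theta0 = jnp.ones((1, 2))
T = 5.0

def cost_ctpg(theta):
    # Adaptive Dormand-Prince integration; the gradient flows back
    # through the continuous adjoint (CTPG-style).
    xs = odeint(closed_loop, x0, jnp.array([0.0, T]), theta)
    return jnp.sum(xs[-1] ** 2)  # terminal cost on the final state

def cost_bptt(theta, n_steps=500):
    # Fixed-step Euler unrolling; jax.grad here is classic BPTT through
    # the crude discretization.
    dt = T / n_steps
    def step(x, _):
        return x + dt * closed_loop(x, 0.0, theta), None
    xT, _ = jax.lax.scan(step, x0, None, length=n_steps)
    return jnp.sum(xT ** 2)

print(jax.grad(cost_ctpg)(theta0))  # adjoint gradient of the continuous system
print(jax.grad(cost_bptt)(theta0))  # exact gradient of the discretized system

Because the adaptive solver places steps only where the dynamics demand them, the CTPG-style estimate tracks the continuous-time gradient at a cost that does not grow with an arbitrary fixed step count.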

Cite this Paper
BibTeX
@InProceedings{pmlr-v144-ainsworth21a,
  title = {Faster Policy Learning with Continuous-Time Gradients},
  author = {Ainsworth, Samuel and Lowrey, Kendall and Thickstun, John and Harchaoui, Zaid and Srinivasa, Siddhartha},
  booktitle = {Proceedings of the 3rd Conference on Learning for Dynamics and Control},
  pages = {1054--1067},
  year = {2021},
  editor = {Jadbabaie, Ali and Lygeros, John and Pappas, George J. and Parrilo, Pablo A. and Recht, Benjamin and Tomlin, Claire J. and Zeilinger, Melanie N.},
  volume = {144},
  series = {Proceedings of Machine Learning Research},
  month = {07--08 June},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v144/ainsworth21a/ainsworth21a.pdf},
  url = {https://proceedings.mlr.press/v144/ainsworth21a.html},
  abstract = {We study the estimation of policy gradients for continuous-time systems with known dynamics. By reframing policy learning in continuous time, we show that it is possible to construct a more efficient and accurate gradient estimator. The standard back-propagation-through-time (BPTT) estimator computes exact gradients for a crude discretization of the continuous-time system. In contrast, we approximate continuous-time gradients in the original system. With the explicit goal of estimating continuous-time gradients, we are able to discretize adaptively and construct a more efficient policy gradient estimator, which we call the Continuous-Time Policy Gradient (CTPG). We show that replacing BPTT policy gradients with more efficient CTPG estimates results in faster and more robust learning in a variety of control tasks and simulators.}
}
Endnote
%0 Conference Paper
%T Faster Policy Learning with Continuous-Time Gradients
%A Samuel Ainsworth
%A Kendall Lowrey
%A John Thickstun
%A Zaid Harchaoui
%A Siddhartha Srinivasa
%B Proceedings of the 3rd Conference on Learning for Dynamics and Control
%C Proceedings of Machine Learning Research
%D 2021
%E Ali Jadbabaie
%E John Lygeros
%E George J. Pappas
%E Pablo A. Parrilo
%E Benjamin Recht
%E Claire J. Tomlin
%E Melanie N. Zeilinger
%F pmlr-v144-ainsworth21a
%I PMLR
%P 1054--1067
%U https://proceedings.mlr.press/v144/ainsworth21a.html
%V 144
%X We study the estimation of policy gradients for continuous-time systems with known dynamics. By reframing policy learning in continuous time, we show that it is possible to construct a more efficient and accurate gradient estimator. The standard back-propagation-through-time (BPTT) estimator computes exact gradients for a crude discretization of the continuous-time system. In contrast, we approximate continuous-time gradients in the original system. With the explicit goal of estimating continuous-time gradients, we are able to discretize adaptively and construct a more efficient policy gradient estimator, which we call the Continuous-Time Policy Gradient (CTPG). We show that replacing BPTT policy gradients with more efficient CTPG estimates results in faster and more robust learning in a variety of control tasks and simulators.
APA
Ainsworth, S., Lowrey, K., Thickstun, J., Harchaoui, Z. & Srinivasa, S. (2021). Faster Policy Learning with Continuous-Time Gradients. Proceedings of the 3rd Conference on Learning for Dynamics and Control, in Proceedings of Machine Learning Research 144:1054-1067. Available from https://proceedings.mlr.press/v144/ainsworth21a.html.
