Policies Modulating Trajectory Generators

Atil Iscen, Ken Caluwaerts, Jie Tan, Tingnan Zhang, Erwin Coumans, Vikas Sindhwani, Vincent Vanhoucke
; Proceedings of The 2nd Conference on Robot Learning, PMLR 87:916-926, 2018.

Abstract

We propose an architecture for learning complex controllable behaviors by having simple Policies Modulate Trajectory Generators (PMTG), a powerful combination that can provide both memory and prior knowledge to the controller. The result is a flexible architecture that is applicable to a class of problems with periodic motion for which one has an insight into the class of trajectories that might lead to a desired behavior. We illustrate the basics of our architecture using a synthetic control problem, then go on to learn speed-controlled locomotion for a quadrupedal robot by using Deep Reinforcement Learning and Evolutionary Strategies. We demonstrate that a simple linear policy, when paired with a parametric Trajectory Generator for quadrupedal gaits, can induce walking behaviors with controllable speed from 4-dimensional IMU observations alone, and can be learned in under 1000 rollouts. We also transfer these policies to a real robot and show locomotion with controllable forward velocity.

Cite this Paper


BibTeX
@InProceedings{pmlr-v87-iscen18a, title = {Policies Modulating Trajectory Generators}, author = {Iscen, Atil and Caluwaerts, Ken and Tan, Jie and Zhang, Tingnan and Coumans, Erwin and Sindhwani, Vikas and Vanhoucke, Vincent}, booktitle = {Proceedings of The 2nd Conference on Robot Learning}, pages = {916--926}, year = {2018}, editor = {Aude Billard and Anca Dragan and Jan Peters and Jun Morimoto}, volume = {87}, series = {Proceedings of Machine Learning Research}, address = {}, month = {29--31 Oct}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/v87/iscen18a/iscen18a.pdf}, url = {http://proceedings.mlr.press/v87/iscen18a.html}, abstract = {We propose an architecture for learning complex controllable behaviors by having simple Policies Modulate Trajectory Generators (PMTG), a powerful combination that can provide both memory and prior knowledge to the controller. The result is a flexible architecture that is applicable to a class of problems with periodic motion for which one has an insight into the class of trajectories that might lead to a desired behavior. We illustrate the basics of our architecture using a synthetic control problem, then go on to learn speed-controlled locomotion for a quadrupedal robot by using Deep Reinforcement Learning and Evolutionary Strategies. We demonstrate that a simple linear policy, when paired with a parametric Trajectory Generator for quadrupedal gaits, can induce walking behaviors with controllable speed from 4-dimensional IMU observations alone, and can be learned in under 1000 rollouts. We also transfer these policies to a real robot and show locomotion with controllable forward velocity. } }
Endnote
%0 Conference Paper %T Policies Modulating Trajectory Generators %A Atil Iscen %A Ken Caluwaerts %A Jie Tan %A Tingnan Zhang %A Erwin Coumans %A Vikas Sindhwani %A Vincent Vanhoucke %B Proceedings of The 2nd Conference on Robot Learning %C Proceedings of Machine Learning Research %D 2018 %E Aude Billard %E Anca Dragan %E Jan Peters %E Jun Morimoto %F pmlr-v87-iscen18a %I PMLR %J Proceedings of Machine Learning Research %P 916--926 %U http://proceedings.mlr.press %V 87 %W PMLR %X We propose an architecture for learning complex controllable behaviors by having simple Policies Modulate Trajectory Generators (PMTG), a powerful combination that can provide both memory and prior knowledge to the controller. The result is a flexible architecture that is applicable to a class of problems with periodic motion for which one has an insight into the class of trajectories that might lead to a desired behavior. We illustrate the basics of our architecture using a synthetic control problem, then go on to learn speed-controlled locomotion for a quadrupedal robot by using Deep Reinforcement Learning and Evolutionary Strategies. We demonstrate that a simple linear policy, when paired with a parametric Trajectory Generator for quadrupedal gaits, can induce walking behaviors with controllable speed from 4-dimensional IMU observations alone, and can be learned in under 1000 rollouts. We also transfer these policies to a real robot and show locomotion with controllable forward velocity.
APA
Iscen, A., Caluwaerts, K., Tan, J., Zhang, T., Coumans, E., Sindhwani, V. & Vanhoucke, V.. (2018). Policies Modulating Trajectory Generators. Proceedings of The 2nd Conference on Robot Learning, in PMLR 87:916-926

Related Material