Policies Modulating Trajectory Generators

Atil Iscen; Ken Caluwaerts; Jie Tan; Tingnan Zhang; Erwin Coumans; Vikas Sindhwani; Vincent Vanhoucke

Proceedings of The 2nd Conference on Robot Learning, PMLR 87:916-926, 2018.

Abstract

We propose an architecture for learning complex controllable behaviors by having simple Policies Modulate Trajectory Generators (PMTG), a powerful combination that can provide both memory and prior knowledge to the controller. The result is a flexible architecture that is applicable to a class of problems with periodic motion for which one has an insight into the class of trajectories that might lead to a desired behavior. We illustrate the basics of our architecture using a synthetic control problem, then go on to learn speed-controlled locomotion for a quadrupedal robot by using Deep Reinforcement Learning and Evolutionary Strategies. We demonstrate that a simple linear policy, when paired with a parametric Trajectory Generator for quadrupedal gaits, can induce walking behaviors with controllable speed from 4-dimensional IMU observations alone, and can be learned in under 1000 rollouts. We also transfer these policies to a real robot and show locomotion with controllable forward velocity.
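A minimal sketch of the control loop the abstract describes, under stated assumptions: the abstract does not specify the interface between policy and trajectory generator, so the sine-wave generator, the choice of amplitude and frequency as modulated parameters, the additive residual term, and all names below (SineTrajectoryGenerator, linear_policy, pmtg_step) are hypothetical illustrations rather than the authors' implementation.

```python
import math


class SineTrajectoryGenerator:
    """Hypothetical 1-D periodic generator; its phase is the controller's memory."""

    def __init__(self):
        self.phase = 0.0

    def step(self, amplitude, frequency, dt):
        # Advance the internal phase, then emit the open-loop target.
        self.phase = (self.phase + 2.0 * math.pi * frequency * dt) % (2.0 * math.pi)
        return amplitude * math.sin(self.phase)

    def state(self):
        # Encode the phase as (sin, cos) so the policy sees a continuous signal.
        return [math.sin(self.phase), math.cos(self.phase)]


def linear_policy(weights, bias, inputs):
    """Plain linear map: one output per weight row."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, bias)]


def pmtg_step(tg, weights, bias, observation, dt=0.01):
    """One control step: the policy modulates the generator and corrects its output."""
    # The policy input is the observation plus the generator state.
    inputs = list(observation) + tg.state()
    # Assumed output layout: two generator parameters and one residual action.
    amplitude, frequency, residual = linear_policy(weights, bias, inputs)
    return tg.step(amplitude, frequency, dt) + residual
```

With a 4-dimensional observation, the policy here is a 3-by-6 weight matrix plus a bias vector. The generator carries the periodic prior, so a policy with all-zero weights and a constant bias already yields a fixed open-loop oscillation, and learning only has to adjust the modulation.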

Cite this Paper

BibTeX

@InProceedings{pmlr-v87-iscen18a,
  title = 	 {Policies Modulating Trajectory Generators},
  author =       {Iscen, Atil and Caluwaerts, Ken and Tan, Jie and Zhang, Tingnan and Coumans, Erwin and Sindhwani, Vikas and Vanhoucke, Vincent},
  booktitle = 	 {Proceedings of The 2nd Conference on Robot Learning},
  pages = 	 {916--926},
  year = 	 {2018},
  editor = 	 {Billard, Aude and Dragan, Anca and Peters, Jan and Morimoto, Jun},
  volume = 	 {87},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {29--31 Oct},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v87/iscen18a/iscen18a.pdf},
  url = 	 {https://proceedings.mlr.press/v87/iscen18a.html},
  abstract = 	 {We propose an architecture for learning complex controllable behaviors by having simple Policies Modulate Trajectory Generators (PMTG), a powerful combination that can provide both memory and prior knowledge to the controller. The result is a flexible architecture that is applicable to a class of problems with periodic motion for which one has an insight into the class of trajectories that might lead to a desired behavior. We illustrate the basics of our architecture using a synthetic control problem, then go on to learn speed-controlled locomotion for a quadrupedal robot by using Deep Reinforcement Learning and Evolutionary Strategies. We demonstrate that a simple linear policy, when paired with a parametric Trajectory Generator for quadrupedal gaits, can induce walking behaviors with controllable speed from 4-dimensional IMU observations alone, and can be learned in under 1000 rollouts. We also transfer these policies to a real robot and show locomotion with controllable forward velocity. }
}

Endnote

%0 Conference Paper
%T Policies Modulating Trajectory Generators
%A Atil Iscen
%A Ken Caluwaerts
%A Jie Tan
%A Tingnan Zhang
%A Erwin Coumans
%A Vikas Sindhwani
%A Vincent Vanhoucke
%B Proceedings of The 2nd Conference on Robot Learning
%C Proceedings of Machine Learning Research
%D 2018
%E Aude Billard
%E Anca Dragan
%E Jan Peters
%E Jun Morimoto
%F pmlr-v87-iscen18a
%I PMLR
%P 916--926
%U https://proceedings.mlr.press/v87/iscen18a.html
%V 87
%X We propose an architecture for learning complex controllable behaviors by having simple Policies Modulate Trajectory Generators (PMTG), a powerful combination that can provide both memory and prior knowledge to the controller. The result is a flexible architecture that is applicable to a class of problems with periodic motion for which one has an insight into the class of trajectories that might lead to a desired behavior. We illustrate the basics of our architecture using a synthetic control problem, then go on to learn speed-controlled locomotion for a quadrupedal robot by using Deep Reinforcement Learning and Evolutionary Strategies. We demonstrate that a simple linear policy, when paired with a parametric Trajectory Generator for quadrupedal gaits, can induce walking behaviors with controllable speed from 4-dimensional IMU observations alone, and can be learned in under 1000 rollouts. We also transfer these policies to a real robot and show locomotion with controllable forward velocity.

APA

Iscen, A., Caluwaerts, K., Tan, J., Zhang, T., Coumans, E., Sindhwani, V. & Vanhoucke, V. (2018). Policies Modulating Trajectory Generators. Proceedings of The 2nd Conference on Robot Learning, in Proceedings of Machine Learning Research 87:916-926. Available from https://proceedings.mlr.press/v87/iscen18a.html.
