Guided Policy Search

Sergey Levine; Vladlen Koltun

Guided Policy Search

Sergey Levine, Vladlen Koltun

Proceedings of the 30th International Conference on Machine Learning, PMLR 28(3):1-9, 2013.

Abstract

Direct policy search can effectively scale to high-dimensional systems, but complex policies with hundreds of parameters often present a challenge for such methods, requiring numerous samples and often falling into poor local optima. We present a guided policy search algorithm that uses trajectory optimization to direct policy learning and avoid poor local optima. We show how differential dynamic programming can be used to generate suitable guiding samples, and describe a regularized importance sampled policy optimization that incorporates these samples into the policy search. We evaluate the method by learning neural network controllers for planar swimming, hopping, and walking, as well as simulated 3D humanoid running.

Cite this Paper

BibTeX


@InProceedings{pmlr-v28-levine13,
  title = 	 {Guided Policy Search},
  author = 	 {Levine, Sergey and Koltun, Vladlen},
  booktitle = 	 {Proceedings of the 30th International Conference on Machine Learning},
  pages = 	 {1--9},
  year = 	 {2013},
  editor = 	 {Dasgupta, Sanjoy and McAllester, David},
  volume = 	 {28},
  number =       {3},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {Atlanta, Georgia, USA},
  month = 	 {17--19 Jun},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v28/levine13.pdf},
  url = 	 {https://proceedings.mlr.press/v28/levine13.html},
  abstract = 	 {Direct policy search can effectively scale to high-dimensional systems, but complex policies with hundreds of parameters often present a challenge for such methods, requiring numerous samples and often falling into poor local optima. We present a guided policy search algorithm that uses trajectory optimization to direct policy learning and avoid poor local optima. We show how differential dynamic programming can be used to generate suitable guiding samples, and describe a regularized importance sampled policy optimization that incorporates these samples into the policy search. We evaluate the method by learning neural network controllers for planar swimming, hopping, and walking, as well as simulated 3D humanoid running.}
}

Endnote

%0 Conference Paper
%T Guided Policy Search
%A Sergey Levine
%A Vladlen Koltun
%B Proceedings of the 30th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2013
%E Sanjoy Dasgupta
%E David McAllester	
%F pmlr-v28-levine13
%I PMLR
%P 1--9
%U https://proceedings.mlr.press/v28/levine13.html
%V 28
%N 3
%X Direct policy search can effectively scale to high-dimensional systems, but complex policies with hundreds of parameters often present a challenge for such methods, requiring numerous samples and often falling into poor local optima. We present a guided policy search algorithm that uses trajectory optimization to direct policy learning and avoid poor local optima. We show how differential dynamic programming can be used to generate suitable guiding samples, and describe a regularized importance sampled policy optimization that incorporates these samples into the policy search. We evaluate the method by learning neural network controllers for planar swimming, hopping, and walking, as well as simulated 3D humanoid running.

RIS


TY  - CPAPER
TI  - Guided Policy Search
AU  - Sergey Levine
AU  - Vladlen Koltun
BT  - Proceedings of the 30th International Conference on Machine Learning
DA  - 2013/05/26
ED  - Sanjoy Dasgupta
ED  - David McAllester	
ID  - pmlr-v28-levine13
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 28
IS  - 3
SP  - 1
EP  - 9
L1  - http://proceedings.mlr.press/v28/levine13.pdf
UR  - https://proceedings.mlr.press/v28/levine13.html
AB  - Direct policy search can effectively scale to high-dimensional systems, but complex policies with hundreds of parameters often present a challenge for such methods, requiring numerous samples and often falling into poor local optima. We present a guided policy search algorithm that uses trajectory optimization to direct policy learning and avoid poor local optima. We show how differential dynamic programming can be used to generate suitable guiding samples, and describe a regularized importance sampled policy optimization that incorporates these samples into the policy search. We evaluate the method by learning neural network controllers for planar swimming, hopping, and walking, as well as simulated 3D humanoid running.
ER  -

APA


Levine, S. & Koltun, V.. (2013). Guided Policy Search. Proceedings of the 30th International Conference on Machine Learning, in Proceedings of Machine Learning Research 28(3):1-9 Available from https://proceedings.mlr.press/v28/levine13.html.

Guided Policy Search

Abstract

Cite this Paper

Related Material