Learning Complex Neural Network Policies with Trajectory Optimization

Sergey Levine; Vladlen Koltun

Proceedings of the 31st International Conference on Machine Learning, PMLR 32(2):829-837, 2014.

Abstract

Direct policy search methods offer the promise of automatically learning controllers for complex, high-dimensional tasks. However, prior applications of policy search often required specialized, low-dimensional policy classes, limiting their generality. In this work, we introduce a policy search algorithm that can directly learn high-dimensional, general-purpose policies, represented by neural networks. We formulate the policy search problem as an optimization over trajectory distributions, alternating between optimizing the policy to match the trajectories, and optimizing the trajectories to match the policy and minimize expected cost. Our method can learn policies for complex tasks such as bipedal push recovery and walking on uneven terrain, while outperforming prior methods.

Cite this Paper

BibTeX


@InProceedings{pmlr-v32-levine14,
  title = 	 {Learning Complex Neural Network Policies with Trajectory Optimization},
  author = 	 {Levine, Sergey and Koltun, Vladlen},
  booktitle = 	 {Proceedings of the 31st International Conference on Machine Learning},
  pages = 	 {829--837},
  year = 	 {2014},
  editor = 	 {Xing, Eric P. and Jebara, Tony},
  volume = 	 {32},
  number =       {2},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {Beijing, China},
  month = 	 {22--24 Jun},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v32/levine14.pdf},
  url = 	 {https://proceedings.mlr.press/v32/levine14.html},
  abstract = 	 {Direct policy search methods offer the promise of automatically learning controllers for complex, high-dimensional tasks. However, prior applications of policy search often required specialized, low-dimensional policy classes, limiting their generality. In this work, we introduce a policy search algorithm that can directly learn high-dimensional, general-purpose policies, represented by neural networks. We formulate the policy search problem as an optimization over trajectory distributions, alternating between optimizing the policy to match the trajectories, and optimizing the trajectories to match the policy and minimize expected cost. Our method can learn policies for complex tasks such as bipedal push recovery and walking on uneven terrain, while outperforming prior methods.}
}

Endnote

%0 Conference Paper
%T Learning Complex Neural Network Policies with Trajectory Optimization
%A Sergey Levine
%A Vladlen Koltun
%B Proceedings of the 31st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2014
%E Eric P. Xing
%E Tony Jebara
%F pmlr-v32-levine14
%I PMLR
%P 829--837
%U https://proceedings.mlr.press/v32/levine14.html
%V 32
%N 2
%X Direct policy search methods offer the promise of automatically learning controllers for complex, high-dimensional tasks. However, prior applications of policy search often required specialized, low-dimensional policy classes, limiting their generality. In this work, we introduce a policy search algorithm that can directly learn high-dimensional, general-purpose policies, represented by neural networks. We formulate the policy search problem as an optimization over trajectory distributions, alternating between optimizing the policy to match the trajectories, and optimizing the trajectories to match the policy and minimize expected cost. Our method can learn policies for complex tasks such as bipedal push recovery and walking on uneven terrain, while outperforming prior methods.

RIS


TY  - CPAPER
TI  - Learning Complex Neural Network Policies with Trajectory Optimization
AU  - Sergey Levine
AU  - Vladlen Koltun
BT  - Proceedings of the 31st International Conference on Machine Learning
DA  - 2014/06/18
ED  - Eric P. Xing
ED  - Tony Jebara
ID  - pmlr-v32-levine14
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 32
IS  - 2
SP  - 829
EP  - 837
L1  - http://proceedings.mlr.press/v32/levine14.pdf
UR  - https://proceedings.mlr.press/v32/levine14.html
AB  - Direct policy search methods offer the promise of automatically learning controllers for complex, high-dimensional tasks. However, prior applications of policy search often required specialized, low-dimensional policy classes, limiting their generality. In this work, we introduce a policy search algorithm that can directly learn high-dimensional, general-purpose policies, represented by neural networks. We formulate the policy search problem as an optimization over trajectory distributions, alternating between optimizing the policy to match the trajectories, and optimizing the trajectories to match the policy and minimize expected cost. Our method can learn policies for complex tasks such as bipedal push recovery and walking on uneven terrain, while outperforming prior methods.
ER  -

APA


Levine, S. & Koltun, V. (2014). Learning Complex Neural Network Policies with Trajectory Optimization. Proceedings of the 31st International Conference on Machine Learning, in Proceedings of Machine Learning Research 32(2):829-837. Available from https://proceedings.mlr.press/v32/levine14.html.
