Guided Cost Learning: Deep Inverse Optimal Control via Policy Optimization

Chelsea Finn; Sergey Levine; Pieter Abbeel

Guided Cost Learning: Deep Inverse Optimal Control via Policy Optimization

Chelsea Finn, Sergey Levine, Pieter Abbeel

Proceedings of The 33rd International Conference on Machine Learning, PMLR 48:49-58, 2016.

Abstract

Reinforcement learning can acquire complex behaviors from high-level specifications. However, defining a cost function that can be optimized effectively and encodes the correct task is challenging in practice. We explore how inverse optimal control (IOC) can be used to learn behaviors from demonstrations, with applications to torque control of high-dimensional robotic systems. Our method addresses two key challenges in inverse optimal control: first, the need for informative features and effective regularization to impose structure on the cost, and second, the difficulty of learning the cost function under unknown dynamics for high-dimensional continuous systems. To address the former challenge, we present an algorithm capable of learning arbitrary nonlinear cost functions, such as neural networks, without meticulous feature engineering. To address the latter challenge, we formulate an efficient sample-based approximation for MaxEnt IOC. We evaluate our method on a series of simulated tasks and real-world robotic manipulation problems, demonstrating substantial improvement over prior methods both in terms of task complexity and sample efficiency.

Cite this Paper

BibTeX


@InProceedings{pmlr-v48-finn16,
  title = 	 {Guided Cost Learning: Deep Inverse Optimal Control via Policy Optimization},
  author = 	 {Finn, Chelsea and Levine, Sergey and Abbeel, Pieter},
  booktitle = 	 {Proceedings of The 33rd International Conference on Machine Learning},
  pages = 	 {49--58},
  year = 	 {2016},
  editor = 	 {Balcan, Maria Florina and Weinberger, Kilian Q.},
  volume = 	 {48},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {New York, New York, USA},
  month = 	 {20--22 Jun},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v48/finn16.pdf},
  url = 	 {https://proceedings.mlr.press/v48/finn16.html},
  abstract = 	 {Reinforcement learning can acquire complex behaviors from high-level specifications. However, defining a cost function that can be optimized effectively and encodes the correct task is challenging in practice. We explore how inverse optimal control (IOC) can be used to learn behaviors from demonstrations, with applications to torque control of high-dimensional robotic systems. Our method addresses two key challenges in inverse optimal control: first, the need for informative features and effective regularization to impose structure on the cost, and second, the difficulty of learning the cost function under unknown dynamics for high-dimensional continuous systems. To address the former challenge, we present an algorithm capable of learning arbitrary nonlinear cost functions, such as neural networks, without meticulous feature engineering. To address the latter challenge, we formulate an efficient sample-based approximation for MaxEnt IOC. We evaluate our method on a series of simulated tasks and real-world robotic manipulation problems, demonstrating substantial improvement over prior methods both in terms of task complexity and sample efficiency.}
}

Endnote

%0 Conference Paper
%T Guided Cost Learning: Deep Inverse Optimal Control via Policy Optimization
%A Chelsea Finn
%A Sergey Levine
%A Pieter Abbeel
%B Proceedings of The 33rd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2016
%E Maria Florina Balcan
%E Kilian Q. Weinberger	
%F pmlr-v48-finn16
%I PMLR
%P 49--58
%U https://proceedings.mlr.press/v48/finn16.html
%V 48
%X Reinforcement learning can acquire complex behaviors from high-level specifications. However, defining a cost function that can be optimized effectively and encodes the correct task is challenging in practice. We explore how inverse optimal control (IOC) can be used to learn behaviors from demonstrations, with applications to torque control of high-dimensional robotic systems. Our method addresses two key challenges in inverse optimal control: first, the need for informative features and effective regularization to impose structure on the cost, and second, the difficulty of learning the cost function under unknown dynamics for high-dimensional continuous systems. To address the former challenge, we present an algorithm capable of learning arbitrary nonlinear cost functions, such as neural networks, without meticulous feature engineering. To address the latter challenge, we formulate an efficient sample-based approximation for MaxEnt IOC. We evaluate our method on a series of simulated tasks and real-world robotic manipulation problems, demonstrating substantial improvement over prior methods both in terms of task complexity and sample efficiency.

RIS


TY  - CPAPER
TI  - Guided Cost Learning: Deep Inverse Optimal Control via Policy Optimization
AU  - Chelsea Finn
AU  - Sergey Levine
AU  - Pieter Abbeel
BT  - Proceedings of The 33rd International Conference on Machine Learning
DA  - 2016/06/11
ED  - Maria Florina Balcan
ED  - Kilian Q. Weinberger	
ID  - pmlr-v48-finn16
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 48
SP  - 49
EP  - 58
L1  - http://proceedings.mlr.press/v48/finn16.pdf
UR  - https://proceedings.mlr.press/v48/finn16.html
AB  - Reinforcement learning can acquire complex behaviors from high-level specifications. However, defining a cost function that can be optimized effectively and encodes the correct task is challenging in practice. We explore how inverse optimal control (IOC) can be used to learn behaviors from demonstrations, with applications to torque control of high-dimensional robotic systems. Our method addresses two key challenges in inverse optimal control: first, the need for informative features and effective regularization to impose structure on the cost, and second, the difficulty of learning the cost function under unknown dynamics for high-dimensional continuous systems. To address the former challenge, we present an algorithm capable of learning arbitrary nonlinear cost functions, such as neural networks, without meticulous feature engineering. To address the latter challenge, we formulate an efficient sample-based approximation for MaxEnt IOC. We evaluate our method on a series of simulated tasks and real-world robotic manipulation problems, demonstrating substantial improvement over prior methods both in terms of task complexity and sample efficiency.
ER  -

APA


Finn, C., Levine, S. & Abbeel, P.. (2016). Guided Cost Learning: Deep Inverse Optimal Control via Policy Optimization. Proceedings of The 33rd International Conference on Machine Learning, in Proceedings of Machine Learning Research 48:49-58 Available from https://proceedings.mlr.press/v48/finn16.html.

Guided Cost Learning: Deep Inverse Optimal Control via Policy Optimization

Abstract

Cite this Paper

Related Material