Model-Free Imitation Learning with Policy Optimization

Jonathan Ho; Jayesh Gupta; Stefano Ermon

Proceedings of The 33rd International Conference on Machine Learning, PMLR 48:2760-2769, 2016.

Abstract

In imitation learning, an agent learns how to behave in an environment with an unknown cost function by mimicking expert demonstrations. Existing imitation learning algorithms typically involve solving a sequence of planning or reinforcement learning problems. Such algorithms are therefore not directly applicable to large, high-dimensional environments, and their performance can significantly degrade if the planning problems are not solved to optimality. Under the apprenticeship learning formalism, we develop alternative model-free algorithms for finding a parameterized stochastic policy that performs at least as well as an expert policy on an unknown cost function, based on sample trajectories from the expert. Our approach, based on policy gradients, scales to large continuous environments with guaranteed convergence to local minima.
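The abstract describes the approach only at a high level. The sketch below illustrates the general idea and is not the authors' implementation. It assumes a one-step (bandit) environment, a softmax policy over three actions, hypothetical two-dimensional cost features φ(a), and the linear cost class c_w(a) = w·φ(a) with ‖w‖₂ ≤ 1. Under that class, the worst case of E_π[c_w] − E_πE[c_w] over w equals ‖μ_π − μ_E‖₂, where μ denotes a feature expectation, and it is attained at w* = (μ_π − μ_E)/‖μ_π − μ_E‖₂. By Danskin's theorem, a policy-gradient (REINFORCE) step on the fixed cost c_w* is a descent step on that objective. The function names, features, expert samples and hyperparameters are all illustrative assumptions.

```python
import math
import random


def softmax(theta):
    """Numerically stable softmax over a list of logits."""
    m = max(theta)
    exps = [math.exp(t - m) for t in theta]
    z = sum(exps)
    return [e / z for e in exps]


def feature_expectation(actions, features):
    """Empirical mean feature vector of a batch of sampled actions."""
    dim = len(features[0])
    mu = [0.0] * dim
    for a in actions:
        for i in range(dim):
            mu[i] += features[a][i]
    return [m / len(actions) for m in mu]


def imitation_policy_gradient(features, expert_actions, steps=300, batch=200, lr=0.2, seed=0):
    """Hypothetical sketch: apprenticeship learning in a one-step environment
    with linear costs c_w(a) = w . phi(a), ||w||_2 <= 1, and REINFORCE."""
    rng = random.Random(seed)
    n_actions = len(features)
    theta = [0.0] * n_actions                      # softmax logits, one per action
    mu_e = feature_expectation(expert_actions, features)
    for _ in range(steps):
        probs = softmax(theta)
        actions = rng.choices(range(n_actions), weights=probs, k=batch)
        mu_pi = feature_expectation(actions, features)
        diff = [p - e for p, e in zip(mu_pi, mu_e)]
        norm = math.sqrt(sum(d * d for d in diff))
        if norm == 0.0:
            continue                               # sampled features already match
        # Inner maximization: worst-case cost weights in the unit l2 ball.
        w = [d / norm for d in diff]
        costs = [sum(wi * fi for wi, fi in zip(w, features[a])) for a in actions]
        baseline = sum(costs) / batch              # variance-reduction baseline
        grad = [0.0] * n_actions
        for a, c in zip(actions, costs):
            for j in range(n_actions):
                # d/d theta_j of log softmax(theta)[a] is 1[j == a] - pi_j
                grad[j] += ((1.0 if j == a else 0.0) - probs[j]) * (c - baseline)
        # Outer minimization: descend on expected cost under c_w*.
        theta = [t - lr * g / batch for t, g in zip(theta, grad)]
    return softmax(theta)


# Usage with hypothetical features and expert samples.
features = [(1.0, 0.0), (0.0, 1.0), (0.5, 0.5)]
expert = [0] * 80 + [1] * 20                       # expert feature expectation (0.8, 0.2)
probs = imitation_policy_gradient(features, expert)
print("learned policy:", [round(p, 3) for p in probs])
```

Because ‖μ_π − μ_E‖₂ is non-smooth where the two expectations meet, the normalized w* flips direction near the optimum, so the policy hovers close to the expert's feature expectation rather than settling exactly on it; a decaying step size would tighten this.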

Cite this Paper

BibTeX


@InProceedings{pmlr-v48-ho16,
  title = 	 {Model-Free Imitation Learning with Policy Optimization},
  author = 	 {Ho, Jonathan and Gupta, Jayesh and Ermon, Stefano},
  booktitle = 	 {Proceedings of The 33rd International Conference on Machine Learning},
  pages = 	 {2760--2769},
  year = 	 {2016},
  editor = 	 {Balcan, Maria Florina and Weinberger, Kilian Q.},
  volume = 	 {48},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {New York, New York, USA},
  month = 	 {20--22 Jun},
  publisher = 	 {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v48/ho16.pdf},
  url = 	 {https://proceedings.mlr.press/v48/ho16.html},
  abstract = 	 {In imitation learning, an agent learns how to behave in an environment with an unknown cost function by mimicking expert demonstrations. Existing imitation learning algorithms typically involve solving a sequence of planning or reinforcement learning problems. Such algorithms are therefore not directly applicable to large, high-dimensional environments, and their performance can significantly degrade if the planning problems are not solved to optimality. Under the apprenticeship learning formalism, we develop alternative model-free algorithms for finding a parameterized stochastic policy that performs at least as well as an expert policy on an unknown cost function, based on sample trajectories from the expert. Our approach, based on policy gradients, scales to large continuous environments with guaranteed convergence to local minima.}
}

Endnote

%0 Conference Paper
%T Model-Free Imitation Learning with Policy Optimization
%A Jonathan Ho
%A Jayesh Gupta
%A Stefano Ermon
%B Proceedings of The 33rd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2016
%E Maria Florina Balcan
%E Kilian Q. Weinberger
%F pmlr-v48-ho16
%I PMLR
%P 2760--2769
%U https://proceedings.mlr.press/v48/ho16.html
%V 48
%X In imitation learning, an agent learns how to behave in an environment with an unknown cost function by mimicking expert demonstrations. Existing imitation learning algorithms typically involve solving a sequence of planning or reinforcement learning problems. Such algorithms are therefore not directly applicable to large, high-dimensional environments, and their performance can significantly degrade if the planning problems are not solved to optimality. Under the apprenticeship learning formalism, we develop alternative model-free algorithms for finding a parameterized stochastic policy that performs at least as well as an expert policy on an unknown cost function, based on sample trajectories from the expert. Our approach, based on policy gradients, scales to large continuous environments with guaranteed convergence to local minima.

RIS


TY  - CPAPER
TI  - Model-Free Imitation Learning with Policy Optimization
AU  - Jonathan Ho
AU  - Jayesh Gupta
AU  - Stefano Ermon
BT  - Proceedings of The 33rd International Conference on Machine Learning
DA  - 2016/06/11
ED  - Maria Florina Balcan
ED  - Kilian Q. Weinberger
ID  - pmlr-v48-ho16
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 48
SP  - 2760
EP  - 2769
L1  - http://proceedings.mlr.press/v48/ho16.pdf
UR  - https://proceedings.mlr.press/v48/ho16.html
AB  - In imitation learning, an agent learns how to behave in an environment with an unknown cost function by mimicking expert demonstrations. Existing imitation learning algorithms typically involve solving a sequence of planning or reinforcement learning problems. Such algorithms are therefore not directly applicable to large, high-dimensional environments, and their performance can significantly degrade if the planning problems are not solved to optimality. Under the apprenticeship learning formalism, we develop alternative model-free algorithms for finding a parameterized stochastic policy that performs at least as well as an expert policy on an unknown cost function, based on sample trajectories from the expert. Our approach, based on policy gradients, scales to large continuous environments with guaranteed convergence to local minima.
ER  -

APA


Ho, J., Gupta, J. & Ermon, S. (2016). Model-Free Imitation Learning with Policy Optimization. Proceedings of The 33rd International Conference on Machine Learning, in Proceedings of Machine Learning Research 48:2760-2769. Available from https://proceedings.mlr.press/v48/ho16.html.
