Model-Free Imitation Learning with Policy Optimization

Jonathan Ho, Jayesh Gupta, Stefano Ermon
Proceedings of The 33rd International Conference on Machine Learning, PMLR 48:2760-2769, 2016.

Abstract

In imitation learning, an agent learns how to behave in an environment with an unknown cost function by mimicking expert demonstrations. Existing imitation learning algorithms typically involve solving a sequence of planning or reinforcement learning problems. Such algorithms are therefore not directly applicable to large, high-dimensional environments, and their performance can significantly degrade if the planning problems are not solved to optimality. Under the apprenticeship learning formalism, we develop alternative model-free algorithms for finding a parameterized stochastic policy that performs at least as well as an expert policy on an unknown cost function, based on sample trajectories from the expert. Our approach, based on policy gradients, scales to large continuous environments with guaranteed convergence to local minima.
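The abstract's core idea — choose a worst-case cost from a restricted class, then improve the policy on that cost with a policy gradient step instead of a full planning solve — can be illustrated with a toy sketch. This is not the paper's algorithm, only a minimal assumed setup: a single-state softmax policy, action-dependent features, costs restricted to the unit ball of linear functions `w·φ(s,a)`, and a REINFORCE-style update; all function names (`feature_expectations`, `apprenticeship_pg_step`) are hypothetical.

```python
import numpy as np

def feature_expectations(trajectories, phi, gamma=0.9):
    """Discounted feature expectations, averaged over sample trajectories."""
    total = None
    for traj in trajectories:
        acc = sum((gamma ** t) * phi(s, a) for t, (s, a) in enumerate(traj))
        total = acc if total is None else total + acc
    return total / len(trajectories)

def apprenticeship_pg_step(theta, policy_trajs, expert_trajs, phi,
                           lr=0.5, gamma=0.9):
    """One model-free update: pick the adversarial cost w·phi(s,a) with w on
    the unit ball (the normalized gap in feature expectations), then descend
    that cost with a REINFORCE-style policy gradient. Toy single-state case:
    theta are softmax logits over a discrete action set."""
    mu_pi = feature_expectations(policy_trajs, phi, gamma)
    mu_E = feature_expectations(expert_trajs, phi, gamma)
    gap = mu_pi - mu_E
    w = gap / (np.linalg.norm(gap) + 1e-8)  # worst-case linear cost weights
    probs = np.exp(theta - theta.max())
    probs /= probs.sum()
    grad = np.zeros_like(theta)
    for traj in policy_trajs:
        # Cost-to-go of the trajectory under the adversarial cost.
        ret = sum((gamma ** t) * w.dot(phi(s, a))
                  for t, (s, a) in enumerate(traj))
        for (s, a) in traj:
            glogp = -probs.copy()
            glogp[a] += 1.0  # gradient of log softmax at action a
            grad += glogp * ret
    grad /= len(policy_trajs)
    return theta - lr * grad  # descend the worst-case expected cost

# Hypothetical usage: the expert always takes action 1, the current policy
# mixes actions; one step should shift probability mass toward action 1.
phi = lambda s, a: np.eye(2)[a]
theta = apprenticeship_pg_step(np.zeros(2),
                               policy_trajs=[[(0, 0)], [(0, 1)]],
                               expert_trajs=[[(0, 1)], [(0, 1)]],
                               phi=phi)
```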

Cite this Paper


BibTeX
@InProceedings{pmlr-v48-ho16,
  title = {Model-Free Imitation Learning with Policy Optimization},
  author = {Ho, Jonathan and Gupta, Jayesh and Ermon, Stefano},
  booktitle = {Proceedings of The 33rd International Conference on Machine Learning},
  pages = {2760--2769},
  year = {2016},
  editor = {Balcan, Maria Florina and Weinberger, Kilian Q.},
  volume = {48},
  series = {Proceedings of Machine Learning Research},
  address = {New York, New York, USA},
  month = {20--22 Jun},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v48/ho16.pdf},
  url = {https://proceedings.mlr.press/v48/ho16.html},
  abstract = {In imitation learning, an agent learns how to behave in an environment with an unknown cost function by mimicking expert demonstrations. Existing imitation learning algorithms typically involve solving a sequence of planning or reinforcement learning problems. Such algorithms are therefore not directly applicable to large, high-dimensional environments, and their performance can significantly degrade if the planning problems are not solved to optimality. Under the apprenticeship learning formalism, we develop alternative model-free algorithms for finding a parameterized stochastic policy that performs at least as well as an expert policy on an unknown cost function, based on sample trajectories from the expert. Our approach, based on policy gradients, scales to large continuous environments with guaranteed convergence to local minima.}
}
Endnote
%0 Conference Paper
%T Model-Free Imitation Learning with Policy Optimization
%A Jonathan Ho
%A Jayesh Gupta
%A Stefano Ermon
%B Proceedings of The 33rd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2016
%E Maria Florina Balcan
%E Kilian Q. Weinberger
%F pmlr-v48-ho16
%I PMLR
%P 2760--2769
%U https://proceedings.mlr.press/v48/ho16.html
%V 48
%X In imitation learning, an agent learns how to behave in an environment with an unknown cost function by mimicking expert demonstrations. Existing imitation learning algorithms typically involve solving a sequence of planning or reinforcement learning problems. Such algorithms are therefore not directly applicable to large, high-dimensional environments, and their performance can significantly degrade if the planning problems are not solved to optimality. Under the apprenticeship learning formalism, we develop alternative model-free algorithms for finding a parameterized stochastic policy that performs at least as well as an expert policy on an unknown cost function, based on sample trajectories from the expert. Our approach, based on policy gradients, scales to large continuous environments with guaranteed convergence to local minima.
RIS
TY  - CPAPER
TI  - Model-Free Imitation Learning with Policy Optimization
AU  - Jonathan Ho
AU  - Jayesh Gupta
AU  - Stefano Ermon
BT  - Proceedings of The 33rd International Conference on Machine Learning
DA  - 2016/06/11
ED  - Maria Florina Balcan
ED  - Kilian Q. Weinberger
ID  - pmlr-v48-ho16
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 48
SP  - 2760
EP  - 2769
L1  - http://proceedings.mlr.press/v48/ho16.pdf
UR  - https://proceedings.mlr.press/v48/ho16.html
AB  - In imitation learning, an agent learns how to behave in an environment with an unknown cost function by mimicking expert demonstrations. Existing imitation learning algorithms typically involve solving a sequence of planning or reinforcement learning problems. Such algorithms are therefore not directly applicable to large, high-dimensional environments, and their performance can significantly degrade if the planning problems are not solved to optimality. Under the apprenticeship learning formalism, we develop alternative model-free algorithms for finding a parameterized stochastic policy that performs at least as well as an expert policy on an unknown cost function, based on sample trajectories from the expert. Our approach, based on policy gradients, scales to large continuous environments with guaranteed convergence to local minima.
ER  -
APA
Ho, J., Gupta, J. & Ermon, S. (2016). Model-Free Imitation Learning with Policy Optimization. Proceedings of The 33rd International Conference on Machine Learning, in Proceedings of Machine Learning Research 48:2760-2769. Available from https://proceedings.mlr.press/v48/ho16.html.