Sample-Efficient Imitation Learning via Generative Adversarial Nets

Lionel Blondé, Alexandros Kalousis
Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics, PMLR 89:3138-3148, 2019.

Abstract

GAIL is a recent successful imitation learning architecture that exploits the adversarial training procedure introduced in GANs. Albeit successful at generating behaviours similar to those demonstrated to the agent, GAIL suffers from a high sample complexity in the number of interactions it has to carry out in the environment in order to achieve satisfactory performance. We dramatically shrink the amount of interactions with the environment necessary to learn well-behaved imitation policies, by up to several orders of magnitude. Our framework, operating in the model-free regime, exhibits a significant increase in sample-efficiency over previous methods by simultaneously a) learning a self-tuned adversarially-trained surrogate reward and b) leveraging an off-policy actor-critic architecture. We show that our approach is simple to implement and that the learned agents remain remarkably stable, as shown in our experiments that span a variety of continuous control tasks. Video visualisations available at: \url{https://youtu.be/-nCsqUJnRKU}.

Cite this Paper


BibTeX
@InProceedings{pmlr-v89-blonde19a, title = {Sample-Efficient Imitation Learning via Generative Adversarial Nets}, author = {Blond\'{e}, Lionel and Kalousis, Alexandros}, booktitle = {Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics}, pages = {3138--3148}, year = {2019}, editor = {Chaudhuri, Kamalika and Sugiyama, Masashi}, volume = {89}, series = {Proceedings of Machine Learning Research}, month = {16--18 Apr}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/v89/blonde19a/blonde19a.pdf}, url = {https://proceedings.mlr.press/v89/blonde19a.html}, abstract = {GAIL is a recent successful imitation learning architecture that exploits the adversarial training procedure introduced in GANs. Albeit successful at generating behaviours similar to those demonstrated to the agent, GAIL suffers from a high sample complexity in the number of interactions it has to carry out in the environment in order to achieve satisfactory performance. We dramatically shrink the amount of interactions with the environment necessary to learn well-behaved imitation policies, by up to several orders of magnitude. Our framework, operating in the model-free regime, exhibits a significant increase in sample-efficiency over previous methods by simultaneously a) learning a self-tuned adversarially-trained surrogate reward and b) leveraging an off-policy actor-critic architecture. We show that our approach is simple to implement and that the learned agents remain remarkably stable, as shown in our experiments that span a variety of continuous control tasks. Video visualisations available at: \url{https://youtu.be/-nCsqUJnRKU}.} }
Endnote
%0 Conference Paper %T Sample-Efficient Imitation Learning via Generative Adversarial Nets %A Lionel Blondé %A Alexandros Kalousis %B Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics %C Proceedings of Machine Learning Research %D 2019 %E Kamalika Chaudhuri %E Masashi Sugiyama %F pmlr-v89-blonde19a %I PMLR %P 3138--3148 %U https://proceedings.mlr.press/v89/blonde19a.html %V 89 %X GAIL is a recent successful imitation learning architecture that exploits the adversarial training procedure introduced in GANs. Albeit successful at generating behaviours similar to those demonstrated to the agent, GAIL suffers from a high sample complexity in the number of interactions it has to carry out in the environment in order to achieve satisfactory performance. We dramatically shrink the amount of interactions with the environment necessary to learn well-behaved imitation policies, by up to several orders of magnitude. Our framework, operating in the model-free regime, exhibits a significant increase in sample-efficiency over previous methods by simultaneously a) learning a self-tuned adversarially-trained surrogate reward and b) leveraging an off-policy actor-critic architecture. We show that our approach is simple to implement and that the learned agents remain remarkably stable, as shown in our experiments that span a variety of continuous control tasks. Video visualisations available at: \url{https://youtu.be/-nCsqUJnRKU}.
APA
Blondé, L. & Kalousis, A.. (2019). Sample-Efficient Imitation Learning via Generative Adversarial Nets. Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 89:3138-3148 Available from https://proceedings.mlr.press/v89/blonde19a.html.

Related Material