End-to-End Differentiable Adversarial Imitation Learning

Nir Baram, Oron Anschel, Itai Caspi, Shie Mannor
Proceedings of the 34th International Conference on Machine Learning, PMLR 70:390-399, 2017.

Abstract

Generative Adversarial Networks (GANs) have been successfully applied to the problem of policy imitation in a model-free setup. However, the computation graph of a GAN that includes a stochastic policy as its generative model is no longer differentiable end-to-end, which forces the use of high-variance gradient estimators. In this paper, we introduce the Model-based Generative Adversarial Imitation Learning (MGAIL) algorithm. We show how to use a forward model to make the computation fully differentiable, which enables training policies using the exact gradient of the discriminator. The resulting algorithm trains competent policies using relatively few expert samples and environment interactions. We test it on both discrete and continuous action domains and report results that surpass the state-of-the-art.
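The mechanism the abstract describes can be made concrete with a short sketch. The following is a minimal illustration, not the authors' implementation: it assumes PyTorch, and all networks and dimensions (policy, forward_model, discriminator, state_dim) are hypothetical stand-ins. It shows the two ingredients the abstract names: reparameterized action sampling, which keeps the action differentiable with respect to the policy parameters, and a learned forward model, which replaces the non-differentiable environment step. Together they let the discriminator's exact gradient flow back through the rollout to the policy; a model-free setup would instead need a high-variance score-function (REINFORCE) estimator at the sampling step.

import torch

# Hypothetical dimensions and networks; the paper does not prescribe these.
state_dim, action_dim, hidden = 4, 2, 64

policy = torch.nn.Sequential(                      # mean of a Gaussian policy pi(a|s)
    torch.nn.Linear(state_dim, hidden), torch.nn.Tanh(),
    torch.nn.Linear(hidden, action_dim))
log_std = torch.zeros(action_dim, requires_grad=True)

forward_model = torch.nn.Sequential(               # learned f(s, a) -> s'
    torch.nn.Linear(state_dim + action_dim, hidden), torch.nn.Tanh(),
    torch.nn.Linear(hidden, state_dim))

discriminator = torch.nn.Sequential(               # D(s, a) -> logit
    torch.nn.Linear(state_dim + action_dim, hidden), torch.nn.Tanh(),
    torch.nn.Linear(hidden, 1))

s = torch.randn(1, state_dim)                      # initial state
policy_loss = torch.zeros(())
for _ in range(5):                                 # short differentiable rollout
    mean = policy(s)
    # Reparameterization: a = mu(s) + sigma * eps keeps the sampled action
    # differentiable w.r.t. the policy parameters, so no score-function
    # estimator is needed.
    a = mean + log_std.exp() * torch.randn_like(mean)
    # The policy is trained against the discriminator's output (sign
    # conventions for the adversarial objective vary).
    policy_loss = policy_loss + discriminator(torch.cat([s, a], dim=-1)).squeeze()
    # Stepping through the learned forward model, rather than the real
    # (non-differentiable) environment, lets the gradient flow across time.
    s = forward_model(torch.cat([s, a], dim=-1))

policy_loss.backward()  # exact discriminator gradient reaches the policy

In a model-free setup the two commented lines are exactly where differentiation breaks: sampling an action and stepping the environment both cut the gradient path, which is why GAIL-style methods resort to policy-gradient estimators.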

Cite this Paper


BibTeX
@InProceedings{pmlr-v70-baram17a,
  title     = {End-to-End Differentiable Adversarial Imitation Learning},
  author    = {Nir Baram and Oron Anschel and Itai Caspi and Shie Mannor},
  booktitle = {Proceedings of the 34th International Conference on Machine Learning},
  pages     = {390--399},
  year      = {2017},
  editor    = {Precup, Doina and Teh, Yee Whye},
  volume    = {70},
  series    = {Proceedings of Machine Learning Research},
  month     = {06--11 Aug},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v70/baram17a/baram17a.pdf},
  url       = {https://proceedings.mlr.press/v70/baram17a.html},
  abstract  = {Generative Adversarial Networks (GANs) have been successfully applied to the problem of policy imitation in a model-free setup. However, the computation graph of a GAN that includes a stochastic policy as its generative model is no longer differentiable end-to-end, which forces the use of high-variance gradient estimators. In this paper, we introduce the Model-based Generative Adversarial Imitation Learning (MGAIL) algorithm. We show how to use a forward model to make the computation fully differentiable, which enables training policies using the exact gradient of the discriminator. The resulting algorithm trains competent policies using relatively few expert samples and environment interactions. We test it on both discrete and continuous action domains and report results that surpass the state-of-the-art.}
}
Endnote
%0 Conference Paper
%T End-to-End Differentiable Adversarial Imitation Learning
%A Nir Baram
%A Oron Anschel
%A Itai Caspi
%A Shie Mannor
%B Proceedings of the 34th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2017
%E Doina Precup
%E Yee Whye Teh
%F pmlr-v70-baram17a
%I PMLR
%P 390--399
%U https://proceedings.mlr.press/v70/baram17a.html
%V 70
%X Generative Adversarial Networks (GANs) have been successfully applied to the problem of policy imitation in a model-free setup. However, the computation graph of a GAN that includes a stochastic policy as its generative model is no longer differentiable end-to-end, which forces the use of high-variance gradient estimators. In this paper, we introduce the Model-based Generative Adversarial Imitation Learning (MGAIL) algorithm. We show how to use a forward model to make the computation fully differentiable, which enables training policies using the exact gradient of the discriminator. The resulting algorithm trains competent policies using relatively few expert samples and environment interactions. We test it on both discrete and continuous action domains and report results that surpass the state-of-the-art.
APA
Baram, N., Anschel, O., Caspi, I. & Mannor, S. (2017). End-to-End Differentiable Adversarial Imitation Learning. Proceedings of the 34th International Conference on Machine Learning, in Proceedings of Machine Learning Research 70:390-399. Available from https://proceedings.mlr.press/v70/baram17a.html.
