Task-Relevant Adversarial Imitation Learning

Konrad Zolna; Scott Reed; Alexander Novikov; Sergio Gómez Colmenarejo; David Budden; Serkan Cabi; Misha Denil; Nando de Freitas; Ziyu Wang

Proceedings of the 2020 Conference on Robot Learning, PMLR 155:247-263, 2021.

Abstract

We show that a critical vulnerability in adversarial imitation is the tendency of discriminator networks to learn spurious associations between visual features and expert labels. When the discriminator focuses on task-irrelevant features, it does not provide an informative reward signal, leading to poor task performance. We analyze this problem in detail and propose a solution that outperforms standard Generative Adversarial Imitation Learning (GAIL). Our proposed method, Task-Relevant Adversarial Imitation Learning (TRAIL), uses constrained discriminator optimization to learn informative rewards. In comprehensive experiments, we show that TRAIL can solve challenging robotic manipulation tasks from pixels by imitating human operators without access to any task rewards, and clearly outperforms comparable baseline imitation agents, including those trained via behaviour cloning and conventional GAIL.

Cite this Paper

BibTeX


@InProceedings{pmlr-v155-zolna21a,
  title = 	 {Task-Relevant Adversarial Imitation Learning},
  author =       {Zolna, Konrad and Reed, Scott and Novikov, Alexander and G\'{o}mez Colmenarejo, Sergio and Budden, David and Cabi, Serkan and Denil, Misha and de Freitas, Nando and Wang, Ziyu},
  booktitle = 	 {Proceedings of the 2020 Conference on Robot Learning},
  pages = 	 {247--263},
  year = 	 {2021},
  editor = 	 {Kober, Jens and Ramos, Fabio and Tomlin, Claire},
  volume = 	 {155},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {16--18 Nov},
  publisher =    {PMLR},
  pdf = 	 {https://proceedings.mlr.press/v155/zolna21a/zolna21a.pdf},
  url = 	 {https://proceedings.mlr.press/v155/zolna21a.html},
  abstract = 	 {We show that a critical vulnerability in adversarial imitation is the tendency of discriminator networks to learn spurious associations between visual features and expert labels. When the discriminator focuses on task-irrelevant features, it does not provide an informative reward signal, leading to poor task performance. We analyze this problem in detail and propose a solution that outperforms standard Generative Adversarial Imitation Learning (GAIL). Our proposed method, Task-Relevant Adversarial Imitation Learning (TRAIL), uses constrained discriminator optimization to learn informative rewards. In comprehensive experiments, we show that TRAIL can solve challenging robotic manipulation tasks from pixels by imitating human operators without access to any task rewards, and clearly outperforms comparable baseline imitation agents, including those trained via behaviour cloning and conventional GAIL.}
}

Endnote

%0 Conference Paper
%T Task-Relevant Adversarial Imitation Learning
%A Konrad Zolna
%A Scott Reed
%A Alexander Novikov
%A Sergio Gómez Colmenarejo
%A David Budden
%A Serkan Cabi
%A Misha Denil
%A Nando de Freitas
%A Ziyu Wang
%B Proceedings of the 2020 Conference on Robot Learning
%C Proceedings of Machine Learning Research
%D 2021
%E Jens Kober
%E Fabio Ramos
%E Claire Tomlin
%F pmlr-v155-zolna21a
%I PMLR
%P 247--263
%U https://proceedings.mlr.press/v155/zolna21a.html
%V 155
%X We show that a critical vulnerability in adversarial imitation is the tendency of discriminator networks to learn spurious associations between visual features and expert labels. When the discriminator focuses on task-irrelevant features, it does not provide an informative reward signal, leading to poor task performance. We analyze this problem in detail and propose a solution that outperforms standard Generative Adversarial Imitation Learning (GAIL). Our proposed method, Task-Relevant Adversarial Imitation Learning (TRAIL), uses constrained discriminator optimization to learn informative rewards. In comprehensive experiments, we show that TRAIL can solve challenging robotic manipulation tasks from pixels by imitating human operators without access to any task rewards, and clearly outperforms comparable baseline imitation agents, including those trained via behaviour cloning and conventional GAIL.

APA


Zolna, K., Reed, S., Novikov, A., Gómez Colmenarejo, S., Budden, D., Cabi, S., Denil, M., de Freitas, N. & Wang, Z. (2021). Task-Relevant Adversarial Imitation Learning. Proceedings of the 2020 Conference on Robot Learning, in Proceedings of Machine Learning Research 155:247-263. Available from https://proceedings.mlr.press/v155/zolna21a.html.
