Augmenting GAIL with BC for sample efficient imitation learning

Rohit Jena; Changliu Liu; Katia Sycara

Proceedings of the 2020 Conference on Robot Learning, PMLR 155:80-90, 2021.

Abstract

Imitation learning is the problem of recovering an expert policy without access to a reward signal. Behavior cloning and GAIL are two widely used methods for performing imitation learning. Behavior cloning converges in a few iterations, but doesn’t achieve peak performance due to its inherent iid assumption about the state-action distribution. GAIL addresses the issue by accounting for the temporal dependencies when performing a state distribution matching between the agent and the expert. Although GAIL is sample efficient in the number of expert trajectories required, it is still not very sample efficient in terms of the environment interactions needed for convergence of the policy. Given the complementary benefits of both methods, we present a simple and elegant method to combine both methods to enable stable and sample efficient learning. Our algorithm is very simple to implement and integrates with different policy gradient algorithms. We demonstrate the effectiveness of the algorithm in low dimensional control tasks, gridworlds and in high dimensional image-based tasks.
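
The abstract does not spell out how the two objectives are combined. As a rough illustration only, the sketch below assumes a policy loss that blends a behavior-cloning term with a GAIL-style policy-gradient surrogate, with the blend weight decaying from 1 toward 0 over training. The function names, the exponential schedule, and the -log(1 - D) reward form are assumptions made for this example and are not taken from the paper.

```python
import numpy as np


def bc_loss(expert_action_log_probs):
    """Negative mean log-likelihood of expert actions under the policy."""
    return -float(np.mean(expert_action_log_probs))


def gail_surrogate_loss(agent_action_log_probs, discriminator_probs):
    """REINFORCE-style surrogate with an assumed reward of -log(1 - D(s, a)).

    discriminator_probs holds D(s, a), the discriminator's estimate that an
    agent sample came from the expert.
    """
    d = np.clip(discriminator_probs, 1e-8, 1.0 - 1e-8)  # avoid log(0)
    rewards = -np.log(1.0 - d)
    return -float(np.mean(agent_action_log_probs * rewards))


def blend_weight(step, decay=0.99):
    """Hypothetical schedule: 1.0 at step 0, shrinking toward 0 afterwards."""
    return decay ** step


def combined_loss(bc, gail, alpha):
    """Weighted sum of the two losses; alpha=1 is pure BC, alpha=0 pure GAIL."""
    return alpha * bc + (1.0 - alpha) * gail


# Toy numbers standing in for quantities a real training loop would compute.
expert_lp = np.log(np.array([0.5, 0.5]))
agent_lp = np.log(np.array([0.25, 0.25]))
d_probs = np.array([0.5, 0.5])

for step in (0, 100, 1000):
    alpha = blend_weight(step)
    loss = combined_loss(
        bc_loss(expert_lp), gail_surrogate_loss(agent_lp, d_probs), alpha
    )
    print(step, round(alpha, 4), round(loss, 4))
```

Under these assumptions, early updates are dominated by the supervised behavior-cloning term and later updates by the adversarial term. In a real implementation the log-probabilities and discriminator outputs would come from differentiable models rather than fixed arrays.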

Cite this Paper

BibTeX


@InProceedings{pmlr-v155-jena21a,
  title = 	 {Augmenting GAIL with BC for sample efficient imitation learning},
  author =       {Jena, Rohit and Liu, Changliu and Sycara, Katia},
  booktitle = 	 {Proceedings of the 2020 Conference on Robot Learning},
  pages = 	 {80--90},
  year = 	 {2021},
  editor = 	 {Kober, Jens and Ramos, Fabio and Tomlin, Claire},
  volume = 	 {155},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {16--18 Nov},
  publisher =    {PMLR},
  pdf = 	 {https://proceedings.mlr.press/v155/jena21a/jena21a.pdf},
  url = 	 {https://proceedings.mlr.press/v155/jena21a.html},
  abstract = 	 {Imitation learning is the problem of recovering an expert policy without access to a reward signal. Behavior cloning and GAIL are two widely used methods for performing imitation learning. Behavior cloning converges in a few iterations, but doesn’t achieve peak performance due to its inherent iid assumption about the state-action distribution. GAIL addresses the issue by accounting for the temporal dependencies when performing a state distribution matching between the agent and the expert. Although GAIL is sample efficient in the number of expert trajectories required, it is still not very sample efficient in terms of the environment interactions needed for convergence of the policy. Given the complementary benefits of both methods, we present a simple and elegant method to combine both methods to enable stable and sample efficient learning. Our algorithm is very simple to implement and integrates with different policy gradient algorithms. We demonstrate the effectiveness of the algorithm in low dimensional control tasks, gridworlds and in high dimensional image-based tasks.}
}

Endnote

%0 Conference Paper
%T Augmenting GAIL with BC for sample efficient imitation learning
%A Rohit Jena
%A Changliu Liu
%A Katia Sycara
%B Proceedings of the 2020 Conference on Robot Learning
%C Proceedings of Machine Learning Research
%D 2021
%E Jens Kober
%E Fabio Ramos
%E Claire Tomlin
%F pmlr-v155-jena21a
%I PMLR
%P 80--90
%U https://proceedings.mlr.press/v155/jena21a.html
%V 155
%X Imitation learning is the problem of recovering an expert policy without access to a reward signal. Behavior cloning and GAIL are two widely used methods for performing imitation learning. Behavior cloning converges in a few iterations, but doesn’t achieve peak performance due to its inherent iid assumption about the state-action distribution. GAIL addresses the issue by accounting for the temporal dependencies when performing a state distribution matching between the agent and the expert. Although GAIL is sample efficient in the number of expert trajectories required, it is still not very sample efficient in terms of the environment interactions needed for convergence of the policy. Given the complementary benefits of both methods, we present a simple and elegant method to combine both methods to enable stable and sample efficient learning. Our algorithm is very simple to implement and integrates with different policy gradient algorithms. We demonstrate the effectiveness of the algorithm in low dimensional control tasks, gridworlds and in high dimensional image-based tasks.

APA


Jena, R., Liu, C., & Sycara, K. (2021). Augmenting GAIL with BC for sample efficient imitation learning. Proceedings of the 2020 Conference on Robot Learning, in Proceedings of Machine Learning Research 155:80-90. Available from https://proceedings.mlr.press/v155/jena21a.html.
