A Divergence Minimization Perspective on Imitation Learning Methods

Seyed Kamyar Seyed Ghasemipour; Richard Zemel; Shixiang Gu

A Divergence Minimization Perspective on Imitation Learning Methods

Seyed Kamyar Seyed Ghasemipour, Richard Zemel, Shixiang Gu

Proceedings of the Conference on Robot Learning, PMLR 100:1259-1277, 2020.

Abstract

In many settings, it is desirable to learn decision-making and control policies through learning or bootstrapping from expert demonstrations. The most common approaches under this Imitation Learning (IL) framework are Behavioural Cloning (BC), and Inverse Reinforcement Learning (IRL). Recent methods for IRL have demonstrated the capacity to learn effective policies with access to a very limited set of demonstrations, a scenario in which BC methods often fail. Unfortunately, due to multiple factors of variation, directly comparing these methods does not provide adequate intuition for understanding this difference in performance. In this work, we present a unified probabilistic perspective on IL algorithms based on divergence minimization. We present f-MAX, an f-divergence generalization of AIRL [1], a state-of-the-art IRL method. f-MAX enables us to relate prior IRL methods such as GAIL [2] and AIRL [1], and understand their algorithmic properties. Through the lens of divergence minimization we tease apart the differences between BC and successful IRL approaches,and empirically evaluate these nuances on simulated high-dimensional continuous control domains. Our findings conclusively identify that IRL’s state-marginal matching objective contributes most to its superior performance. Lastly, we apply our new understanding of IL method to the problem of state-marginal matching, where we demonstrate that in simulated arm pushing environments we can teach agents a diverse range of behaviours using simply hand-specified state distributions and no reward functions or expert demonstrations. For datasets and reproducing results please refer to https://github.com/KamyarGh/rl_swiss/blob/master/reproducing/fmax_paper.md.

Cite this Paper

BibTeX


@InProceedings{pmlr-v100-ghasemipour20a,
  title = 	 {A Divergence Minimization Perspective on Imitation Learning Methods},
  author =       {Ghasemipour, Seyed Kamyar Seyed and Zemel, Richard and Gu, Shixiang},
  booktitle = 	 {Proceedings of the Conference on Robot Learning},
  pages = 	 {1259--1277},
  year = 	 {2020},
  editor = 	 {Kaelbling, Leslie Pack and Kragic, Danica and Sugiura, Komei},
  volume = 	 {100},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {30 Oct--01 Nov},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v100/ghasemipour20a/ghasemipour20a.pdf},
  url = 	 {https://proceedings.mlr.press/v100/ghasemipour20a.html},
  abstract = 	 {In many settings, it is desirable to learn decision-making and control policies through learning or bootstrapping from expert demonstrations. The most common approaches under this Imitation Learning (IL) framework are Behavioural Cloning (BC), and Inverse Reinforcement Learning (IRL). Recent methods for IRL have demonstrated the capacity to learn effective policies with access to a very limited set of demonstrations, a scenario in which BC methods often fail. Unfortunately, due to multiple factors of variation, directly comparing these methods does not provide adequate intuition for understanding this difference in performance. In this work, we present a unified probabilistic perspective on IL algorithms based on divergence minimization. We present f-MAX, an f-divergence generalization of AIRL [1], a state-of-the-art IRL method. f-MAX enables us to relate prior IRL methods such as GAIL [2] and AIRL [1], and understand their algorithmic properties. Through the lens of divergence minimization we tease apart the differences between BC and successful IRL approaches,and empirically evaluate these nuances on simulated high-dimensional continuous control domains. Our findings conclusively identify that IRL’s state-marginal matching objective contributes most to its superior performance. Lastly, we apply our new understanding of IL method to the problem of state-marginal matching, where we demonstrate that in simulated arm pushing environments we can teach agents a diverse range of behaviours using simply hand-specified state distributions and no reward functions or expert demonstrations. For datasets and reproducing results please refer to https://github.com/KamyarGh/rl_swiss/blob/master/reproducing/fmax_paper.md.}
}

Endnote

%0 Conference Paper
%T A Divergence Minimization Perspective on Imitation Learning Methods
%A Seyed Kamyar Seyed Ghasemipour
%A Richard Zemel
%A Shixiang Gu
%B Proceedings of the Conference on Robot Learning
%C Proceedings of Machine Learning Research
%D 2020
%E Leslie Pack Kaelbling
%E Danica Kragic
%E Komei Sugiura	
%F pmlr-v100-ghasemipour20a
%I PMLR
%P 1259--1277
%U https://proceedings.mlr.press/v100/ghasemipour20a.html
%V 100
%X In many settings, it is desirable to learn decision-making and control policies through learning or bootstrapping from expert demonstrations. The most common approaches under this Imitation Learning (IL) framework are Behavioural Cloning (BC), and Inverse Reinforcement Learning (IRL). Recent methods for IRL have demonstrated the capacity to learn effective policies with access to a very limited set of demonstrations, a scenario in which BC methods often fail. Unfortunately, due to multiple factors of variation, directly comparing these methods does not provide adequate intuition for understanding this difference in performance. In this work, we present a unified probabilistic perspective on IL algorithms based on divergence minimization. We present f-MAX, an f-divergence generalization of AIRL [1], a state-of-the-art IRL method. f-MAX enables us to relate prior IRL methods such as GAIL [2] and AIRL [1], and understand their algorithmic properties. Through the lens of divergence minimization we tease apart the differences between BC and successful IRL approaches,and empirically evaluate these nuances on simulated high-dimensional continuous control domains. Our findings conclusively identify that IRL’s state-marginal matching objective contributes most to its superior performance. Lastly, we apply our new understanding of IL method to the problem of state-marginal matching, where we demonstrate that in simulated arm pushing environments we can teach agents a diverse range of behaviours using simply hand-specified state distributions and no reward functions or expert demonstrations. For datasets and reproducing results please refer to https://github.com/KamyarGh/rl_swiss/blob/master/reproducing/fmax_paper.md.

APA


Ghasemipour, S.K.S., Zemel, R. & Gu, S.. (2020). A Divergence Minimization Perspective on Imitation Learning Methods. Proceedings of the Conference on Robot Learning, in Proceedings of Machine Learning Research 100:1259-1277 Available from https://proceedings.mlr.press/v100/ghasemipour20a.html.

A Divergence Minimization Perspective on Imitation Learning Methods

Abstract

Cite this Paper

Related Material