Imitating Latent Policies from Observation

Ashley Edwards; Himanshu Sahni; Yannick Schroecker; Charles Isbell

Proceedings of the 36th International Conference on Machine Learning, PMLR 97:1755-1763, 2019.

Abstract

In this paper, we describe a novel approach to imitation learning that infers latent policies directly from state observations. We introduce a method that characterizes the causal effects of latent actions on observations while simultaneously predicting their likelihood. We then outline an action alignment procedure that leverages a small amount of environment interactions to determine a mapping between the latent and real-world actions. We show that this corrected labeling can be used for imitating the observed behavior, even though no expert actions are given. We evaluate our approach within classic control environments and a platform game and demonstrate that it performs better than standard approaches. Code for this work is available at https://github.com/ashedwards/ILPO.

Cite this Paper

BibTeX

@InProceedings{pmlr-v97-edwards19a,
  title = 	 {Imitating Latent Policies from Observation},
  author =       {Edwards, Ashley and Sahni, Himanshu and Schroecker, Yannick and Isbell, Charles},
  booktitle = 	 {Proceedings of the 36th International Conference on Machine Learning},
  pages = 	 {1755--1763},
  year = 	 {2019},
  editor = 	 {Chaudhuri, Kamalika and Salakhutdinov, Ruslan},
  volume = 	 {97},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {09--15 Jun},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v97/edwards19a/edwards19a.pdf},
  url = 	 {https://proceedings.mlr.press/v97/edwards19a.html},
  abstract = 	 {In this paper, we describe a novel approach to imitation learning that infers latent policies directly from state observations. We introduce a method that characterizes the causal effects of latent actions on observations while simultaneously predicting their likelihood. We then outline an action alignment procedure that leverages a small amount of environment interactions to determine a mapping between the latent and real-world actions. We show that this corrected labeling can be used for imitating the observed behavior, even though no expert actions are given. We evaluate our approach within classic control environments and a platform game and demonstrate that it performs better than standard approaches. Code for this work is available at https://github.com/ashedwards/ILPO.}
}

Endnote

%0 Conference Paper
%T Imitating Latent Policies from Observation
%A Ashley Edwards
%A Himanshu Sahni
%A Yannick Schroecker
%A Charles Isbell
%B Proceedings of the 36th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2019
%E Kamalika Chaudhuri
%E Ruslan Salakhutdinov
%F pmlr-v97-edwards19a
%I PMLR
%P 1755--1763
%U https://proceedings.mlr.press/v97/edwards19a.html
%V 97
%X In this paper, we describe a novel approach to imitation learning that infers latent policies directly from state observations. We introduce a method that characterizes the causal effects of latent actions on observations while simultaneously predicting their likelihood. We then outline an action alignment procedure that leverages a small amount of environment interactions to determine a mapping between the latent and real-world actions. We show that this corrected labeling can be used for imitating the observed behavior, even though no expert actions are given. We evaluate our approach within classic control environments and a platform game and demonstrate that it performs better than standard approaches. Code for this work is available at https://github.com/ashedwards/ILPO.

APA

Edwards, A., Sahni, H., Schroecker, Y. & Isbell, C. (2019). Imitating Latent Policies from Observation. Proceedings of the 36th International Conference on Machine Learning, in Proceedings of Machine Learning Research 97:1755-1763. Available from https://proceedings.mlr.press/v97/edwards19a.html.
