Adversarial Option-Aware Hierarchical Imitation Learning

Mingxuan Jing; Wenbing Huang; Fuchun Sun; Xiaojian Ma; Tao Kong; Chuang Gan; Lei Li

Proceedings of the 38th International Conference on Machine Learning, PMLR 139:5097-5106, 2021.

Abstract

Learning skills for an agent from long-horizon, unannotated demonstrations has long been a challenge. Existing approaches such as Hierarchical Imitation Learning (HIL) are prone to compounding errors or suboptimal solutions. In this paper, we propose Option-GAIL, a novel method for learning skills over long horizons. The key idea of Option-GAIL is to model the task hierarchy with options and train the policy via generative adversarial optimization. In particular, we propose an Expectation-Maximization (EM)-style algorithm: an E-step that samples the expert's options conditioned on the currently learned policy, and an M-step that simultaneously updates the agent's low- and high-level policies to minimize the newly proposed option-occupancy measurement between the expert and the agent. We theoretically prove the convergence of the proposed algorithm. Experiments show that Option-GAIL consistently outperforms its counterparts across a variety of tasks.

Cite this Paper

BibTeX

@InProceedings{pmlr-v139-jing21a,
  title     = {Adversarial Option-Aware Hierarchical Imitation Learning},
  author    = {Jing, Mingxuan and Huang, Wenbing and Sun, Fuchun and Ma, Xiaojian and Kong, Tao and Gan, Chuang and Li, Lei},
  booktitle = {Proceedings of the 38th International Conference on Machine Learning},
  pages     = {5097--5106},
  year      = {2021},
  editor    = {Meila, Marina and Zhang, Tong},
  volume    = {139},
  series    = {Proceedings of Machine Learning Research},
  month     = {18--24 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v139/jing21a/jing21a.pdf},
  url       = {https://proceedings.mlr.press/v139/jing21a.html},
  abstract = 	 {It has been a challenge to learning skills for an agent from long-horizon unannotated demonstrations. Existing approaches like Hierarchical Imitation Learning(HIL) are prone to compounding errors or suboptimal solutions. In this paper, we propose Option-GAIL, a novel method to learn skills at long horizon. The key idea of Option-GAIL is modeling the task hierarchy by options and train the policy via generative adversarial optimization. In particular, we propose an Expectation-Maximization(EM)-style algorithm: an E-step that samples the options of expert conditioned on the current learned policy, and an M-step that updates the low- and high-level policies of agent simultaneously to minimize the newly proposed option-occupancy measurement between the expert and the agent. We theoretically prove the convergence of the proposed algorithm. Experiments show that Option-GAIL outperforms other counterparts consistently across a variety of tasks.}
}

Endnote

%0 Conference Paper
%T Adversarial Option-Aware Hierarchical Imitation Learning
%A Mingxuan Jing
%A Wenbing Huang
%A Fuchun Sun
%A Xiaojian Ma
%A Tao Kong
%A Chuang Gan
%A Lei Li
%B Proceedings of the 38th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2021
%E Marina Meila
%E Tong Zhang
%F pmlr-v139-jing21a
%I PMLR
%P 5097--5106
%U https://proceedings.mlr.press/v139/jing21a.html
%V 139
%X It has been a challenge to learning skills for an agent from long-horizon unannotated demonstrations. Existing approaches like Hierarchical Imitation Learning(HIL) are prone to compounding errors or suboptimal solutions. In this paper, we propose Option-GAIL, a novel method to learn skills at long horizon. The key idea of Option-GAIL is modeling the task hierarchy by options and train the policy via generative adversarial optimization. In particular, we propose an Expectation-Maximization(EM)-style algorithm: an E-step that samples the options of expert conditioned on the current learned policy, and an M-step that updates the low- and high-level policies of agent simultaneously to minimize the newly proposed option-occupancy measurement between the expert and the agent. We theoretically prove the convergence of the proposed algorithm. Experiments show that Option-GAIL outperforms other counterparts consistently across a variety of tasks.

APA

Jing, M., Huang, W., Sun, F., Ma, X., Kong, T., Gan, C. & Li, L. (2021). Adversarial Option-Aware Hierarchical Imitation Learning. Proceedings of the 38th International Conference on Machine Learning, in Proceedings of Machine Learning Research 139:5097-5106. Available from https://proceedings.mlr.press/v139/jing21a.html.
