Adversarial Option-Aware Hierarchical Imitation Learning

Mingxuan Jing, Wenbing Huang, Fuchun Sun, Xiaojian Ma, Tao Kong, Chuang Gan, Lei Li
Proceedings of the 38th International Conference on Machine Learning, PMLR 139:5097-5106, 2021.

Abstract

Learning skills for an agent from long-horizon, unannotated demonstrations remains a challenge. Existing approaches such as Hierarchical Imitation Learning (HIL) are prone to compounding errors or suboptimal solutions. In this paper, we propose Option-GAIL, a novel method for learning skills over long horizons. The key idea of Option-GAIL is to model the task hierarchy with options and to train the policy via generative adversarial optimization. In particular, we propose an Expectation-Maximization (EM)-style algorithm: an E-step that samples the expert's options conditioned on the currently learned policy, and an M-step that updates the agent's low- and high-level policies simultaneously to minimize the newly proposed option-occupancy measurement between the expert and the agent. We theoretically prove the convergence of the proposed algorithm. Experiments show that Option-GAIL consistently outperforms its counterparts across a variety of tasks.
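The E-step/M-step alternation described in the abstract can be illustrated with a deliberately toy sketch. Everything below is an illustrative assumption, not the authors' implementation: the real method uses neural hierarchical policies and a GAIL-style discriminator, whereas here the "M-step" is a stand-in frequency update over two options.

```python
# Toy sketch of the EM-style alternation from the abstract.
# All names and update rules are illustrative placeholders, NOT the
# authors' method: Option-GAIL uses neural high-/low-level policies
# and an adversarial discriminator over option-occupancy measures.

def infer_options(traj, high_policy):
    # E-step (placeholder): label each expert step with the option the
    # current high-level policy considers most likely.
    return [max(high_policy, key=high_policy.get) for _ in traj]

def option_gail_sketch(expert_traj, n_iters=5):
    # Two options with a uniform initial high-level preference.
    high_policy = {0: 0.5, 1: 0.5}
    for _ in range(n_iters):
        # E-step: infer expert options conditioned on the current policy.
        expert_options = infer_options(expert_traj, high_policy)
        # M-step (placeholder): nudge the high-level policy toward the
        # inferred option frequencies -- a stand-in for the adversarial
        # update that minimizes the option-occupancy discrepancy.
        freq0 = expert_options.count(0) / len(expert_options)
        high_policy = {0: 0.9 * high_policy[0] + 0.1 * freq0,
                       1: 0.9 * high_policy[1] + 0.1 * (1 - freq0)}
    return high_policy

# Usage: run the loop on a dummy (state, action) trajectory.
policy = option_gail_sketch([("s", "a")] * 10)
print(policy)
```

The sketch only demonstrates the control flow (alternating option inference with a joint policy update); the quantities being matched in the paper are occupancy measures over state-action-option tuples, not raw option frequencies.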

Cite this Paper


BibTeX
@InProceedings{pmlr-v139-jing21a,
  title     = {Adversarial Option-Aware Hierarchical Imitation Learning},
  author    = {Jing, Mingxuan and Huang, Wenbing and Sun, Fuchun and Ma, Xiaojian and Kong, Tao and Gan, Chuang and Li, Lei},
  booktitle = {Proceedings of the 38th International Conference on Machine Learning},
  pages     = {5097--5106},
  year      = {2021},
  editor    = {Meila, Marina and Zhang, Tong},
  volume    = {139},
  series    = {Proceedings of Machine Learning Research},
  month     = {18--24 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v139/jing21a/jing21a.pdf},
  url       = {https://proceedings.mlr.press/v139/jing21a.html},
  abstract  = {It has been a challenge to learning skills for an agent from long-horizon unannotated demonstrations. Existing approaches like Hierarchical Imitation Learning(HIL) are prone to compounding errors or suboptimal solutions. In this paper, we propose Option-GAIL, a novel method to learn skills at long horizon. The key idea of Option-GAIL is modeling the task hierarchy by options and train the policy via generative adversarial optimization. In particular, we propose an Expectation-Maximization(EM)-style algorithm: an E-step that samples the options of expert conditioned on the current learned policy, and an M-step that updates the low- and high-level policies of agent simultaneously to minimize the newly proposed option-occupancy measurement between the expert and the agent. We theoretically prove the convergence of the proposed algorithm. Experiments show that Option-GAIL outperforms other counterparts consistently across a variety of tasks.}
}
Endnote
%0 Conference Paper
%T Adversarial Option-Aware Hierarchical Imitation Learning
%A Mingxuan Jing
%A Wenbing Huang
%A Fuchun Sun
%A Xiaojian Ma
%A Tao Kong
%A Chuang Gan
%A Lei Li
%B Proceedings of the 38th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2021
%E Marina Meila
%E Tong Zhang
%F pmlr-v139-jing21a
%I PMLR
%P 5097--5106
%U https://proceedings.mlr.press/v139/jing21a.html
%V 139
%X It has been a challenge to learning skills for an agent from long-horizon unannotated demonstrations. Existing approaches like Hierarchical Imitation Learning(HIL) are prone to compounding errors or suboptimal solutions. In this paper, we propose Option-GAIL, a novel method to learn skills at long horizon. The key idea of Option-GAIL is modeling the task hierarchy by options and train the policy via generative adversarial optimization. In particular, we propose an Expectation-Maximization(EM)-style algorithm: an E-step that samples the options of expert conditioned on the current learned policy, and an M-step that updates the low- and high-level policies of agent simultaneously to minimize the newly proposed option-occupancy measurement between the expert and the agent. We theoretically prove the convergence of the proposed algorithm. Experiments show that Option-GAIL outperforms other counterparts consistently across a variety of tasks.
APA
Jing, M., Huang, W., Sun, F., Ma, X., Kong, T., Gan, C. & Li, L. (2021). Adversarial Option-Aware Hierarchical Imitation Learning. Proceedings of the 38th International Conference on Machine Learning, in Proceedings of Machine Learning Research 139:5097-5106. Available from https://proceedings.mlr.press/v139/jing21a.html.