Adversarial Skill Chaining for Long-Horizon Robot Manipulation via Terminal State Regularization

Youngwoon Lee, Joseph J Lim, Anima Anandkumar, Yuke Zhu
Proceedings of the 5th Conference on Robot Learning, PMLR 164:406-416, 2022.

Abstract

Skill chaining is a promising approach for synthesizing complex behaviors by sequentially combining previously learned skills. Yet, a naive composition of skills fails when a policy encounters a starting state never seen during its training. For successful skill chaining, prior approaches attempt to widen the policy's starting state distribution. However, these approaches require larger state distributions to be covered as more policies are sequenced, and thus are limited to short skill sequences. In this paper, we propose to chain multiple policies without excessively large initial state distributions by regularizing the terminal state distributions in an adversarial learning framework. We evaluate our approach on two complex long-horizon manipulation tasks of furniture assembly. Our results show that our method establishes the first model-free reinforcement learning algorithm to solve these tasks, whereas prior skill chaining approaches fail. The code and videos are available at https://clvrai.com/skill-chaining.
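To make the core idea concrete, the sketch below illustrates one way terminal state regularization can be realized: a discriminator is trained to tell apart terminal states produced by the current skill's policy from states the next skill was trained to start from, and the current skill receives an extra terminal reward for fooling it. This is only a minimal PyTorch sketch under assumptions of ours (the class name TerminalStateDiscriminator, the data sources, and the weight beta are hypothetical); it is not the authors' released implementation, which is available at the project page above.

# Minimal illustrative sketch of adversarial terminal state regularization.
# NOT the authors' implementation; names and the reward weight `beta` are
# assumptions made for illustration only.
import torch
import torch.nn as nn

class TerminalStateDiscriminator(nn.Module):
    """Scores whether a state looks like a valid starting state for the next skill."""
    def __init__(self, state_dim, hidden_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, state):
        return self.net(state)  # unnormalized logit

def discriminator_loss(disc, next_skill_init_states, cur_skill_term_states):
    """GAN-style objective: 'real' = initial states the next skill can handle,
    'fake' = terminal states produced by the current skill's policy."""
    bce = nn.BCEWithLogitsLoss()
    real_logits = disc(next_skill_init_states)
    fake_logits = disc(cur_skill_term_states)
    return (bce(real_logits, torch.ones_like(real_logits)) +
            bce(fake_logits, torch.zeros_like(fake_logits)))

def regularized_terminal_reward(disc, terminal_state, task_reward, beta=1.0):
    """Augment the environment reward at episode end: the current skill is
    rewarded for terminating in states the discriminator judges to be good
    starting states for the next skill."""
    with torch.no_grad():
        bonus = torch.sigmoid(disc(terminal_state)).log()
    return task_reward + beta * bonus

In this reading, the discriminator keeps each skill's terminal state distribution close to its successor's feasible initial states, so no single policy has to cover an ever-growing initial state distribution as the chain gets longer.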

Cite this Paper


BibTeX
@InProceedings{pmlr-v164-lee22a,
  title     = {Adversarial Skill Chaining for Long-Horizon Robot Manipulation via Terminal State Regularization},
  author    = {Lee, Youngwoon and Lim, Joseph J and Anandkumar, Anima and Zhu, Yuke},
  booktitle = {Proceedings of the 5th Conference on Robot Learning},
  pages     = {406--416},
  year      = {2022},
  editor    = {Faust, Aleksandra and Hsu, David and Neumann, Gerhard},
  volume    = {164},
  series    = {Proceedings of Machine Learning Research},
  month     = {08--11 Nov},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v164/lee22a/lee22a.pdf},
  url       = {https://proceedings.mlr.press/v164/lee22a.html},
  abstract  = {Skill chaining is a promising approach for synthesizing complex behaviors by sequentially combining previously learned skills. Yet, a naive composition of skills fails when a policy encounters a starting state never seen during its training. For successful skill chaining, prior approaches attempt to widen the policy's starting state distribution. However, these approaches require larger state distributions to be covered as more policies are sequenced, and thus are limited to short skill sequences. In this paper, we propose to chain multiple policies without excessively large initial state distributions by regularizing the terminal state distributions in an adversarial learning framework. We evaluate our approach on two complex long-horizon manipulation tasks of furniture assembly. Our results show that our method establishes the first model-free reinforcement learning algorithm to solve these tasks, whereas prior skill chaining approaches fail. The code and videos are available at https://clvrai.com/skill-chaining.}
}
Endnote
%0 Conference Paper
%T Adversarial Skill Chaining for Long-Horizon Robot Manipulation via Terminal State Regularization
%A Youngwoon Lee
%A Joseph J Lim
%A Anima Anandkumar
%A Yuke Zhu
%B Proceedings of the 5th Conference on Robot Learning
%C Proceedings of Machine Learning Research
%D 2022
%E Aleksandra Faust
%E David Hsu
%E Gerhard Neumann
%F pmlr-v164-lee22a
%I PMLR
%P 406--416
%U https://proceedings.mlr.press/v164/lee22a.html
%V 164
%X Skill chaining is a promising approach for synthesizing complex behaviors by sequentially combining previously learned skills. Yet, a naive composition of skills fails when a policy encounters a starting state never seen during its training. For successful skill chaining, prior approaches attempt to widen the policy's starting state distribution. However, these approaches require larger state distributions to be covered as more policies are sequenced, and thus are limited to short skill sequences. In this paper, we propose to chain multiple policies without excessively large initial state distributions by regularizing the terminal state distributions in an adversarial learning framework. We evaluate our approach on two complex long-horizon manipulation tasks of furniture assembly. Our results show that our method establishes the first model-free reinforcement learning algorithm to solve these tasks, whereas prior skill chaining approaches fail. The code and videos are available at https://clvrai.com/skill-chaining.
APA
Lee, Y., Lim, J. J., Anandkumar, A., & Zhu, Y. (2022). Adversarial Skill Chaining for Long-Horizon Robot Manipulation via Terminal State Regularization. Proceedings of the 5th Conference on Robot Learning, in Proceedings of Machine Learning Research 164:406-416. Available from https://proceedings.mlr.press/v164/lee22a.html.