Scaling Up and Distilling Down: Language-Guided Robot Skill Acquisition

Huy Ha, Pete Florence, Shuran Song
Proceedings of The 7th Conference on Robot Learning, PMLR 229:3766-3777, 2023.

Abstract

We present a framework for robot skill acquisition, which 1) efficiently scales up data generation of language-labelled robot data and 2) effectively distills this data down into a robust multi-task language-conditioned visuo-motor policy. For (1), we use a large language model (LLM) to guide high-level planning, and sampling-based robot planners (e.g., motion or grasp samplers) to generate diverse and rich manipulation trajectories. To robustify this data-collection process, the LLM also infers a code snippet for the success condition of each task, simultaneously enabling the data-collection process to detect failure and retry, as well as automatic labeling of trajectories with success/failure. For (2), we extend the diffusion policy single-task behavior-cloning approach to multi-task settings with language conditioning. Finally, we propose a new multi-task benchmark with 18 tasks across five domains to test long-horizon behavior, common-sense reasoning, tool-use, and intuitive physics. We find that our distilled policy successfully learned the robust retrying behavior of its data-collection procedure, while improving absolute success rates by 33.2% on average across five domains. Code, data, and additional qualitative results are available at https://www.cs.columbia.edu/~huy/scalingup/.
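To make the data-collection idea concrete, below is a minimal, hypothetical Python sketch of the retry-and-auto-label loop the abstract describes. All names here (query_llm_for_success_check, collect_episode, the Episode record, and the fake simulator state) are illustrative stand-ins, not the released scalingup API; in the actual system the LLM writes the success-condition snippet and sampling-based motion/grasp planners produce the trajectories.

from dataclasses import dataclass, field


@dataclass
class Episode:
    task: str
    actions: list = field(default_factory=list)
    success: bool = False  # automatic label attached to each trajectory


def query_llm_for_success_check(task: str):
    """Stand-in for the LLM inferring a success-condition code snippet.

    In the paper, the LLM emits code that inspects simulator state; here we
    return a trivial checker purely for illustration.
    """
    def check(sim_state: dict) -> bool:
        return sim_state.get(f"{task}_done", False)
    return check


def collect_episode(task: str, sim_state: dict, max_retries: int = 3) -> Episode:
    """Roll out a (stubbed) sampling-based planner, verify with the inferred
    success condition, and retry on failure -- the behavior the distilled
    policy later imitates."""
    check_success = query_llm_for_success_check(task)
    episode = Episode(task=task)
    for attempt in range(max_retries):
        # Stand-in for motion/grasp samplers producing one trajectory attempt.
        episode.actions.append(f"trajectory_attempt_{attempt}")
        sim_state[f"{task}_done"] = attempt == 1  # pretend the 2nd try succeeds
        if check_success(sim_state):
            episode.success = True  # success label for downstream distillation
            break
    return episode


if __name__ == "__main__":
    ep = collect_episode("put_apple_in_drawer", sim_state={})
    print(ep.success, len(ep.actions))  # True 2: succeeded after one retry

The key design point is that the single inferred success check serves two roles: it gates retries during collection and it produces the success/failure labels attached to every trajectory.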

Cite this Paper


BibTeX
@InProceedings{pmlr-v229-ha23a,
  title     = {Scaling Up and Distilling Down: Language-Guided Robot Skill Acquisition},
  author    = {Ha, Huy and Florence, Pete and Song, Shuran},
  booktitle = {Proceedings of The 7th Conference on Robot Learning},
  pages     = {3766--3777},
  year      = {2023},
  editor    = {Tan, Jie and Toussaint, Marc and Darvish, Kourosh},
  volume    = {229},
  series    = {Proceedings of Machine Learning Research},
  month     = {06--09 Nov},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v229/ha23a/ha23a.pdf},
  url       = {https://proceedings.mlr.press/v229/ha23a.html},
}
APA
Ha, H., Florence, P. & Song, S. (2023). Scaling Up and Distilling Down: Language-Guided Robot Skill Acquisition. Proceedings of The 7th Conference on Robot Learning, in Proceedings of Machine Learning Research 229:3766-3777. Available from https://proceedings.mlr.press/v229/ha23a.html.
