PLEX: Making the Most of the Available Data for Robotic Manipulation Pretraining

Garrett Thomas; Ching-An Cheng; Ricky Loynd; Felipe Vieira Frujeri; Vibhav Vineet; Mihai Jalobeanu; Andrey Kolobov

Proceedings of The 7th Conference on Robot Learning, PMLR 229:2624-2641, 2023.

Abstract

A rich representation is key to general robotic manipulation, but existing approaches to representation learning require large amounts of multimodal demonstrations. In this work we propose PLEX, a transformer-based architecture that learns from a small amount of task-agnostic visuomotor trajectories and a much larger amount of task-conditioned object manipulation videos – a type of data available in quantity. PLEX uses visuomotor trajectories to induce a latent feature space and to learn task-agnostic manipulation routines, while diverse video-only demonstrations teach PLEX how to plan in the induced latent feature space for a wide variety of tasks. Experiments showcase PLEX’s generalization on Meta-World and SOTA performance in challenging Robosuite environments. In particular, using relative positional encoding in PLEX’s transformers greatly helps in low-data regimes of learning from human-collected demonstrations.

Cite this Paper

BibTeX


@InProceedings{pmlr-v229-thomas23a,
  title = 	 {PLEX: Making the Most of the Available Data for Robotic Manipulation Pretraining},
  author =       {Thomas, Garrett and Cheng, Ching-An and Loynd, Ricky and Frujeri, Felipe Vieira and Vineet, Vibhav and Jalobeanu, Mihai and Kolobov, Andrey},
  booktitle = 	 {Proceedings of The 7th Conference on Robot Learning},
  pages = 	 {2624--2641},
  year = 	 {2023},
  editor = 	 {Tan, Jie and Toussaint, Marc and Darvish, Kourosh},
  volume = 	 {229},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {06--09 Nov},
  publisher =    {PMLR},
  pdf = 	 {https://proceedings.mlr.press/v229/thomas23a/thomas23a.pdf},
  url = 	 {https://proceedings.mlr.press/v229/thomas23a.html},
  abstract = 	 {A rich representation is key to general robotic manipulation, but existing approaches to representation learning require large amounts of multimodal demonstrations. In this work we propose PLEX, a transformer-based architecture that learns from a small amount of task-agnostic visuomotor trajectories and a much larger amount of task-conditioned object manipulation videos – a type of data available in quantity. PLEX uses visuomotor trajectories to induce a latent feature space and to learn task-agnostic manipulation routines, while diverse video-only demonstrations teach PLEX how to plan in the induced latent feature space for a wide variety of tasks. Experiments showcase PLEX’s generalization on Meta-World and SOTA performance in challenging Robosuite environments. In particular, using relative positional encoding in PLEX’s transformers greatly helps in low-data regimes of learning from human-collected demonstrations.}
}

Endnote

%0 Conference Paper
%T PLEX: Making the Most of the Available Data for Robotic Manipulation Pretraining
%A Garrett Thomas
%A Ching-An Cheng
%A Ricky Loynd
%A Felipe Vieira Frujeri
%A Vibhav Vineet
%A Mihai Jalobeanu
%A Andrey Kolobov
%B Proceedings of The 7th Conference on Robot Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Jie Tan
%E Marc Toussaint
%E Kourosh Darvish
%F pmlr-v229-thomas23a
%I PMLR
%P 2624--2641
%U https://proceedings.mlr.press/v229/thomas23a.html
%V 229
%X A rich representation is key to general robotic manipulation, but existing approaches to representation learning require large amounts of multimodal demonstrations. In this work we propose PLEX, a transformer-based architecture that learns from a small amount of task-agnostic visuomotor trajectories and a much larger amount of task-conditioned object manipulation videos – a type of data available in quantity. PLEX uses visuomotor trajectories to induce a latent feature space and to learn task-agnostic manipulation routines, while diverse video-only demonstrations teach PLEX how to plan in the induced latent feature space for a wide variety of tasks. Experiments showcase PLEX’s generalization on Meta-World and SOTA performance in challenging Robosuite environments. In particular, using relative positional encoding in PLEX’s transformers greatly helps in low-data regimes of learning from human-collected demonstrations.

APA


Thomas, G., Cheng, C., Loynd, R., Frujeri, F.V., Vineet, V., Jalobeanu, M. & Kolobov, A. (2023). PLEX: Making the Most of the Available Data for Robotic Manipulation Pretraining. Proceedings of The 7th Conference on Robot Learning, in Proceedings of Machine Learning Research 229:2624-2641. Available from https://proceedings.mlr.press/v229/thomas23a.html.
