PLEX: Making the Most of the Available Data for Robotic Manipulation Pretraining

Garrett Thomas, Ching-An Cheng, Ricky Loynd, Felipe Vieira Frujeri, Vibhav Vineet, Mihai Jalobeanu, Andrey Kolobov
Proceedings of The 7th Conference on Robot Learning, PMLR 229:2624-2641, 2023.

Abstract

A rich representation is key to general robotic manipulation, but existing approaches to representation learning require large amounts of multimodal demonstrations. In this work we propose PLEX, a transformer-based architecture that learns from a small amount of task-agnostic visuomotor trajectories and a much larger amount of task-conditioned object manipulation videos – a type of data available in quantity. PLEX uses visuomotor trajectories to induce a latent feature space and to learn task-agnostic manipulation routines, while diverse video-only demonstrations teach PLEX how to plan in the induced latent feature space for a wide variety of tasks. Experiments showcase PLEX’s generalization on Meta-World and SOTA performance in challenging Robosuite environments. In particular, using relative positional encoding in PLEX’s transformers greatly helps in low-data regimes of learning from human-collected demonstrations.
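To make the division of labor described in the abstract concrete, below is a minimal PyTorch sketch, not the authors' implementation (see the linked PDF for the real architecture). The class names PlexLikeModel and RelativePositionBias, all dimensions, and the learned-bias form of relative positional encoding are illustrative assumptions; only the planner/executor split and the use of relative positions come from the abstract.

import torch
import torch.nn as nn

class RelativePositionBias(nn.Module):
    """Learned additive attention bias indexed by query-key offset, one
    standard way to realize relative positional encoding (illustrative)."""
    def __init__(self, num_heads: int, max_len: int):
        super().__init__()
        # One learnable scalar per head for each offset in [-(max_len-1), max_len-1].
        self.bias = nn.Parameter(torch.zeros(num_heads, 2 * max_len - 1))
        self.max_len = max_len

    def forward(self, seq_len: int) -> torch.Tensor:
        pos = torch.arange(seq_len)
        offsets = pos[None, :] - pos[:, None] + self.max_len - 1  # (seq, seq)
        return self.bias[:, offsets]  # (num_heads, seq, seq)

class PlexLikeModel(nn.Module):
    """Hypothetical two-transformer layout: a planner that predicts future
    observation latents (trainable from video-only demonstrations) and an
    executor that maps latents to actions (trainable from task-agnostic
    visuomotor trajectories). Sizes and wiring are assumptions."""
    def __init__(self, d_model=256, num_heads=4, num_layers=2, act_dim=7, max_len=32):
        super().__init__()
        def encoder():
            layer = nn.TransformerEncoderLayer(d_model, num_heads, batch_first=True)
            return nn.TransformerEncoder(layer, num_layers)
        self.planner, self.executor = encoder(), encoder()
        self.rel_bias = RelativePositionBias(num_heads, max_len)
        self.next_latent = nn.Linear(d_model, d_model)  # planner head
        self.action = nn.Linear(d_model, act_dim)       # executor head

    def forward(self, latents: torch.Tensor):
        # latents: (batch, seq, d_model) features from an image encoder
        b, t, _ = latents.shape
        causal = torch.triu(torch.full((t, t), float("-inf")), diagonal=1)
        # Float attention mask of shape (batch * num_heads, seq, seq):
        # causal structure plus the learned relative-position bias.
        mask = (causal + self.rel_bias(t)).repeat(b, 1, 1)
        planned = self.next_latent(self.planner(latents, mask=mask))
        actions = self.action(self.executor(latents, mask=mask))
        return planned, actions

# Quick shape check: two trajectories of 16 frames with 256-dim features.
plan, act = PlexLikeModel()(torch.randn(2, 16, 256))
assert plan.shape == (2, 16, 256) and act.shape == (2, 16, 7)

Pretraining can then pair each component with the data able to supervise it: video-only demonstrations provide next-frame latent targets for the planner, while visuomotor trajectories provide action targets for the executor. Per the abstract, using a relative-position bias in place of absolute position embeddings helps most when human-collected demonstrations are scarce.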

Cite this Paper


BibTeX
@InProceedings{pmlr-v229-thomas23a,
  title     = {PLEX: Making the Most of the Available Data for Robotic Manipulation Pretraining},
  author    = {Thomas, Garrett and Cheng, Ching-An and Loynd, Ricky and Frujeri, Felipe Vieira and Vineet, Vibhav and Jalobeanu, Mihai and Kolobov, Andrey},
  booktitle = {Proceedings of The 7th Conference on Robot Learning},
  pages     = {2624--2641},
  year      = {2023},
  editor    = {Tan, Jie and Toussaint, Marc and Darvish, Kourosh},
  volume    = {229},
  series    = {Proceedings of Machine Learning Research},
  month     = {06--09 Nov},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v229/thomas23a/thomas23a.pdf},
  url       = {https://proceedings.mlr.press/v229/thomas23a.html},
  abstract  = {A rich representation is key to general robotic manipulation, but existing approaches to representation learning require large amounts of multimodal demonstrations. In this work we propose PLEX, a transformer-based architecture that learns from a small amount of task-agnostic visuomotor trajectories and a much larger amount of task-conditioned object manipulation videos – a type of data available in quantity. PLEX uses visuomotor trajectories to induce a latent feature space and to learn task-agnostic manipulation routines, while diverse video-only demonstrations teach PLEX how to plan in the induced latent feature space for a wide variety of tasks. Experiments showcase PLEX’s generalization on Meta-World and SOTA performance in challenging Robosuite environments. In particular, using relative positional encoding in PLEX’s transformers greatly helps in low-data regimes of learning from human-collected demonstrations.}
}
EndNote
%0 Conference Paper
%T PLEX: Making the Most of the Available Data for Robotic Manipulation Pretraining
%A Garrett Thomas
%A Ching-An Cheng
%A Ricky Loynd
%A Felipe Vieira Frujeri
%A Vibhav Vineet
%A Mihai Jalobeanu
%A Andrey Kolobov
%B Proceedings of The 7th Conference on Robot Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Jie Tan
%E Marc Toussaint
%E Kourosh Darvish
%F pmlr-v229-thomas23a
%I PMLR
%P 2624--2641
%U https://proceedings.mlr.press/v229/thomas23a.html
%V 229
%X A rich representation is key to general robotic manipulation, but existing approaches to representation learning require large amounts of multimodal demonstrations. In this work we propose PLEX, a transformer-based architecture that learns from a small amount of task-agnostic visuomotor trajectories and a much larger amount of task-conditioned object manipulation videos – a type of data available in quantity. PLEX uses visuomotor trajectories to induce a latent feature space and to learn task-agnostic manipulation routines, while diverse video-only demonstrations teach PLEX how to plan in the induced latent feature space for a wide variety of tasks. Experiments showcase PLEX’s generalization on Meta-World and SOTA performance in challenging Robosuite environments. In particular, using relative positional encoding in PLEX’s transformers greatly helps in low-data regimes of learning from human-collected demonstrations.
APA
Thomas, G., Cheng, C., Loynd, R., Frujeri, F.V., Vineet, V., Jalobeanu, M. & Kolobov, A. (2023). PLEX: Making the Most of the Available Data for Robotic Manipulation Pretraining. Proceedings of The 7th Conference on Robot Learning, in Proceedings of Machine Learning Research 229:2624-2641. Available from https://proceedings.mlr.press/v229/thomas23a.html.