Imitating Task and Motion Planning with Visuomotor Transformers

Murtaza Dalal, Ajay Mandlekar, Caelan Reed Garrett, Ankur Handa, Ruslan Salakhutdinov, Dieter Fox
Proceedings of The 7th Conference on Robot Learning, PMLR 229:2565-2593, 2023.

Abstract

Imitation learning is a powerful tool for training robot manipulation policies, allowing them to learn from expert demonstrations without manual programming or trial-and-error. However, common methods of data collection, such as human supervision, scale poorly, as they are time-consuming and labor-intensive. In contrast, Task and Motion Planning (TAMP) can autonomously generate large-scale datasets of diverse demonstrations. In this work, we show that the combination of large-scale datasets generated by TAMP supervisors and flexible Transformer models to fit them is a powerful paradigm for robot manipulation. We present a novel imitation learning system called OPTIMUS that trains large-scale visuomotor Transformer policies by imitating a TAMP agent. We conduct a thorough study of the design decisions required to imitate TAMP and demonstrate that OPTIMUS can solve a wide variety of challenging vision-based manipulation tasks with over 70 different objects, ranging from long-horizon pick-and-place tasks to shelf and articulated object manipulation, achieving 70% to 80% success rates. Video results and code are available at https://mihdalal.github.io/optimus/

Cite this Paper


BibTeX
@InProceedings{pmlr-v229-dalal23a,
  title     = {Imitating Task and Motion Planning with Visuomotor Transformers},
  author    = {Dalal, Murtaza and Mandlekar, Ajay and Garrett, Caelan Reed and Handa, Ankur and Salakhutdinov, Ruslan and Fox, Dieter},
  booktitle = {Proceedings of The 7th Conference on Robot Learning},
  pages     = {2565--2593},
  year      = {2023},
  editor    = {Tan, Jie and Toussaint, Marc and Darvish, Kourosh},
  volume    = {229},
  series    = {Proceedings of Machine Learning Research},
  month     = {06--09 Nov},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v229/dalal23a/dalal23a.pdf},
  url       = {https://proceedings.mlr.press/v229/dalal23a.html},
  abstract  = {Imitation learning is a powerful tool for training robot manipulation policies, allowing them to learn from expert demonstrations without manual programming or trial-and-error. However, common methods of data collection, such as human supervision, scale poorly, as they are time-consuming and labor-intensive. In contrast, Task and Motion Planning (TAMP) can autonomously generate large-scale datasets of diverse demonstrations. In this work, we show that the combination of large-scale datasets generated by TAMP supervisors and flexible Transformer models to fit them is a powerful paradigm for robot manipulation. We present a novel imitation learning system called OPTIMUS that trains large-scale visuomotor Transformer policies by imitating a TAMP agent. We conduct a thorough study of the design decisions required to imitate TAMP and demonstrate that OPTIMUS can solve a wide variety of challenging vision-based manipulation tasks with over 70 different objects, ranging from long-horizon pick-and-place tasks to shelf and articulated object manipulation, achieving 70\% to 80\% success rates. Video results and code are available at https://mihdalal.github.io/optimus/}
}
Endnote
%0 Conference Paper
%T Imitating Task and Motion Planning with Visuomotor Transformers
%A Murtaza Dalal
%A Ajay Mandlekar
%A Caelan Reed Garrett
%A Ankur Handa
%A Ruslan Salakhutdinov
%A Dieter Fox
%B Proceedings of The 7th Conference on Robot Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Jie Tan
%E Marc Toussaint
%E Kourosh Darvish
%F pmlr-v229-dalal23a
%I PMLR
%P 2565--2593
%U https://proceedings.mlr.press/v229/dalal23a.html
%V 229
%X Imitation learning is a powerful tool for training robot manipulation policies, allowing them to learn from expert demonstrations without manual programming or trial-and-error. However, common methods of data collection, such as human supervision, scale poorly, as they are time-consuming and labor-intensive. In contrast, Task and Motion Planning (TAMP) can autonomously generate large-scale datasets of diverse demonstrations. In this work, we show that the combination of large-scale datasets generated by TAMP supervisors and flexible Transformer models to fit them is a powerful paradigm for robot manipulation. We present a novel imitation learning system called OPTIMUS that trains large-scale visuomotor Transformer policies by imitating a TAMP agent. We conduct a thorough study of the design decisions required to imitate TAMP and demonstrate that OPTIMUS can solve a wide variety of challenging vision-based manipulation tasks with over 70 different objects, ranging from long-horizon pick-and-place tasks to shelf and articulated object manipulation, achieving 70% to 80% success rates. Video results and code are available at https://mihdalal.github.io/optimus/
APA
Dalal, M., Mandlekar, A., Garrett, C. R., Handa, A., Salakhutdinov, R., & Fox, D. (2023). Imitating Task and Motion Planning with Visuomotor Transformers. Proceedings of The 7th Conference on Robot Learning, in Proceedings of Machine Learning Research 229:2565-2593. Available from https://proceedings.mlr.press/v229/dalal23a.html.