VIOLA: Imitation Learning for Vision-Based Manipulation with Object Proposal Priors

Yifeng Zhu, Abhishek Joshi, Peter Stone, Yuke Zhu
Proceedings of The 6th Conference on Robot Learning, PMLR 205:1199-1210, 2023.

Abstract

We introduce VIOLA, an object-centric imitation learning approach to learning closed-loop visuomotor policies for robot manipulation. Our approach constructs object-centric representations based on general object proposals from a pre-trained vision model. VIOLA uses a transformer-based policy to reason over these representations and attend to the task-relevant visual factors for action prediction. Such object-based structural priors improve deep imitation learning algorithms' robustness against object variations and environmental perturbations. We quantitatively evaluate VIOLA in simulation and on real robots. VIOLA outperforms the state-of-the-art imitation learning methods by 45.8% in success rate. It has also been deployed successfully on a physical robot to solve challenging long-horizon tasks, such as dining table arrangement and coffee making. More videos and model details can be found in the supplementary material and on the project website: https://ut-austin-rpl.github.io/VIOLA/.

Cite this Paper


BibTeX
@InProceedings{pmlr-v205-zhu23a,
  title     = {VIOLA: Imitation Learning for Vision-Based Manipulation with Object Proposal Priors},
  author    = {Zhu, Yifeng and Joshi, Abhishek and Stone, Peter and Zhu, Yuke},
  booktitle = {Proceedings of The 6th Conference on Robot Learning},
  pages     = {1199--1210},
  year      = {2023},
  editor    = {Liu, Karen and Kulic, Dana and Ichnowski, Jeff},
  volume    = {205},
  series    = {Proceedings of Machine Learning Research},
  month     = {14--18 Dec},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v205/zhu23a/zhu23a.pdf},
  url       = {https://proceedings.mlr.press/v205/zhu23a.html},
  abstract  = {We introduce VIOLA, an object-centric imitation learning approach to learning closed-loop visuomotor policies for robot manipulation. Our approach constructs object-centric representations based on general object proposals from a pre-trained vision model. VIOLA uses a transformer-based policy to reason over these representations and attend to the task-relevant visual factors for action prediction. Such object-based structural priors improve deep imitation learning algorithms' robustness against object variations and environmental perturbations. We quantitatively evaluate VIOLA in simulation and on real robots. VIOLA outperforms the state-of-the-art imitation learning methods by 45.8% in success rate. It has also been deployed successfully on a physical robot to solve challenging long-horizon tasks, such as dining table arrangement and coffee making. More videos and model details can be found in the supplementary material and on the project website: https://ut-austin-rpl.github.io/VIOLA/.}
}
Endnote
%0 Conference Paper
%T VIOLA: Imitation Learning for Vision-Based Manipulation with Object Proposal Priors
%A Yifeng Zhu
%A Abhishek Joshi
%A Peter Stone
%A Yuke Zhu
%B Proceedings of The 6th Conference on Robot Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Karen Liu
%E Dana Kulic
%E Jeff Ichnowski
%F pmlr-v205-zhu23a
%I PMLR
%P 1199--1210
%U https://proceedings.mlr.press/v205/zhu23a.html
%V 205
%X We introduce VIOLA, an object-centric imitation learning approach to learning closed-loop visuomotor policies for robot manipulation. Our approach constructs object-centric representations based on general object proposals from a pre-trained vision model. VIOLA uses a transformer-based policy to reason over these representations and attend to the task-relevant visual factors for action prediction. Such object-based structural priors improve deep imitation learning algorithms' robustness against object variations and environmental perturbations. We quantitatively evaluate VIOLA in simulation and on real robots. VIOLA outperforms the state-of-the-art imitation learning methods by 45.8% in success rate. It has also been deployed successfully on a physical robot to solve challenging long-horizon tasks, such as dining table arrangement and coffee making. More videos and model details can be found in the supplementary material and on the project website: https://ut-austin-rpl.github.io/VIOLA/.
APA
Zhu, Y., Joshi, A., Stone, P. &amp; Zhu, Y. (2023). VIOLA: Imitation Learning for Vision-Based Manipulation with Object Proposal Priors. Proceedings of The 6th Conference on Robot Learning, in Proceedings of Machine Learning Research 205:1199-1210. Available from https://proceedings.mlr.press/v205/zhu23a.html.