VIOLA: Imitation Learning for Vision-Based Manipulation with Object Proposal Priors

Yifeng Zhu; Abhishek Joshi; Peter Stone; Yuke Zhu

Proceedings of The 6th Conference on Robot Learning, PMLR 205:1199-1210, 2023.

Abstract

We introduce VIOLA, an object-centric imitation learning approach to learning closed-loop visuomotor policies for robot manipulation. Our approach constructs object-centric representations based on general object proposals from a pre-trained vision model. VIOLA uses a transformer-based policy to reason over these representations and attend to the task-relevant visual factors for action prediction. Such object-based structural priors improve deep imitation learning algorithms’ robustness against object variations and environmental perturbations. We quantitatively evaluate VIOLA in simulation and on real robots. VIOLA outperforms the state-of-the-art imitation learning methods by 45.8% in success rate. It has also been deployed successfully on a physical robot to solve challenging long-horizon tasks, such as dining table arrangement and coffee making. More videos and model details can be found in the supplementary material and on the project website: https://ut-austin-rpl.github.io/VIOLA/.

Cite this Paper

BibTeX


@InProceedings{pmlr-v205-zhu23a,
  title = {VIOLA: Imitation Learning for Vision-Based Manipulation with Object Proposal Priors},
  author = {Zhu, Yifeng and Joshi, Abhishek and Stone, Peter and Zhu, Yuke},
  booktitle = {Proceedings of The 6th Conference on Robot Learning},
  pages = {1199--1210},
  year = {2023},
  editor = {Liu, Karen and Kulic, Dana and Ichnowski, Jeff},
  volume = {205},
  series = {Proceedings of Machine Learning Research},
  month = {14--18 Dec},
  publisher = {PMLR},
  pdf = {https://proceedings.mlr.press/v205/zhu23a/zhu23a.pdf},
  url = {https://proceedings.mlr.press/v205/zhu23a.html},
  abstract = {We introduce VIOLA, an object-centric imitation learning approach to learning closed-loop visuomotor policies for robot manipulation. Our approach constructs object-centric representations based on general object proposals from a pre-trained vision model. VIOLA uses a transformer-based policy to reason over these representations and attend to the task-relevant visual factors for action prediction. Such object-based structural priors improve deep imitation learning algorithms’ robustness against object variations and environmental perturbations. We quantitatively evaluate VIOLA in simulation and on real robots. VIOLA outperforms the state-of-the-art imitation learning methods by 45.8\% in success rate. It has also been deployed successfully on a physical robot to solve challenging long-horizon tasks, such as dining table arrangement and coffee making. More videos and model details can be found in the supplementary material and on the project website: https://ut-austin-rpl.github.io/VIOLA/.}
}

Endnote

%0 Conference Paper
%T VIOLA: Imitation Learning for Vision-Based Manipulation with Object Proposal Priors
%A Yifeng Zhu
%A Abhishek Joshi
%A Peter Stone
%A Yuke Zhu
%B Proceedings of The 6th Conference on Robot Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Karen Liu
%E Dana Kulic
%E Jeff Ichnowski
%F pmlr-v205-zhu23a
%I PMLR
%P 1199-1210
%U https://proceedings.mlr.press/v205/zhu23a.html
%V 205
%X We introduce VIOLA, an object-centric imitation learning approach to learning closed-loop visuomotor policies for robot manipulation. Our approach constructs object-centric representations based on general object proposals from a pre-trained vision model. VIOLA uses a transformer-based policy to reason over these representations and attend to the task-relevant visual factors for action prediction. Such object-based structural priors improve deep imitation learning algorithms’ robustness against object variations and environmental perturbations. We quantitatively evaluate VIOLA in simulation and on real robots. VIOLA outperforms the state-of-the-art imitation learning methods by 45.8% in success rate. It has also been deployed successfully on a physical robot to solve challenging long-horizon tasks, such as dining table arrangement and coffee making. More videos and model details can be found in the supplementary material and on the project website: https://ut-austin-rpl.github.io/VIOLA/.

APA


Zhu, Y., Joshi, A., Stone, P., & Zhu, Y. (2023). VIOLA: Imitation Learning for Vision-Based Manipulation with Object Proposal Priors. Proceedings of The 6th Conference on Robot Learning, in Proceedings of Machine Learning Research 205:1199-1210. Available from https://proceedings.mlr.press/v205/zhu23a.html.
