End-to-End Learning of Semantic Grasping

Eric Jang; Sudheendra Vijayanarasimhan; Peter Pastor; Julian Ibarz; Sergey Levine

Proceedings of the 1st Annual Conference on Robot Learning, PMLR 78:119-132, 2017.

Abstract

We consider the task of semantic robotic grasping, in which a robot picks up an object of a user-specified class using only monocular images. Inspired by the two-stream hypothesis of visual reasoning, we present a semantic grasping framework that learns object detection, classification, and grasp planning in an end-to-end fashion. A “ventral stream” recognizes object class while a “dorsal stream” simultaneously interprets the geometric relationships necessary to execute successful grasps. We leverage the autonomous data collection capabilities of robots to obtain a large self-supervised dataset for training the dorsal stream, and use semi-supervised label propagation to train the ventral stream with only a modest amount of human supervision. We experimentally show that our approach improves upon grasping systems whose components are not learned end-to-end, including a baseline method that uses bounding box detection. Furthermore, we show that jointly training our model with auxiliary data consisting of non-semantic grasping data, as well as semantically labeled images without grasp actions, has the potential to substantially improve semantic grasping performance.

Cite this Paper

BibTeX


@InProceedings{pmlr-v78-jang17a,
  title = 	 {End-to-End Learning of Semantic Grasping},
  author = 	 {Jang, Eric and Vijayanarasimhan, Sudheendra and Pastor, Peter and Ibarz, Julian and Levine, Sergey},
  booktitle = 	 {Proceedings of the 1st Annual Conference on Robot Learning},
  pages = 	 {119--132},
  year = 	 {2017},
  editor = 	 {Levine, Sergey and Vanhoucke, Vincent and Goldberg, Ken},
  volume = 	 {78},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {13--15 Nov},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v78/jang17a/jang17a.pdf},
  url = 	 {https://proceedings.mlr.press/v78/jang17a.html},
  abstract = 	 {We consider the task of semantic robotic grasping, in which a robot picks up an object of a user-specified class using only monocular images. Inspired by the two-stream hypothesis of visual reasoning, we present a semantic grasping framework that learns object detection, classification, and grasp planning in an end-to-end fashion. A “ventral stream” recognizes object class while a “dorsal stream” simultaneously interprets the geometric relationships necessary to execute successful grasps. We leverage the autonomous data collection capabilities of robots to obtain a large self-supervised dataset for training the dorsal stream, and use semi-supervised label propagation to train the ventral stream with only a modest amount of human supervision. We experimentally show that our approach improves upon grasping systems whose components are not learned end-to-end, including a baseline method that uses bounding box detection. Furthermore, we show that jointly training our model with auxiliary data consisting of non-semantic grasping data, as well as semantically labeled images without grasp actions, has the potential to substantially improve semantic grasping performance.}
}

Endnote

%0 Conference Paper
%T End-to-End Learning of Semantic Grasping
%A Eric Jang
%A Sudheendra Vijayanarasimhan
%A Peter Pastor
%A Julian Ibarz
%A Sergey Levine
%B Proceedings of the 1st Annual Conference on Robot Learning
%C Proceedings of Machine Learning Research
%D 2017
%E Sergey Levine
%E Vincent Vanhoucke
%E Ken Goldberg
%F pmlr-v78-jang17a
%I PMLR
%P 119--132
%U https://proceedings.mlr.press/v78/jang17a.html
%V 78
%X We consider the task of semantic robotic grasping, in which a robot picks up an object of a user-specified class using only monocular images. Inspired by the two-stream hypothesis of visual reasoning, we present a semantic grasping framework that learns object detection, classification, and grasp planning in an end-to-end fashion. A “ventral stream” recognizes object class while a “dorsal stream” simultaneously interprets the geometric relationships necessary to execute successful grasps. We leverage the autonomous data collection capabilities of robots to obtain a large self-supervised dataset for training the dorsal stream, and use semi-supervised label propagation to train the ventral stream with only a modest amount of human supervision. We experimentally show that our approach improves upon grasping systems whose components are not learned end-to-end, including a baseline method that uses bounding box detection. Furthermore, we show that jointly training our model with auxiliary data consisting of non-semantic grasping data, as well as semantically labeled images without grasp actions, has the potential to substantially improve semantic grasping performance.

APA


Jang, E., Vijayanarasimhan, S., Pastor, P., Ibarz, J. & Levine, S. (2017). End-to-End Learning of Semantic Grasping. Proceedings of the 1st Annual Conference on Robot Learning, in Proceedings of Machine Learning Research 78:119-132. Available from https://proceedings.mlr.press/v78/jang17a.html.
