End-to-End Learning of Semantic Grasping

Eric Jang, Sudheendra Vijayanarasimhan, Peter Pastor, Julian Ibarz, Sergey Levine
Proceedings of the 1st Annual Conference on Robot Learning, PMLR 78:119-132, 2017.

Abstract

We consider the task of semantic robotic grasping, in which a robot picks up an object of a user-specified class using only monocular images. Inspired by the two-stream hypothesis of visual reasoning, we present a semantic grasping framework that learns object detection, classification, and grasp planning in an end-to-end fashion. A “ventral stream” recognizes object class while a “dorsal stream” simultaneously interprets the geometric relationships necessary to execute successful grasps. We leverage the autonomous data collection capabilities of robots to obtain a large self-supervised dataset for training the dorsal stream, and use semi-supervised label propagation to train the ventral stream with only a modest amount of human supervision. We experimentally show that our approach improves upon grasping systems whose components are not learned end-to-end, including a baseline method that uses bounding box detection. Furthermore, we show that jointly training our model with auxiliary data consisting of non-semantic grasping data, as well as semantically labeled images without grasp actions, has the potential to substantially improve semantic grasping performance.
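For readers who want a concrete picture of the two-stream decomposition described in the abstract, the following is a minimal sketch, not the authors' implementation: a shared image encoder feeds a "dorsal" head that scores a candidate grasp action and a "ventral" head that classifies the object that would be grasped. The module names, layer sizes, action parameterization, and equal loss weighting are all illustrative assumptions (PyTorch is used here only for brevity).

# Minimal two-stream sketch (illustrative only; not the paper's architecture).
# A shared CNN trunk over the monocular image feeds a "dorsal" head that scores
# whether a candidate grasp action will succeed, and a "ventral" head that
# predicts the class of the grasped object. All sizes/losses are assumptions.
import torch
import torch.nn as nn

class TwoStreamGraspNet(nn.Module):
    def __init__(self, num_classes: int, action_dim: int = 4):
        super().__init__()
        # Shared convolutional encoder over the camera image.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # Dorsal stream: image features + candidate grasp action -> grasp-success logit.
        self.dorsal = nn.Sequential(
            nn.Linear(64 + action_dim, 128), nn.ReLU(), nn.Linear(128, 1),
        )
        # Ventral stream: image features -> class logits for the object to be grasped.
        self.ventral = nn.Sequential(
            nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, num_classes),
        )

    def forward(self, image: torch.Tensor, action: torch.Tensor):
        feats = self.encoder(image)
        grasp_logit = self.dorsal(torch.cat([feats, action], dim=-1))
        class_logits = self.ventral(feats)
        return grasp_logit, class_logits

# Toy usage: the grasp-success head would be trained from self-supervised robot
# data, the class head from a smaller set of human-labeled examples.
model = TwoStreamGraspNet(num_classes=16)
img = torch.randn(2, 3, 64, 64)   # batch of camera images
act = torch.randn(2, 4)           # candidate grasp parameters (assumed: x, y, z, yaw)
grasp_logit, class_logits = model(img, act)
success_loss = nn.functional.binary_cross_entropy_with_logits(
    grasp_logit.squeeze(-1), torch.ones(2))  # dummy grasp-outcome labels
class_loss = nn.functional.cross_entropy(
    class_logits, torch.zeros(2, dtype=torch.long))  # dummy class labels
total_loss = success_loss + class_loss  # joint end-to-end objective (weighting assumed)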

Cite this Paper


BibTeX
@InProceedings{pmlr-v78-jang17a,
  title     = {End-to-End Learning of Semantic Grasping},
  author    = {Jang, Eric and Vijayanarasimhan, Sudheendra and Pastor, Peter and Ibarz, Julian and Levine, Sergey},
  booktitle = {Proceedings of the 1st Annual Conference on Robot Learning},
  pages     = {119--132},
  year      = {2017},
  editor    = {Levine, Sergey and Vanhoucke, Vincent and Goldberg, Ken},
  volume    = {78},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--15 Nov},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v78/jang17a/jang17a.pdf},
  url       = {https://proceedings.mlr.press/v78/jang17a.html},
  abstract  = {We consider the task of semantic robotic grasping, in which a robot picks up an object of a user-specified class using only monocular images. Inspired by the two-stream hypothesis of visual reasoning, we present a semantic grasping framework that learns object detection, classification, and grasp planning in an end-to-end fashion. A “ventral stream” recognizes object class while a “dorsal stream” simultaneously interprets the geometric relationships necessary to execute successful grasps. We leverage the autonomous data collection capabilities of robots to obtain a large self-supervised dataset for training the dorsal stream, and use semi-supervised label propagation to train the ventral stream with only a modest amount of human supervision. We experimentally show that our approach improves upon grasping systems whose components are not learned end-to-end, including a baseline method that uses bounding box detection. Furthermore, we show that jointly training our model with auxiliary data consisting of non-semantic grasping data, as well as semantically labeled images without grasp actions, has the potential to substantially improve semantic grasping performance.}
}
Endnote
%0 Conference Paper
%T End-to-End Learning of Semantic Grasping
%A Eric Jang
%A Sudheendra Vijayanarasimhan
%A Peter Pastor
%A Julian Ibarz
%A Sergey Levine
%B Proceedings of the 1st Annual Conference on Robot Learning
%C Proceedings of Machine Learning Research
%D 2017
%E Sergey Levine
%E Vincent Vanhoucke
%E Ken Goldberg
%F pmlr-v78-jang17a
%I PMLR
%P 119--132
%U https://proceedings.mlr.press/v78/jang17a.html
%V 78
%X We consider the task of semantic robotic grasping, in which a robot picks up an object of a user-specified class using only monocular images. Inspired by the two-stream hypothesis of visual reasoning, we present a semantic grasping framework that learns object detection, classification, and grasp planning in an end-to-end fashion. A “ventral stream” recognizes object class while a “dorsal stream” simultaneously interprets the geometric relationships necessary to execute successful grasps. We leverage the autonomous data collection capabilities of robots to obtain a large self-supervised dataset for training the dorsal stream, and use semi-supervised label propagation to train the ventral stream with only a modest amount of human supervision. We experimentally show that our approach improves upon grasping systems whose components are not learned end-to-end, including a baseline method that uses bounding box detection. Furthermore, we show that jointly training our model with auxiliary data consisting of non-semantic grasping data, as well as semantically labeled images without grasp actions, has the potential to substantially improve semantic grasping performance.
APA
Jang, E., Vijayanarasimhan, S., Pastor, P., Ibarz, J., & Levine, S. (2017). End-to-End Learning of Semantic Grasping. Proceedings of the 1st Annual Conference on Robot Learning, in Proceedings of Machine Learning Research 78:119-132. Available from https://proceedings.mlr.press/v78/jang17a.html.
