Self-Supervised Object-in-Gripper Segmentation from Robotic Motions

Wout Boerdijk, Martin Sundermeyer, Maximilian Durner, Rudolph Triebel
Proceedings of the 2020 Conference on Robot Learning, PMLR 155:1231-1245, 2021.

Abstract

Accurate object segmentation is a crucial task in the context of robotic manipulation. However, creating sufficient annotated training data for neural networks is particularly time-consuming and often requires manual labeling. To this end, we propose a simple yet robust solution for learning to segment unknown objects grasped by a robot. Specifically, we exploit motion and temporal cues in RGB video sequences. Using optical flow estimation, we first learn to predict segmentation masks of our given manipulator. These annotations are then used in combination with motion cues to automatically distinguish between the background, the manipulator, and the unknown grasped object. In contrast to existing systems, our approach is fully self-supervised and independent of precise camera calibration, 3D models, or potentially imperfect depth data. We perform a thorough comparison with alternative baselines and approaches from the literature. The object masks and views are shown to be suitable training data for segmentation networks that generalize to novel environments and also allow for watertight 3D reconstruction.
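
To make the motion-cue idea concrete, below is a minimal sketch in Python: with a (near-)static camera, pixels on the moving manipulator, and on a grasped object moving with it, carry large optical flow while the background does not, so thresholding the flow magnitude yields rough foreground masks that can serve as self-supervised labels. The flow method (OpenCV's classical Farneback estimator), the magnitude threshold, and the function name motion_mask are illustrative assumptions, not the paper's actual pipeline, which trains segmentation networks on such cues.

import cv2
import numpy as np

def motion_mask(prev_rgb: np.ndarray, next_rgb: np.ndarray,
                mag_thresh: float = 1.0) -> np.ndarray:
    """Binary mask of moving pixels between two consecutive RGB frames.

    Illustrative sketch only: the paper uses a learned optical flow
    estimator; Farneback flow and the threshold are stand-in assumptions.
    """
    prev_gray = cv2.cvtColor(prev_rgb, cv2.COLOR_RGB2GRAY)
    next_gray = cv2.cvtColor(next_rgb, cv2.COLOR_RGB2GRAY)
    # Dense optical flow; positional args are pyr_scale, levels, winsize,
    # iterations, poly_n, poly_sigma, flags.
    flow = cv2.calcOpticalFlowFarneback(prev_gray, next_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    # Per-pixel flow magnitude; moving pixels have large displacement.
    magnitude = np.linalg.norm(flow, axis=-1)
    mask = (magnitude > mag_thresh).astype(np.uint8)
    # Light morphological opening to suppress spurious flow noise.
    kernel = np.ones((5, 5), np.uint8)
    return cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)

Subtracting a manipulator mask predicted by a first-stage network from such a motion mask would then isolate the grasped object, mirroring the background/manipulator/object separation described in the abstract.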

Cite this Paper


BibTeX
@InProceedings{pmlr-v155-boerdijk21a,
  title     = {Self-Supervised Object-in-Gripper Segmentation from Robotic Motions},
  author    = {Boerdijk, Wout and Sundermeyer, Martin and Durner, Maximilian and Triebel, Rudolph},
  booktitle = {Proceedings of the 2020 Conference on Robot Learning},
  pages     = {1231--1245},
  year      = {2021},
  editor    = {Kober, Jens and Ramos, Fabio and Tomlin, Claire},
  volume    = {155},
  series    = {Proceedings of Machine Learning Research},
  month     = {16--18 Nov},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v155/boerdijk21a/boerdijk21a.pdf},
  url       = {https://proceedings.mlr.press/v155/boerdijk21a.html}
}
Endnote
%0 Conference Paper
%T Self-Supervised Object-in-Gripper Segmentation from Robotic Motions
%A Wout Boerdijk
%A Martin Sundermeyer
%A Maximilian Durner
%A Rudolph Triebel
%B Proceedings of the 2020 Conference on Robot Learning
%C Proceedings of Machine Learning Research
%D 2021
%E Jens Kober
%E Fabio Ramos
%E Claire Tomlin
%F pmlr-v155-boerdijk21a
%I PMLR
%P 1231--1245
%U https://proceedings.mlr.press/v155/boerdijk21a.html
%V 155
APA
Boerdijk, W., Sundermeyer, M., Durner, M. & Triebel, R. (2021). Self-Supervised Object-in-Gripper Segmentation from Robotic Motions. Proceedings of the 2020 Conference on Robot Learning, in Proceedings of Machine Learning Research 155:1231-1245. Available from https://proceedings.mlr.press/v155/boerdijk21a.html.
