Learning 3D Dynamic Scene Representations for Robot Manipulation

Zhenjia Xu; Zhanpeng He; Jiajun Wu; Shuran Song

Proceedings of the 2020 Conference on Robot Learning, PMLR 155:126-142, 2021.

Abstract

3D scene representation for robot manipulation should capture three key object properties: permanency (objects that become occluded over time continue to exist); amodal completeness (objects have 3D occupancy, even if only partial observations are available); and spatiotemporal continuity (the movement of each object is continuous over space and time). In this paper, we introduce 3D Dynamic Scene Representation (DSR), a 3D volumetric scene representation that simultaneously discovers, tracks, and reconstructs objects and predicts their dynamics while capturing all three properties. We further propose DSR-Net, which learns to aggregate visual observations over multiple interactions to gradually build and refine DSR. Our model achieves state-of-the-art performance in modeling 3D scene dynamics with DSR on both simulated and real data. Combined with model predictive control, DSR-Net enables accurate planning in downstream robotic manipulation tasks such as planar pushing. Code and data are available at dsr-net.cs.columbia.edu.
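The abstract pairs DSR-Net with model predictive control for planar pushing. As a rough illustration of the receding-horizon loop only, here is a Python sketch over a toy 2D occupancy grid. Everything in it is an assumption and not the paper's method. `toy_dynamics` is a hypothetical stand-in for the learned model: it translates the whole object one cell per action rather than simulating a push. The summed squared-distance cost and the exhaustive search over short action sequences are likewise illustrative choices.

```python
import itertools

import numpy as np

# Hypothetical action set: index 0 is "no push"; the rest translate the
# object by one cell as (row, col) offsets. Not DSR-Net's action space.
ACTIONS = [(0, 0), (-1, 0), (1, 0), (0, -1), (0, 1)]


def toy_dynamics(mask: np.ndarray, action: int) -> np.ndarray:
    """Hypothetical stand-in for a learned dynamics model.

    Shifts every occupied cell by ACTIONS[action]. If that would push any
    cell off the grid, the object is treated as blocked and stays put.
    """
    dr, dc = ACTIONS[action]
    h, w = mask.shape
    out = np.zeros_like(mask)
    out[max(0, dr):h - max(0, -dr), max(0, dc):w - max(0, -dc)] = \
        mask[max(0, -dr):h - max(0, dr), max(0, -dc):w - max(0, dc)]
    return out if out.sum() == mask.sum() else mask


def centroid(mask: np.ndarray) -> np.ndarray:
    """Mean (row, col) position of the occupied cells."""
    return np.argwhere(mask).mean(axis=0)


def cost(mask: np.ndarray, goal: np.ndarray) -> float:
    """Squared distance between the object centroid and the goal centroid."""
    return float(np.sum((centroid(mask) - goal) ** 2))


def plan_first_action(mask: np.ndarray, goal: np.ndarray, horizon: int = 3) -> int:
    """Roll out every action sequence in the model and return the first
    action of the sequence with the lowest summed cost."""
    best_cost, best_action = float("inf"), 0
    for seq in itertools.product(range(len(ACTIONS)), repeat=horizon):
        m, total = mask, 0.0
        for a in seq:
            m = toy_dynamics(m, a)
            total += cost(m, goal)
        if total < best_cost:
            best_cost, best_action = total, seq[0]
    return best_action


def run_mpc(mask: np.ndarray, goal: np.ndarray, steps: int) -> np.ndarray:
    """Receding horizon: execute only the first planned action, then replan."""
    for _ in range(steps):
        mask = toy_dynamics(mask, plan_first_action(mask, goal))
    return mask


if __name__ == "__main__":
    grid = np.zeros((10, 10), dtype=int)
    grid[1:3, 1:3] = 1  # 2x2 object with centroid (1.5, 1.5)
    final = run_mpc(grid, np.array([6.5, 6.5]), steps=10)
    print(centroid(final))
```

Scoring the summed cost along each rollout, rather than only the final state, makes the planner prefer sequences that make progress immediately. As a result, each executed action moves the object one cell closer to the goal. Exhaustive search is feasible only because the toy action set and horizon are tiny. For larger action spaces, a sampling-based optimizer would typically replace it.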

Cite this Paper

BibTeX


@InProceedings{pmlr-v155-xu21b,
  title     = {Learning 3D Dynamic Scene Representations for Robot Manipulation},
  author    = {Xu, Zhenjia and He, Zhanpeng and Wu, Jiajun and Song, Shuran},
  booktitle = {Proceedings of the 2020 Conference on Robot Learning},
  pages     = {126--142},
  year      = {2021},
  editor    = {Kober, Jens and Ramos, Fabio and Tomlin, Claire},
  volume    = {155},
  series    = {Proceedings of Machine Learning Research},
  month     = {16--18 Nov},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v155/xu21b/xu21b.pdf},
  url       = {https://proceedings.mlr.press/v155/xu21b.html},
  abstract  = {3D scene representation for robot manipulation should capture three key object properties: permanency - objects that become occluded over time continue to exist; amodal completeness - objects have 3D occupancy, even if only partial observations are available; spatiotemporal continuity - the movement of each object is continuous over space and time. In this paper, we introduce 3D Dynamic Scene Representation (DSR), a 3D volumetric scene representation that simultaneously discovers, tracks, reconstructs objects, and predicts their dynamics while capturing all three properties. We further propose DSR-Net, which learns to aggregate visual observations over multiple interactions to gradually build and refine DSR. Our model achieves state-of-the-art performance in modeling 3D scene dynamics with DSR on both simulated and real data. Combined with model predictive control, DSR-Net enables accurate planning in downstream robotic manipulation tasks such as planar pushing. Code and data are available at dsr-net.cs.columbia.edu.}
}

Endnote

%0 Conference Paper
%T Learning 3D Dynamic Scene Representations for Robot Manipulation
%A Zhenjia Xu
%A Zhanpeng He
%A Jiajun Wu
%A Shuran Song
%B Proceedings of the 2020 Conference on Robot Learning
%C Proceedings of Machine Learning Research
%D 2021
%E Jens Kober
%E Fabio Ramos
%E Claire Tomlin
%F pmlr-v155-xu21b
%I PMLR
%P 126--142
%U https://proceedings.mlr.press/v155/xu21b.html
%V 155
%X 3D scene representation for robot manipulation should capture three key object properties: permanency - objects that become occluded over time continue to exist; amodal completeness - objects have 3D occupancy, even if only partial observations are available; spatiotemporal continuity - the movement of each object is continuous over space and time. In this paper, we introduce 3D Dynamic Scene Representation (DSR), a 3D volumetric scene representation that simultaneously discovers, tracks, reconstructs objects, and predicts their dynamics while capturing all three properties. We further propose DSR-Net, which learns to aggregate visual observations over multiple interactions to gradually build and refine DSR. Our model achieves state-of-the-art performance in modeling 3D scene dynamics with DSR on both simulated and real data. Combined with model predictive control, DSR-Net enables accurate planning in downstream robotic manipulation tasks such as planar pushing. Code and data are available at dsr-net.cs.columbia.edu.

APA


Xu, Z., He, Z., Wu, J., & Song, S. (2021). Learning 3D Dynamic Scene Representations for Robot Manipulation. Proceedings of the 2020 Conference on Robot Learning, in Proceedings of Machine Learning Research 155:126-142. Available from https://proceedings.mlr.press/v155/xu21b.html.
