Self-supervised Point Cloud Prediction Using 3D Spatio-temporal Convolutional Networks

Benedikt Mersch, Xieyuanli Chen, Jens Behley, Cyrill Stachniss
Proceedings of the 5th Conference on Robot Learning, PMLR 164:1444-1454, 2022.

Abstract

Exploiting past 3D LiDAR scans to predict future point clouds is a promising method for autonomous mobile systems to realize foresighted state estimation, collision avoidance, and planning. In this paper, we address the problem of predicting future 3D LiDAR point clouds given a sequence of past LiDAR scans. Estimating the future scene on the sensor level does not require any preceding steps as in localization or tracking systems and can be trained self-supervised. We propose an end-to-end approach that exploits a 2D range image representation of each 3D LiDAR scan and concatenates a sequence of range images to obtain a 3D tensor. Based on such tensors, we develop an encoder-decoder architecture using 3D convolutions to jointly aggregate spatial and temporal information of the scene and to predict the future 3D point clouds. We evaluate our method on multiple datasets and the experimental results suggest that our method outperforms existing point cloud prediction architectures and generalizes well to new, unseen environments without additional fine-tuning. Our method operates online and is faster than the common LiDAR frame rate of 10 Hz.
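The core data preparation described above — projecting each 3D scan onto a 2D range image and stacking a sequence of them into a 3D tensor — can be sketched as follows. This is a minimal illustration of the standard spherical projection, not the paper's exact implementation; the image size and vertical field of view are illustrative values (roughly a 64-beam sensor), and `point_cloud_to_range_image` is a hypothetical helper name.

```python
import numpy as np

def point_cloud_to_range_image(points, height=64, width=2048,
                               fov_up_deg=3.0, fov_down_deg=-25.0):
    """Project an (N, 3) LiDAR point cloud onto a 2D range image via
    spherical projection. Parameters are illustrative, not the paper's
    exact sensor configuration."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.linalg.norm(points, axis=1)                  # range per point
    yaw = np.arctan2(y, x)                              # azimuth in [-pi, pi]
    pitch = np.arcsin(np.clip(z / np.maximum(r, 1e-8), -1.0, 1.0))

    fov_up = np.deg2rad(fov_up_deg)
    fov_down = np.deg2rad(fov_down_deg)
    fov = fov_up - fov_down

    # Map angles to pixel coordinates: azimuth -> column, elevation -> row.
    u = 0.5 * (1.0 - yaw / np.pi) * width
    v = (1.0 - (pitch - fov_down) / fov) * height

    u = np.clip(np.floor(u), 0, width - 1).astype(np.int32)
    v = np.clip(np.floor(v), 0, height - 1).astype(np.int32)

    image = np.full((height, width), -1.0, dtype=np.float32)  # -1 = no return
    # Keep the closest return per pixel by writing farthest points first.
    order = np.argsort(-r)
    image[v[order], u[order]] = r[order]
    return image

# A sequence of T past scans becomes a (T, H, W) tensor, on which an
# encoder-decoder with 3D convolutions can aggregate space and time jointly.
scans = [np.random.rand(1000, 3) * 20.0 for _ in range(5)]
tensor = np.stack([point_cloud_to_range_image(s) for s in scans])  # (5, 64, 2048)
```

Predicted future range images can be re-projected back to 3D points by inverting the same mapping, which is what makes the pipeline trainable directly on raw scans without labels.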

Cite this Paper


BibTeX
@InProceedings{pmlr-v164-mersch22a,
  title     = {Self-supervised Point Cloud Prediction Using 3D Spatio-temporal Convolutional Networks},
  author    = {Mersch, Benedikt and Chen, Xieyuanli and Behley, Jens and Stachniss, Cyrill},
  booktitle = {Proceedings of the 5th Conference on Robot Learning},
  pages     = {1444--1454},
  year      = {2022},
  editor    = {Faust, Aleksandra and Hsu, David and Neumann, Gerhard},
  volume    = {164},
  series    = {Proceedings of Machine Learning Research},
  month     = {08--11 Nov},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v164/mersch22a/mersch22a.pdf},
  url       = {https://proceedings.mlr.press/v164/mersch22a.html},
  abstract  = {Exploiting past 3D LiDAR scans to predict future point clouds is a promising method for autonomous mobile systems to realize foresighted state estimation, collision avoidance, and planning. In this paper, we address the problem of predicting future 3D LiDAR point clouds given a sequence of past LiDAR scans. Estimating the future scene on the sensor level does not require any preceding steps as in localization or tracking systems and can be trained self-supervised. We propose an end-to-end approach that exploits a 2D range image representation of each 3D LiDAR scan and concatenates a sequence of range images to obtain a 3D tensor. Based on such tensors, we develop an encoder-decoder architecture using 3D convolutions to jointly aggregate spatial and temporal information of the scene and to predict the future 3D point clouds. We evaluate our method on multiple datasets and the experimental results suggest that our method outperforms existing point cloud prediction architectures and generalizes well to new, unseen environments without additional fine-tuning. Our method operates online and is faster than the common LiDAR frame rate of 10 Hz.}
}
Endnote
%0 Conference Paper
%T Self-supervised Point Cloud Prediction Using 3D Spatio-temporal Convolutional Networks
%A Benedikt Mersch
%A Xieyuanli Chen
%A Jens Behley
%A Cyrill Stachniss
%B Proceedings of the 5th Conference on Robot Learning
%C Proceedings of Machine Learning Research
%D 2022
%E Aleksandra Faust
%E David Hsu
%E Gerhard Neumann
%F pmlr-v164-mersch22a
%I PMLR
%P 1444--1454
%U https://proceedings.mlr.press/v164/mersch22a.html
%V 164
%X Exploiting past 3D LiDAR scans to predict future point clouds is a promising method for autonomous mobile systems to realize foresighted state estimation, collision avoidance, and planning. In this paper, we address the problem of predicting future 3D LiDAR point clouds given a sequence of past LiDAR scans. Estimating the future scene on the sensor level does not require any preceding steps as in localization or tracking systems and can be trained self-supervised. We propose an end-to-end approach that exploits a 2D range image representation of each 3D LiDAR scan and concatenates a sequence of range images to obtain a 3D tensor. Based on such tensors, we develop an encoder-decoder architecture using 3D convolutions to jointly aggregate spatial and temporal information of the scene and to predict the future 3D point clouds. We evaluate our method on multiple datasets and the experimental results suggest that our method outperforms existing point cloud prediction architectures and generalizes well to new, unseen environments without additional fine-tuning. Our method operates online and is faster than the common LiDAR frame rate of 10 Hz.
APA
Mersch, B., Chen, X., Behley, J. & Stachniss, C. (2022). Self-supervised Point Cloud Prediction Using 3D Spatio-temporal Convolutional Networks. Proceedings of the 5th Conference on Robot Learning, in Proceedings of Machine Learning Research 164:1444-1454. Available from https://proceedings.mlr.press/v164/mersch22a.html.