Attentional Separation-and-Aggregation Network for Self-supervised Depth-Pose Learning in Dynamic Scenes

Feng Gao, Jincheng Yu, Hao Shen, Yu Wang, Huazhong Yang
Proceedings of the 2020 Conference on Robot Learning, PMLR 155:2195-2205, 2021.

Abstract

Learning depth and ego-motion from unlabeled videos via self-supervision from epipolar projection can improve the robustness and accuracy of 3D perception and localization for vision-based robots. However, the rigid projection computed from ego-motion cannot represent all scene points, such as points on moving objects, leading to false guidance in these regions. To address this problem, we propose an Attentional Separation-and-Aggregation Network (ASANet), which learns to distinguish and extract the scene’s static and dynamic characteristics via an attention mechanism. We further propose a novel MotionNet with an ASANet as the encoder, followed by two separate decoders, to estimate the camera’s ego-motion and the scene’s dynamic motion field. We then introduce an auto-selecting approach that automatically detects moving objects for dynamic-aware learning. Empirical experiments demonstrate that our method achieves state-of-the-art performance on the KITTI benchmark.
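
To make the described architecture concrete, below is a minimal, illustrative PyTorch sketch of the two-branch MotionNet idea: a shared encoder whose features are softly separated by channel attention into static and dynamic streams, with one decoder regressing 6-DoF ego-motion and the other predicting a dense per-pixel motion field. All module names, layer choices, and channel widths (AttentionalSeparation, MotionNetSketch, etc.) are assumptions made for illustration, not the authors' implementation.

```python
# Illustrative sketch only: module names, widths, and the attention form are assumed,
# not taken from the paper's released code.
import torch
import torch.nn as nn

class AttentionalSeparation(nn.Module):
    """Channel attention that softly splits features into static/dynamic parts (assumed form)."""
    def __init__(self, channels):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, feat):
        a = self.gate(feat)               # per-channel attention in [0, 1]
        return a * feat, (1 - a) * feat   # static stream, dynamic stream

class MotionNetSketch(nn.Module):
    def __init__(self, in_ch=6, width=64):
        super().__init__()
        # Shared encoder over a pair of RGB frames concatenated along channels.
        self.encoder = nn.Sequential(
            nn.Conv2d(in_ch, width, 7, stride=2, padding=3), nn.ReLU(inplace=True),
            nn.Conv2d(width, width * 2, 5, stride=2, padding=2), nn.ReLU(inplace=True),
        )
        self.separate = AttentionalSeparation(width * 2)
        # Ego-motion decoder: global pooling followed by a 6-DoF regression head.
        self.pose_head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(width * 2, 6)
        )
        # Dynamic decoder: predicts a dense 3-channel per-pixel motion field.
        self.motion_head = nn.Conv2d(width * 2, 3, kernel_size=3, padding=1)

    def forward(self, frame_pair):
        feat = self.encoder(frame_pair)
        static_f, dynamic_f = self.separate(feat)
        ego_motion = self.pose_head(static_f)        # camera translation + rotation
        motion_field = self.motion_head(dynamic_f)   # object (dynamic) motion
        return ego_motion, motion_field

# Usage sketch: two 128x416 frames concatenated into a 6-channel input.
net = MotionNetSketch()
pose, motion = net(torch.randn(1, 6, 128, 416))
```

In the self-supervised setting the paper describes, the ego-motion output would drive the rigid epipolar projection for static regions, while the predicted motion field and the auto-selected moving-object regions handle pixels that the rigid projection cannot explain.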

Cite this Paper


BibTeX
@InProceedings{pmlr-v155-gao21a,
  title     = {Attentional Separation-and-Aggregation Network for Self-supervised Depth-Pose Learning in Dynamic Scenes},
  author    = {Gao, Feng and Yu, Jincheng and Shen, Hao and Wang, Yu and Yang, Huazhong},
  booktitle = {Proceedings of the 2020 Conference on Robot Learning},
  pages     = {2195--2205},
  year      = {2021},
  editor    = {Kober, Jens and Ramos, Fabio and Tomlin, Claire},
  volume    = {155},
  series    = {Proceedings of Machine Learning Research},
  month     = {16--18 Nov},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v155/gao21a/gao21a.pdf},
  url       = {https://proceedings.mlr.press/v155/gao21a.html},
  abstract  = {Learning depth and ego-motion from unlabeled videos via self-supervision from epipolar projection can improve the robustness and accuracy of the 3D perception and localization of vision-based robots. However, the rigid projection computed by ego-motion cannot represent all scene points, such as points on moving objects, leading to false guidance in these regions. To address this problem, we propose an Attentional Separation-and-Aggregation Network (ASANet), which can learn to distinguish and extract the scene’s static and dynamic characteristics via the attention mechanism. We further propose a novel MotionNet with an ASANet as the encoder, followed by two separate decoders, to estimate the camera’s ego-motion and the scene’s dynamic motion field. Then, we introduce an auto-selecting approach to detect the moving objects for dynamic-aware learning automatically. Empirical experiments demonstrate that our method can achieve the state-of-the-art performance on the KITTI benchmark.}
}
Endnote
%0 Conference Paper
%T Attentional Separation-and-Aggregation Network for Self-supervised Depth-Pose Learning in Dynamic Scenes
%A Feng Gao
%A Jincheng Yu
%A Hao Shen
%A Yu Wang
%A Huazhong Yang
%B Proceedings of the 2020 Conference on Robot Learning
%C Proceedings of Machine Learning Research
%D 2021
%E Jens Kober
%E Fabio Ramos
%E Claire Tomlin
%F pmlr-v155-gao21a
%I PMLR
%P 2195--2205
%U https://proceedings.mlr.press/v155/gao21a.html
%V 155
%X Learning depth and ego-motion from unlabeled videos via self-supervision from epipolar projection can improve the robustness and accuracy of the 3D perception and localization of vision-based robots. However, the rigid projection computed by ego-motion cannot represent all scene points, such as points on moving objects, leading to false guidance in these regions. To address this problem, we propose an Attentional Separation-and-Aggregation Network (ASANet), which can learn to distinguish and extract the scene’s static and dynamic characteristics via the attention mechanism. We further propose a novel MotionNet with an ASANet as the encoder, followed by two separate decoders, to estimate the camera’s ego-motion and the scene’s dynamic motion field. Then, we introduce an auto-selecting approach to detect the moving objects for dynamic-aware learning automatically. Empirical experiments demonstrate that our method can achieve the state-of-the-art performance on the KITTI benchmark.
APA
Gao, F., Yu, J., Shen, H., Wang, Y. & Yang, H. (2021). Attentional Separation-and-Aggregation Network for Self-supervised Depth-Pose Learning in Dynamic Scenes. Proceedings of the 2020 Conference on Robot Learning, in Proceedings of Machine Learning Research 155:2195-2205. Available from https://proceedings.mlr.press/v155/gao21a.html.

Related Material

Download PDF: https://proceedings.mlr.press/v155/gao21a/gao21a.pdf