Attentional Separation-and-Aggregation Network for Self-supervised Depth-Pose Learning in Dynamic Scenes

Feng Gao; Jincheng Yu; Hao Shen; Yu Wang; Huazhong Yang

Proceedings of the 2020 Conference on Robot Learning, PMLR 155:2195-2205, 2021.

Abstract

Learning depth and ego-motion from unlabeled videos via self-supervision from epipolar projection can improve the robustness and accuracy of 3D perception and localization for vision-based robots. However, the rigid projection computed from ego-motion cannot represent all scene points, such as points on moving objects, leading to false guidance in these regions. To address this problem, we propose an Attentional Separation-and-Aggregation Network (ASANet), which learns to distinguish and extract the scene’s static and dynamic characteristics via the attention mechanism. We further propose a novel MotionNet with an ASANet as the encoder, followed by two separate decoders, to estimate the camera’s ego-motion and the scene’s dynamic motion field. We then introduce an auto-selecting approach that automatically detects moving objects for dynamic-aware learning. Empirical experiments demonstrate that our method achieves state-of-the-art performance on the KITTI benchmark.

Cite this Paper

BibTeX


@InProceedings{pmlr-v155-gao21a,
  title = {Attentional Separation-and-Aggregation Network for Self-supervised Depth-Pose Learning in Dynamic Scenes},
  author = {Gao, Feng and Yu, Jincheng and Shen, Hao and Wang, Yu and Yang, Huazhong},
  booktitle = {Proceedings of the 2020 Conference on Robot Learning},
  pages = {2195--2205},
  year = {2021},
  editor = {Kober, Jens and Ramos, Fabio and Tomlin, Claire},
  volume = {155},
  series = {Proceedings of Machine Learning Research},
  month = {16--18 Nov},
  publisher = {PMLR},
  pdf = {https://proceedings.mlr.press/v155/gao21a/gao21a.pdf},
  url = {https://proceedings.mlr.press/v155/gao21a.html},
  abstract = 	 {Learning depth and ego-motion from unlabeled videos via self-supervision from epipolar projection can improve the robustness and accuracy of the 3D perception and localization of vision-based robots. However, the rigid projection computed by ego-motion cannot represent all scene points, such as points on moving objects, leading to false guidance in these regions. To address this problem, we propose an Attentional Separation-and-Aggregation Network (ASANet), which can learn to distinguish and extract the scene’s static and dynamic characteristics via the attention mechanism. We further propose a novel MotionNet with an ASANet as the encoder, followed by two separate decoders, to estimate the camera’s ego-motion and the scene’s dynamic motion field. Then, we introduce an auto-selecting approach to detect the moving objects for dynamic-aware learning automatically. Empirical experiments demonstrate that our method can achieve the state-of-the-art performance on the KITTI benchmark.}
}

Endnote

%0 Conference Paper
%T Attentional Separation-and-Aggregation Network for Self-supervised Depth-Pose Learning in Dynamic Scenes
%A Feng Gao
%A Jincheng Yu
%A Hao Shen
%A Yu Wang
%A Huazhong Yang
%B Proceedings of the 2020 Conference on Robot Learning
%C Proceedings of Machine Learning Research
%D 2021
%E Jens Kober
%E Fabio Ramos
%E Claire Tomlin
%F pmlr-v155-gao21a
%I PMLR
%P 2195--2205
%U https://proceedings.mlr.press/v155/gao21a.html
%V 155
%X Learning depth and ego-motion from unlabeled videos via self-supervision from epipolar projection can improve the robustness and accuracy of the 3D perception and localization of vision-based robots. However, the rigid projection computed by ego-motion cannot represent all scene points, such as points on moving objects, leading to false guidance in these regions. To address this problem, we propose an Attentional Separation-and-Aggregation Network (ASANet), which can learn to distinguish and extract the scene’s static and dynamic characteristics via the attention mechanism. We further propose a novel MotionNet with an ASANet as the encoder, followed by two separate decoders, to estimate the camera’s ego-motion and the scene’s dynamic motion field. Then, we introduce an auto-selecting approach to detect the moving objects for dynamic-aware learning automatically. Empirical experiments demonstrate that our method can achieve the state-of-the-art performance on the KITTI benchmark.

APA


Gao, F., Yu, J., Shen, H., Wang, Y. & Yang, H. (2021). Attentional Separation-and-Aggregation Network for Self-supervised Depth-Pose Learning in Dynamic Scenes. Proceedings of the 2020 Conference on Robot Learning, in Proceedings of Machine Learning Research 155:2195-2205. Available from https://proceedings.mlr.press/v155/gao21a.html.
