Learning Interpretable BEV Based VIO without Deep Neural Networks

Zexi Chen; Haozhe Du; Xuecheng XU; Rong Xiong; Yiyi Liao; Yue Wang

Proceedings of The 6th Conference on Robot Learning, PMLR 205:1289-1298, 2023.

Abstract

Monocular visual-inertial odometry (VIO) is a critical problem in robotics and autonomous driving. Traditional methods solve this problem with filtering or optimization. While fully interpretable, they rely on manual intervention and empirical parameter tuning. Learning-based approaches, on the other hand, allow end-to-end training but require large amounts of training data to learn millions of parameters, and these non-interpretable, heavyweight models tend to generalize poorly. In this paper, we propose a fully differentiable and interpretable bird's-eye-view (BEV) based VIO model for robots with locally planar motion that can be trained without deep neural networks. Specifically, we first adopt an Unscented Kalman Filter (UKF) as a differentiable layer to predict pitch and roll, where the noise covariance matrices are learned to filter the noise out of the raw IMU data. Second, the refined pitch and roll are used to retrieve a gravity-aligned BEV image of each frame through differentiable camera projection. Finally, a differentiable pose estimator estimates the remaining 3-DoF pose between BEV frames, yielding a 5-DoF pose estimate. Our method learns the covariance matrices end-to-end, supervised by the pose estimation loss, and outperforms empirical baselines. Experimental results on synthetic and real-world datasets demonstrate that our simple approach is competitive with state-of-the-art methods and generalizes well to unseen scenes.
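
The geometry behind the three stages above can be illustrated with a minimal, non-differentiable Python sketch. This is not the authors' implementation, and several parts are assumptions made only for illustration:

- **Attitude stage.** The learned UKF is replaced by the static accelerometer tilt that such a filter would use as its gravity measurement.
- **Pose stage.** The differentiable BEV pose estimator is replaced by a closed-form 2D least-squares (Kabsch-style) fit on matched BEV points.
- **Hypothetical names.** The helpers `tilt_from_accel`, `gravity_align`, `mat_vec` and `estimate_planar_pose` are invented for this sketch.
- **Conventions.** The sketch uses the ZYX (yaw-pitch-roll) angle convention. It also reads the "remaining 3 DoF" as planar translation plus yaw.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = List[List[float]]
Point2 = Tuple[float, float]


def tilt_from_accel(ax: float, ay: float, az: float) -> Tuple[float, float]:
    """Roll and pitch (rad) implied by a static accelerometer reading.

    Hypothetical stand-in for the learned UKF: a filter would fuse this
    gravity-direction measurement with integrated gyro rates.
    """
    roll = math.atan2(ay, az)
    pitch = math.atan2(-ax, math.hypot(ay, az))
    return roll, pitch


def gravity_align(roll: float, pitch: float) -> Mat3:
    """R = Ry(pitch) @ Rx(roll): rotates a body-frame vector into a level frame.

    Yaw is left out on purpose; it is recovered later from the BEV images.
    """
    cr, sr = math.cos(roll), math.sin(roll)
    cp, sp = math.cos(pitch), math.sin(pitch)
    return [
        [cp, sp * sr, sp * cr],
        [0.0, cr, -sr],
        [-sp, cp * sr, cp * cr],
    ]


def mat_vec(m: Mat3, v: Vec3) -> Vec3:
    """3x3 matrix times 3-vector."""
    x, y, z = (sum(m[i][k] * v[k] for k in range(3)) for i in range(3))
    return (x, y, z)


def estimate_planar_pose(src: Sequence[Point2],
                         dst: Sequence[Point2]) -> Tuple[float, float, float]:
    """Closed-form 2D least-squares rigid fit, dst ~ Rot(yaw) @ src + t.

    Hypothetical stand-in for the paper's differentiable BEV pose estimator;
    it expects matched BEV points instead of whole BEV images.
    Returns (tx, ty, yaw).
    """
    n = len(src)
    if n == 0 or n != len(dst):
        raise ValueError("src and dst must be non-empty and equally long")
    sx = sum(p[0] for p in src) / n
    sy = sum(p[1] for p in src) / n
    dx = sum(q[0] for q in dst) / n
    dy = sum(q[1] for q in dst) / n
    dot = cross = 0.0
    for (px, py), (qx, qy) in zip(src, dst):
        px, py, qx, qy = px - sx, py - sy, qx - dx, qy - dy
        dot += px * qx + py * qy    # coefficient of cos(yaw) in the objective
        cross += px * qy - py * qx  # coefficient of sin(yaw) in the objective
    yaw = math.atan2(cross, dot)
    c, s = math.cos(yaw), math.sin(yaw)
    # Translation maps the rotated source centroid onto the target centroid.
    return dx - (c * sx - s * sy), dy - (s * sx + c * sy), yaw


if __name__ == "__main__":
    g, roll, pitch = 9.81, 0.10, -0.05
    # Specific force measured by a tilted IMU at rest (same ZYX convention).
    f_body = (-g * math.sin(pitch),
              g * math.sin(roll) * math.cos(pitch),
              g * math.cos(roll) * math.cos(pitch))
    r_est, p_est = tilt_from_accel(*f_body)
    print("roll, pitch:", r_est, p_est)
    print("leveled:", mat_vec(gravity_align(r_est, p_est), f_body))

    src = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
    yaw, tx, ty = 0.3, 0.5, -0.2
    dst = [(math.cos(yaw) * x - math.sin(yaw) * y + tx,
            math.sin(yaw) * x + math.cos(yaw) * y + ty) for x, y in src]
    print("tx, ty, yaw:", estimate_planar_pose(src, dst))
```

The first stage supplies roll and pitch, and the last stage supplies (x, y, yaw). Together they give the 5 DoF named in the abstract. In the paper each stage is a differentiable layer, which is what allows the noise covariances to be learned from the pose loss. This sketch only shows the underlying geometry.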

Cite this Paper

BibTeX


@InProceedings{pmlr-v205-chen23c,
  title     = {Learning Interpretable BEV Based VIO without Deep Neural Networks},
  author    = {Chen, Zexi and Du, Haozhe and XU, Xuecheng and Xiong, Rong and Liao, Yiyi and Wang, Yue},
  booktitle = {Proceedings of The 6th Conference on Robot Learning},
  pages     = {1289--1298},
  year      = {2023},
  editor    = {Liu, Karen and Kulic, Dana and Ichnowski, Jeff},
  volume    = {205},
  series    = {Proceedings of Machine Learning Research},
  month     = {14--18 Dec},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v205/chen23c/chen23c.pdf},
  url       = {https://proceedings.mlr.press/v205/chen23c.html},
  abstract  = {Monocular visual-inertial odometry (VIO) is a critical problem in robotics and autonomous driving. Traditional methods solve this problem based on filtering or optimization. While being fully interpretable, they rely on manual interference and empirical parameter tuning. On the other hand, learning-based approaches allow for end-to-end training but require a large number of training data to learn millions of parameters. However, the non-interpretable and heavy models hinder the generalization ability. In this paper, we propose a fully differentiable, and interpretable, bird-eye-view (BEV) based VIO model for robots with local planar motion that can be trained without deep neural networks. Specifically, we first adopt Unscented Kalman Filter as a differentiable layer to predict the pitch and roll, where the covariance matrices of noise are learned to filter out the noise of the IMU raw data. Second, the refined pitch and roll are adopted to retrieve a gravity-aligned BEV image of each frame using differentiable camera projection. Finally, a differentiable pose estimator is utilized to estimate the remaining 3 DoF poses between the BEV frames: leading to a 5 DoF pose estimation. Our method allows for learning the covariance matrices end-to-end supervised by the pose estimation loss, demonstrating superior performance to empirical baselines. Experimental results on synthetic and real-world datasets demonstrate that our simple approach is competitive with state-of-the-art methods and generalizes well on unseen scenes.}
}

Endnote

%0 Conference Paper
%T Learning Interpretable BEV Based VIO without Deep Neural Networks
%A Zexi Chen
%A Haozhe Du
%A Xuecheng XU
%A Rong Xiong
%A Yiyi Liao
%A Yue Wang
%B Proceedings of The 6th Conference on Robot Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Karen Liu
%E Dana Kulic
%E Jeff Ichnowski
%F pmlr-v205-chen23c
%I PMLR
%P 1289--1298
%U https://proceedings.mlr.press/v205/chen23c.html
%V 205
%X Monocular visual-inertial odometry (VIO) is a critical problem in robotics and autonomous driving. Traditional methods solve this problem based on filtering or optimization. While being fully interpretable, they rely on manual interference and empirical parameter tuning. On the other hand, learning-based approaches allow for end-to-end training but require a large number of training data to learn millions of parameters. However, the non-interpretable and heavy models hinder the generalization ability. In this paper, we propose a fully differentiable, and interpretable, bird-eye-view (BEV) based VIO model for robots with local planar motion that can be trained without deep neural networks. Specifically, we first adopt Unscented Kalman Filter as a differentiable layer to predict the pitch and roll, where the covariance matrices of noise are learned to filter out the noise of the IMU raw data. Second, the refined pitch and roll are adopted to retrieve a gravity-aligned BEV image of each frame using differentiable camera projection. Finally, a differentiable pose estimator is utilized to estimate the remaining 3 DoF poses between the BEV frames: leading to a 5 DoF pose estimation. Our method allows for learning the covariance matrices end-to-end supervised by the pose estimation loss, demonstrating superior performance to empirical baselines. Experimental results on synthetic and real-world datasets demonstrate that our simple approach is competitive with state-of-the-art methods and generalizes well on unseen scenes.

APA


Chen, Z., Du, H., XU, X., Xiong, R., Liao, Y. & Wang, Y. (2023). Learning Interpretable BEV Based VIO without Deep Neural Networks. Proceedings of The 6th Conference on Robot Learning, in Proceedings of Machine Learning Research 205:1289-1298. Available from https://proceedings.mlr.press/v205/chen23c.html.
