TartanVO: A Generalizable Learning-based VO

Wenshan Wang, Yaoyu Hu, Sebastian Scherer
Proceedings of the 2020 Conference on Robot Learning, PMLR 155:1761-1772, 2021.

Abstract

We present the first learning-based visual odometry (VO) model that generalizes to multiple datasets and real-world scenarios and outperforms geometry-based methods in challenging scenes. We achieve this by leveraging the SLAM dataset TartanAir, which provides a large amount of diverse synthetic data in challenging environments. Furthermore, to make our VO model generalize across datasets, we propose an up-to-scale loss function and incorporate the camera intrinsic parameters into the model. Experiments show that a single model, TartanVO, trained only on synthetic data and without any fine-tuning, generalizes to real-world datasets such as KITTI and EuRoC, demonstrating significant advantages over geometry-based methods on challenging trajectories. Our code is available at https://github.com/castacks/tartanvo.
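The two ideas the abstract highlights, the up-to-scale loss and the intrinsics-aware input, can be illustrated compactly. Below is a minimal PyTorch sketch, not the paper's published formulation: the function names, signatures, and exact normalization are assumptions. The loss compares unit-normalized translations so the unobservable monocular scale does not penalize the network, and the intrinsics layer encodes normalized pixel coordinates as two extra input channels so one model can serve cameras with different calibrations.

import torch

def up_to_scale_loss(t_pred, r_pred, t_gt, r_gt, eps=1e-6):
    # Monocular VO cannot observe metric scale, so both the predicted and
    # ground-truth translations are normalized to unit length before being
    # compared; rotation is fully observable and is compared directly.
    # (Hypothetical sketch; the paper's exact loss may differ.)
    t_pred = t_pred / t_pred.norm(dim=-1, keepdim=True).clamp(min=eps)
    t_gt = t_gt / t_gt.norm(dim=-1, keepdim=True).clamp(min=eps)
    return (t_pred - t_gt).norm(dim=-1).mean() + (r_pred - r_gt).norm(dim=-1).mean()

def intrinsics_layer(height, width, fx, fy, cx, cy):
    # Two-channel map of intrinsics-normalized pixel coordinates
    # ((u - cx)/fx, (v - cy)/fy). Concatenating it to the network input
    # tells the model where each pixel sits relative to the optical center,
    # which is one way to let a single model adapt to different cameras.
    u = torch.arange(width, dtype=torch.float32)
    v = torch.arange(height, dtype=torch.float32)
    vv, uu = torch.meshgrid(v, u, indexing="ij")
    return torch.stack([(uu - cx) / fx, (vv - cy) / fy], dim=0)  # (2, H, W)

For example, intrinsics_layer(480, 640, fx=320.0, fy=320.0, cx=320.0, cy=240.0) yields a (2, 480, 640) tensor that can be concatenated channel-wise to the optical-flow input.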

Cite this Paper

BibTeX
@InProceedings{pmlr-v155-wang21h,
  title     = {TartanVO: A Generalizable Learning-based VO},
  author    = {Wang, Wenshan and Hu, Yaoyu and Scherer, Sebastian},
  booktitle = {Proceedings of the 2020 Conference on Robot Learning},
  pages     = {1761--1772},
  year      = {2021},
  editor    = {Kober, Jens and Ramos, Fabio and Tomlin, Claire},
  volume    = {155},
  series    = {Proceedings of Machine Learning Research},
  month     = {16--18 Nov},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v155/wang21h/wang21h.pdf},
  url       = {https://proceedings.mlr.press/v155/wang21h.html},
}
APA
Wang, W., Hu, Y., & Scherer, S. (2021). TartanVO: A Generalizable Learning-based VO. Proceedings of the 2020 Conference on Robot Learning, in Proceedings of Machine Learning Research 155:1761-1772. Available from https://proceedings.mlr.press/v155/wang21h.html.