TANDEM: Tracking and Dense Mapping in Real-time using Deep Multi-view Stereo

Lukas Koestler, Nan Yang, Niclas Zeller, Daniel Cremers
Proceedings of the 5th Conference on Robot Learning, PMLR 164:34-45, 2022.

Abstract

In this paper, we present TANDEM, a real-time monocular tracking and dense mapping framework. For pose estimation, TANDEM performs photometric bundle adjustment based on a sliding window of keyframes. To increase robustness, we propose a novel tracking front-end that performs dense direct image alignment using depth maps rendered from a global model that is built incrementally from dense depth predictions. To predict the dense depth maps, we propose Cascade View-Aggregation MVSNet (CVA-MVSNet), which utilizes the entire active keyframe window by hierarchically constructing 3D cost volumes with adaptive view aggregation to balance the different stereo baselines between the keyframes. Finally, the predicted depth maps are fused into a consistent global map represented as a truncated signed distance function (TSDF) voxel grid. Our experimental results show that TANDEM outperforms other state-of-the-art traditional and learning-based monocular visual odometry (VO) methods in terms of camera tracking. Moreover, TANDEM shows state-of-the-art real-time 3D reconstruction performance. Webpage: https://go.vision.in.tum.de/tandem
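The final fusion step described in the abstract can be sketched as follows. This is a minimal NumPy illustration of the standard weighted truncated-signed-distance update (KinectFusion-style fusion), not TANDEM's actual implementation; the function name, parameters, and grid setup are illustrative assumptions.

```python
import numpy as np

def fuse_depth_tsdf(tsdf, weights, depth, K, T_wc, voxel_size, origin, trunc=0.1):
    """One weighted TSDF integration step: fuse a depth map into a voxel grid.

    tsdf, weights : (nx, ny, nz) arrays, updated in place and returned.
    K             : 3x3 camera intrinsics; T_wc: 4x4 camera-to-world pose.
    origin        : world position of voxel (0, 0, 0); trunc: truncation distance.
    """
    nx, ny, nz = tsdf.shape
    h, w = depth.shape
    # World coordinates of every voxel center (C-order matches the reshapes below).
    ii, jj, kk = np.meshgrid(np.arange(nx), np.arange(ny), np.arange(nz), indexing="ij")
    pts_w = origin + voxel_size * np.stack([ii, jj, kk], axis=-1).reshape(-1, 3)
    # Into the camera frame: p_c = R^T (p_w - t) for T_wc = [R | t].
    R, t = T_wc[:3, :3], T_wc[:3, 3]
    pts_c = (pts_w - t) @ R
    z = pts_c[:, 2]
    z_safe = np.where(z > 1e-6, z, np.inf)  # guard the projective division
    uv = pts_c @ K.T
    u = np.round(uv[:, 0] / z_safe).astype(int)
    v = np.round(uv[:, 1] / z_safe).astype(int)
    valid = (z > 1e-6) & (u >= 0) & (u < w) & (v >= 0) & (v < h)
    d = np.where(valid, depth[np.clip(v, 0, h - 1), np.clip(u, 0, w - 1)], 0.0)
    valid &= d > 0
    # Signed distance along the optical axis, truncated and normalized to [-1, 1].
    sdf = np.clip(d - z, -trunc, trunc) / trunc
    valid &= (d - z) > -trunc  # skip voxels far behind the observed surface
    upd = valid.reshape(tsdf.shape)
    sdf = sdf.reshape(tsdf.shape)
    # Running weighted average, as in classic TSDF fusion.
    w_old = weights[upd]
    tsdf[upd] = (tsdf[upd] * w_old + sdf[upd]) / (w_old + 1.0)
    weights[upd] += 1.0
    return tsdf, weights
```

Repeating this update for each predicted keyframe depth map (with its estimated pose) yields the consistent global TSDF map from which a mesh or rendered depth can later be extracted.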

Cite this Paper


BibTeX
@InProceedings{pmlr-v164-koestler22a,
  title     = {TANDEM: Tracking and Dense Mapping in Real-time using Deep Multi-view Stereo},
  author    = {Koestler, Lukas and Yang, Nan and Zeller, Niclas and Cremers, Daniel},
  booktitle = {Proceedings of the 5th Conference on Robot Learning},
  pages     = {34--45},
  year      = {2022},
  editor    = {Faust, Aleksandra and Hsu, David and Neumann, Gerhard},
  volume    = {164},
  series    = {Proceedings of Machine Learning Research},
  month     = {08--11 Nov},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v164/koestler22a/koestler22a.pdf},
  url       = {https://proceedings.mlr.press/v164/koestler22a.html},
  abstract  = {In this paper, we present TANDEM a real-time monocular tracking and dense mapping framework. For pose estimation, TANDEM performs photometric bundle adjustment based on a sliding window of keyframes. To increase the robustness, we propose a novel tracking front-end that performs dense direct image alignment using depth maps rendered from a global model that is built incrementally from dense depth predictions. To predict the dense depth maps, we propose Cascade View-Aggregation MVSNet (CVA-MVSNet) that utilizes the entire active keyframe window by hierarchically constructing 3D cost volumes with adaptive view aggregation to balance the different stereo baselines between the keyframes. Finally, the predicted depth maps are fused into a consistent global map represented as a truncated signed distance function (TSDF) voxel grid. Our experimental results show that TANDEM outperforms other state-of-the-art traditional and learning-based monocular visual odometry (VO) methods in terms of camera tracking. Moreover, TANDEM shows state-of-the-art real-time 3D reconstruction performance. Webpage: https://go.vision.in.tum.de/tandem}
}
Endnote
%0 Conference Paper
%T TANDEM: Tracking and Dense Mapping in Real-time using Deep Multi-view Stereo
%A Lukas Koestler
%A Nan Yang
%A Niclas Zeller
%A Daniel Cremers
%B Proceedings of the 5th Conference on Robot Learning
%C Proceedings of Machine Learning Research
%D 2022
%E Aleksandra Faust
%E David Hsu
%E Gerhard Neumann
%F pmlr-v164-koestler22a
%I PMLR
%P 34--45
%U https://proceedings.mlr.press/v164/koestler22a.html
%V 164
%X In this paper, we present TANDEM a real-time monocular tracking and dense mapping framework. For pose estimation, TANDEM performs photometric bundle adjustment based on a sliding window of keyframes. To increase the robustness, we propose a novel tracking front-end that performs dense direct image alignment using depth maps rendered from a global model that is built incrementally from dense depth predictions. To predict the dense depth maps, we propose Cascade View-Aggregation MVSNet (CVA-MVSNet) that utilizes the entire active keyframe window by hierarchically constructing 3D cost volumes with adaptive view aggregation to balance the different stereo baselines between the keyframes. Finally, the predicted depth maps are fused into a consistent global map represented as a truncated signed distance function (TSDF) voxel grid. Our experimental results show that TANDEM outperforms other state-of-the-art traditional and learning-based monocular visual odometry (VO) methods in terms of camera tracking. Moreover, TANDEM shows state-of-the-art real-time 3D reconstruction performance. Webpage: https://go.vision.in.tum.de/tandem
APA
Koestler, L., Yang, N., Zeller, N. & Cremers, D. (2022). TANDEM: Tracking and Dense Mapping in Real-time using Deep Multi-view Stereo. Proceedings of the 5th Conference on Robot Learning, in Proceedings of Machine Learning Research 164:34-45. Available from https://proceedings.mlr.press/v164/koestler22a.html.