Robust Semi-Supervised Monocular Depth Estimation with Reprojected Distances

Vitor Guizilini, Jie Li, Rares Ambrus, Sudeep Pillai, Adrien Gaidon
Proceedings of the Conference on Robot Learning, PMLR 100:503-512, 2020.

Abstract

Dense depth estimation from a single image is a key problem in computer vision, with exciting applications in a multitude of robotic tasks. Initially viewed as a direct regression problem requiring annotated labels as supervision at training time, depth estimation has in the past few years seen a substantial amount of work on self-supervised training based on strong geometric cues, both from stereo cameras and, more recently, from monocular video sequences. In this paper we investigate how these two approaches (supervised and self-supervised) can be effectively combined, so that a depth model learns to encode true scale from sparse supervision while achieving high local accuracy by leveraging geometric cues. To this end, we propose a novel supervised loss term that complements the widely used photometric loss, and we show how it can be used to train robust semi-supervised monocular depth estimation models. Furthermore, we evaluate how much supervision is actually necessary to train accurate scale-aware monocular depth models, showing that with our proposed framework very sparse LiDAR information, with as few as 4 beams (fewer than 100 valid depth values per image), is enough to achieve results competitive with the current state of the art.
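To make the idea of a supervised term based on reprojected distances concrete, below is a minimal PyTorch sketch (not the authors' released code): predicted and ground-truth depths are lifted to 3D, transformed into an adjacent camera with a known relative pose, projected back to pixels, and the 2D distance between the two projections is penalized at the sparse pixels where LiDAR returns exist. The tensor shapes, the validity convention (zero depth = no LiDAR return), and all function names are illustrative assumptions, not the paper's API.

import torch

def backproject(depth, pix, K_inv):
    # Lift homogeneous pixels to 3D camera coordinates:
    # depth (B,1,N) scales the camera rays K_inv @ pix (B,3,N).
    return depth * (K_inv @ pix)

def project(points, K):
    # Pinhole projection of 3D points (B,3,N) to pixel coordinates (B,2,N).
    uv = K @ points
    return uv[:, :2] / uv[:, 2:3].clamp(min=1e-6)

def reprojected_distance_loss(pred_depth, gt_depth, K, K_inv, T):
    # pred_depth, gt_depth: (B,1,H,W); gt is sparse, zeros mean "no LiDAR return".
    # K, K_inv: (B,3,3) camera intrinsics; T: (B,4,4) pose to a context camera.
    B, _, H, W = pred_depth.shape

    # Homogeneous pixel grid, shared across the batch: (B,3,H*W).
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    pix = torch.stack([xs, ys, torch.ones_like(xs)]).float()
    pix = pix.view(1, 3, -1).expand(B, -1, -1)

    def reproject(depth):
        # Lift to 3D, move to the context camera, project back to pixels.
        pts = backproject(depth.view(B, 1, -1), pix, K_inv)      # (B,3,N)
        pts = torch.cat([pts, torch.ones_like(pts[:, :1])], 1)   # (B,4,N)
        return project((T @ pts)[:, :3], K)                      # (B,2,N)

    # Penalize the 2D distance between the two reprojections, only at
    # pixels where sparse ground-truth depth is available.
    valid = (gt_depth.view(B, 1, -1) > 0).float()
    dist = (reproject(pred_depth) - reproject(gt_depth)).norm(dim=1, keepdim=True)
    return (dist * valid).sum() / valid.sum().clamp(min=1.0)

In a full semi-supervised setup such a term would be added, with a weighting factor, to the standard self-supervised photometric loss computed between the target image and views synthesized from the monocular sequence; the sketch above only illustrates the supervised component.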

Cite this Paper


BibTeX
@InProceedings{pmlr-v100-guizilini20a,
  title     = {Robust Semi-Supervised Monocular Depth Estimation with Reprojected Distances},
  author    = {Guizilini, Vitor and Li, Jie and Ambrus, Rares and Pillai, Sudeep and Gaidon, Adrien},
  booktitle = {Proceedings of the Conference on Robot Learning},
  pages     = {503--512},
  year      = {2020},
  editor    = {Leslie Pack Kaelbling and Danica Kragic and Komei Sugiura},
  volume    = {100},
  series    = {Proceedings of Machine Learning Research},
  month     = {30 Oct--01 Nov},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v100/guizilini20a/guizilini20a.pdf},
  url       = {http://proceedings.mlr.press/v100/guizilini20a.html}
}
Endnote
%0 Conference Paper
%T Robust Semi-Supervised Monocular Depth Estimation with Reprojected Distances
%A Vitor Guizilini
%A Jie Li
%A Rares Ambrus
%A Sudeep Pillai
%A Adrien Gaidon
%B Proceedings of the Conference on Robot Learning
%C Proceedings of Machine Learning Research
%D 2020
%E Leslie Pack Kaelbling
%E Danica Kragic
%E Komei Sugiura
%F pmlr-v100-guizilini20a
%I PMLR
%J Proceedings of Machine Learning Research
%P 503--512
%U http://proceedings.mlr.press/v100/guizilini20a.html
%V 100
%W PMLR
APA
Guizilini, V., Li, J., Ambrus, R., Pillai, S. & Gaidon, A. (2020). Robust Semi-Supervised Monocular Depth Estimation with Reprojected Distances. Proceedings of the Conference on Robot Learning, in PMLR 100:503-512.

Related Material

Paper PDF: http://proceedings.mlr.press/v100/guizilini20a/guizilini20a.pdf