Unpaired Learning of Dense Visual Depth Estimators for Urban Environments

Vitor Guizilini, Fabio Ramos
Proceedings of The 2nd Conference on Robot Learning, PMLR 87:200-212, 2018.

Abstract

This paper addresses the classical problem of learning-based monocular depth estimation in urban environments, in which a model is trained to directly map a single input image to its corresponding depth values. All currently available techniques treat monocular depth estimation as a regression problem, and thus require some sort of data pairing, either explicitly as input-output ground-truth pairs obtained from range sensors (e.g., laser) or as binocular stereo footage. We introduce a novel methodology that completely eliminates the need for data pairing, requiring only two unrelated datasets containing samples of input images and output depth values. A cycle-consistent generative adversarial network is used to learn a mapping between these two domains, based on a custom adversarial loss function specifically designed to improve performance on the task of monocular depth estimation, including local depth smoothness and boundary equilibrium terms. A wide range of experiments was conducted using a variety of well-known indoor and outdoor datasets, with depth estimates obtained from laser sensors, RGB-D cameras, and SLAM point clouds. In all of them, the proposed CycleDepth framework reaches competitive results even under this more restricted training scenario.
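To make the training signal described in the abstract concrete, below is a minimal PyTorch sketch of one cycle-consistent adversarial step for unpaired image-to-depth learning. This is an illustration under stated assumptions, not the authors' implementation: TinyGenerator, TinyDiscriminator, the LSGAN objective, the loss weights, and the edge-aware smoothness term are all placeholders, and the paper's boundary-equilibrium mechanism is omitted for brevity.

# Minimal PyTorch sketch of a cycle-consistent adversarial training step for
# unpaired image <-> depth learning. Illustrative assumptions throughout:
# network sizes, LSGAN objective, loss weights, and the edge-aware smoothness
# term are placeholders, not the paper's exact formulation.
import torch
import torch.nn as nn
import torch.nn.functional as F

def conv_block(cin, cout):
    return nn.Sequential(nn.Conv2d(cin, cout, 3, padding=1), nn.ReLU())

class TinyGenerator(nn.Module):
    """Stand-in generator mapping one domain to the other."""
    def __init__(self, cin, cout):
        super().__init__()
        self.net = nn.Sequential(conv_block(cin, 32), conv_block(32, 32),
                                 nn.Conv2d(32, cout, 3, padding=1))

    def forward(self, x):
        return self.net(x)

class TinyDiscriminator(nn.Module):
    """Stand-in patch critic scoring samples of a single domain."""
    def __init__(self, cin):
        super().__init__()
        self.net = nn.Sequential(conv_block(cin, 32),
                                 nn.Conv2d(32, 1, 3, padding=1))

    def forward(self, x):
        return self.net(x)

def smoothness(depth, image):
    """Edge-aware smoothness: penalize depth gradients where the image is flat."""
    dx = (depth[..., :, 1:] - depth[..., :, :-1]).abs()
    dy = (depth[..., 1:, :] - depth[..., :-1, :]).abs()
    wx = torch.exp(-(image[..., :, 1:] - image[..., :, :-1]).abs().mean(1, keepdim=True))
    wy = torch.exp(-(image[..., 1:, :] - image[..., :-1, :]).abs().mean(1, keepdim=True))
    return (dx * wx).mean() + (dy * wy).mean()

# Unpaired batches: images and depth maps drawn from two unrelated datasets.
img = torch.rand(2, 3, 64, 64)  # RGB samples
dep = torch.rand(2, 1, 64, 64)  # depth samples, with no correspondence to img

G = TinyGenerator(3, 1)       # image -> depth
Fgen = TinyGenerator(1, 3)    # depth -> image
D_dep = TinyDiscriminator(1)  # critic on the depth domain

fake_dep = G(img)         # forward mapping
rec_img = Fgen(fake_dep)  # image -> depth -> image cycle
pred = D_dep(fake_dep)
adv = F.mse_loss(pred, torch.ones_like(pred))  # LSGAN generator term
cyc_img = F.l1_loss(rec_img, img)              # cycle-consistency (image side)
cyc_dep = F.l1_loss(G(Fgen(dep)), dep)         # cycle-consistency (depth side)
smo = smoothness(fake_dep, img)                # local depth smoothness regularizer

# The symmetric adversarial term on images and the discriminator updates
# (where boundary-equilibrium balancing would enter) are analogous and omitted.
loss_G = adv + 10.0 * (cyc_img + cyc_dep) + 0.1 * smo
loss_G.backward()

The key property the sketch shows is that no input-output pairing is required: img and dep are drawn independently, and supervision comes only from the discriminator on the depth domain and from the reconstruction cycles.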

Cite this Paper


BibTeX
@InProceedings{pmlr-v87-guizilini18b,
  title     = {Unpaired Learning of Dense Visual Depth Estimators for Urban Environments},
  author    = {Guizilini, Vitor and Ramos, Fabio},
  booktitle = {Proceedings of The 2nd Conference on Robot Learning},
  pages     = {200--212},
  year      = {2018},
  editor    = {Billard, Aude and Dragan, Anca and Peters, Jan and Morimoto, Jun},
  volume    = {87},
  series    = {Proceedings of Machine Learning Research},
  month     = {29--31 Oct},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v87/guizilini18b/guizilini18b.pdf},
  url       = {https://proceedings.mlr.press/v87/guizilini18b.html}
}
Endnote
%0 Conference Paper
%T Unpaired Learning of Dense Visual Depth Estimators for Urban Environments
%A Vitor Guizilini
%A Fabio Ramos
%B Proceedings of The 2nd Conference on Robot Learning
%C Proceedings of Machine Learning Research
%D 2018
%E Aude Billard
%E Anca Dragan
%E Jan Peters
%E Jun Morimoto
%F pmlr-v87-guizilini18b
%I PMLR
%P 200--212
%U https://proceedings.mlr.press/v87/guizilini18b.html
%V 87
APA
Guizilini, V. & Ramos, F. (2018). Unpaired Learning of Dense Visual Depth Estimators for Urban Environments. Proceedings of The 2nd Conference on Robot Learning, in Proceedings of Machine Learning Research 87:200-212. Available from https://proceedings.mlr.press/v87/guizilini18b.html.
