Unpaired Learning of Dense Visual Depth Estimators for Urban Environments

Vitor Guizilini; Fabio Ramos

Proceedings of The 2nd Conference on Robot Learning, PMLR 87:200-212, 2018.

Abstract

This paper addresses the classical problem of learning-based monocular depth estimation in urban environments, in which a model is trained to directly map a single input image to its corresponding depth values. All currently available techniques treat monocular depth estimation as a regression problem, and thus require some sort of data pairing, either explicitly as input-output ground-truth pairs, using information from range sensors (i.e. laser), or as binocular stereo footage. We introduce a novel methodology that completely eliminates the need for data pairing, only requiring two unrelated datasets containing samples of input images and output depth values. A cycle-consistent generative adversarial network is used to learn a mapping between these two domains, based on a custom adversarial loss function specifically designed to improve performance on the task of monocular depth estimation, including local depth smoothness and boundary equilibrium. A wide range of experiments were conducted using a variety of well-known indoor and outdoor datasets, with depth estimates obtained from laser sensors, RGBD cameras and SLAM pointclouds. In all of them, the proposed CycleDepth framework reaches competitive results even under a more restricted training scenario.
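The abstract names two ingredients that are easy to illustrate in isolation: cycle consistency between the image and depth domains, and local depth smoothness. The following is a minimal sketch of those two terms only, not the CycleDepth loss itself; the adversarial and boundary-equilibrium components are omitted. The function names, the toy grids, and the scaling "generators" are hypothetical stand-ins chosen for illustration.

```python
from typing import Callable, List

Grid = List[List[float]]  # tiny 2-D array standing in for an image or a depth map


def l1_distance(a: Grid, b: Grid) -> float:
    """Mean absolute difference between two equally sized grids."""
    total = sum(abs(x - y) for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))
    count = sum(len(row) for row in a)
    return total / count


def cycle_consistency_loss(
    image: Grid,
    depth: Grid,
    g: Callable[[Grid], Grid],  # hypothetical image -> depth mapping
    f: Callable[[Grid], Grid],  # hypothetical depth -> image mapping
) -> float:
    """L1 cycle loss: f(g(image)) should recover image, g(f(depth)) should recover depth.

    The image and the depth map do not need to come from the same scene.
    """
    return l1_distance(f(g(image)), image) + l1_distance(g(f(depth)), depth)


def depth_smoothness(depth: Grid) -> float:
    """Mean absolute difference between horizontally and vertically adjacent depths."""
    diffs = []
    for r, row in enumerate(depth):
        for c, value in enumerate(row):
            if c + 1 < len(row):
                diffs.append(abs(row[c + 1] - value))
            if r + 1 < len(depth):
                diffs.append(abs(depth[r + 1][c] - value))
    return sum(diffs) / len(diffs) if diffs else 0.0


def scale(grid: Grid, k: float) -> Grid:
    """Toy 'generator': multiply every entry by k (assumption for illustration only)."""
    return [[k * v for v in row] for row in grid]


# Unpaired toy samples: the image and the depth map are unrelated.
image = [[1.0, 2.0], [3.0, 4.0]]
depth = [[2.0, 2.0], [2.0, 6.0]]

perfect = cycle_consistency_loss(image, depth, lambda x: scale(x, 2.0), lambda x: scale(x, 0.5))
mismatched = cycle_consistency_loss(image, depth, lambda x: scale(x, 2.0), lambda x: scale(x, 0.25))
roughness = depth_smoothness(depth)
```

With mappings that invert each other the cycle loss is zero, while a mismatched pair is penalised even though no image is ever compared against its own ground-truth depth. In a real system the two mappings would be neural networks trained jointly with discriminators.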

Cite this Paper

BibTeX


@InProceedings{pmlr-v87-guizilini18b,
  title = 	 {Unpaired Learning of Dense Visual Depth Estimators for Urban Environments},
  author =       {Guizilini, Vitor and Ramos, Fabio},
  booktitle = 	 {Proceedings of The 2nd Conference on Robot Learning},
  pages = 	 {200--212},
  year = 	 {2018},
  editor = 	 {Billard, Aude and Dragan, Anca and Peters, Jan and Morimoto, Jun},
  volume = 	 {87},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {29--31 Oct},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v87/guizilini18b/guizilini18b.pdf},
  url = 	 {https://proceedings.mlr.press/v87/guizilini18b.html},
  abstract = 	 {This paper addresses the classical problem of learning-based monocular depth estimation in urban environments, in which a model is trained to directly map a single input image to its corresponding depth values. All currently available techniques treat monocular depth estimation as a regression problem, and thus require some sort of data pairing, either explicitly as input-output ground-truth pairs, using information from range sensors (i.e. laser), or as binocular stereo footage. We introduce a novel methodology that completely eliminates the need for data pairing, only requiring two unrelated datasets containing samples of input images and output depth values. A cycle-consistent generative adversarial network is used to learn a mapping between these two domains, based on a custom adversarial loss function specifically designed to improve performance on the task of monocular depth estimation, including local depth smoothness and boundary equilibrium. A wide range of experiments were conducted using a variety of well-known indoor and outdoor datasets, with depth estimates obtained from laser sensors, RGBD cameras and SLAM pointclouds. In all of them, the proposed CycleDepth framework reaches competitive results even under a more restricted training scenario. }
}

Endnote

%0 Conference Paper
%T Unpaired Learning of Dense Visual Depth Estimators for Urban Environments
%A Vitor Guizilini
%A Fabio Ramos
%B Proceedings of The 2nd Conference on Robot Learning
%C Proceedings of Machine Learning Research
%D 2018
%E Aude Billard
%E Anca Dragan
%E Jan Peters
%E Jun Morimoto
%F pmlr-v87-guizilini18b
%I PMLR
%P 200--212
%U https://proceedings.mlr.press/v87/guizilini18b.html
%V 87
%X This paper addresses the classical problem of learning-based monocular depth estimation in urban environments, in which a model is trained to directly map a single input image to its corresponding depth values. All currently available techniques treat monocular depth estimation as a regression problem, and thus require some sort of data pairing, either explicitly as input-output ground-truth pairs, using information from range sensors (i.e. laser), or as binocular stereo footage. We introduce a novel methodology that completely eliminates the need for data pairing, only requiring two unrelated datasets containing samples of input images and output depth values. A cycle-consistent generative adversarial network is used to learn a mapping between these two domains, based on a custom adversarial loss function specifically designed to improve performance on the task of monocular depth estimation, including local depth smoothness and boundary equilibrium. A wide range of experiments were conducted using a variety of well-known indoor and outdoor datasets, with depth estimates obtained from laser sensors, RGBD cameras and SLAM pointclouds. In all of them, the proposed CycleDepth framework reaches competitive results even under a more restricted training scenario.

APA


Guizilini, V. & Ramos, F. (2018). Unpaired Learning of Dense Visual Depth Estimators for Urban Environments. Proceedings of The 2nd Conference on Robot Learning, in Proceedings of Machine Learning Research 87:200-212. Available from https://proceedings.mlr.press/v87/guizilini18b.html.
