Unsupervised Monocular Depth Learning in Dynamic Scenes

Hanhan Li, Ariel Gordon, Hang Zhao, Vincent Casser, Anelia Angelova
Proceedings of the 2020 Conference on Robot Learning, PMLR 155:1908-1917, 2021.

Abstract

We present a method for jointly training the estimation of depth, ego-motion, and a dense 3D translation field of objects relative to the scene, with monocular photometric consistency being the sole source of supervision. We show that this apparently heavily underdetermined problem can be regularized by imposing the following prior knowledge about 3D translation fields: they are sparse, since most of the scene is static, and they tend to be piecewise constant for rigid moving objects. We show that this regularization alone is sufficient to train monocular depth prediction models that exceed the accuracy achieved in prior work for dynamic scenes, including methods that require semantic input.
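The two priors described above can be illustrated with a small sketch. This is a hypothetical regularizer, not the paper's actual loss: it assumes a dense translation field of shape (H, W, 3) and combines an L1-style penalty on per-pixel translation magnitudes (sparsity, since most of the scene is static) with a total-variation-style penalty on spatial differences (piecewise constancy for rigid moving objects).

```python
import numpy as np

def translation_field_regularizer(t, sparsity_weight=1.0, smoothness_weight=1.0):
    """Hypothetical regularizer for a dense 3D translation field t of shape (H, W, 3).

    Sketches the two priors from the abstract:
    - sparsity: per-pixel translation magnitudes should mostly be zero,
      penalized here with a mean L1 term on the magnitudes;
    - piecewise constancy: rigid objects move coherently, so spatial
      differences of the field should be sparse (a TV-style penalty).
    """
    mag = np.linalg.norm(t, axis=-1)        # (H, W) per-pixel translation magnitude
    sparsity = mag.mean()                   # L1-style sparsity prior
    dy = np.abs(np.diff(t, axis=0)).mean()  # vertical neighbor differences
    dx = np.abs(np.diff(t, axis=1)).mean()  # horizontal neighbor differences
    smoothness = dy + dx                    # TV-style piecewise-constancy prior
    return sparsity_weight * sparsity + smoothness_weight * smoothness
```

A static scene (all-zero field) incurs zero penalty, while a field that is nonzero but spatially constant is penalized only by the sparsity term, matching the intuition that a single rigid object's motion should be cheap to represent.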

Cite this Paper


BibTeX
@InProceedings{pmlr-v155-li21a,
  title     = {Unsupervised Monocular Depth Learning in Dynamic Scenes},
  author    = {Li, Hanhan and Gordon, Ariel and Zhao, Hang and Casser, Vincent and Angelova, Anelia},
  booktitle = {Proceedings of the 2020 Conference on Robot Learning},
  pages     = {1908--1917},
  year      = {2021},
  editor    = {Kober, Jens and Ramos, Fabio and Tomlin, Claire},
  volume    = {155},
  series    = {Proceedings of Machine Learning Research},
  month     = {16--18 Nov},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v155/li21a/li21a.pdf},
  url       = {https://proceedings.mlr.press/v155/li21a.html},
  abstract  = {We present a method for jointly training the estimation of depth, ego-motion, and a dense 3D translation field of objects relative to the scene, with monocular photometric consistency being the sole source of supervision. We show that this apparently heavily underdetermined problem can be regularized by imposing the following prior knowledge about 3D translation fields: they are sparse, since most of the scene is static, and they tend to be piecewise constant for rigid moving objects. We show that this regularization alone is sufficient to train monocular depth prediction models that exceed the accuracy achieved in prior work for dynamic scenes, including methods that require semantic input.}
}
Endnote
%0 Conference Paper
%T Unsupervised Monocular Depth Learning in Dynamic Scenes
%A Hanhan Li
%A Ariel Gordon
%A Hang Zhao
%A Vincent Casser
%A Anelia Angelova
%B Proceedings of the 2020 Conference on Robot Learning
%C Proceedings of Machine Learning Research
%D 2021
%E Jens Kober
%E Fabio Ramos
%E Claire Tomlin
%F pmlr-v155-li21a
%I PMLR
%P 1908--1917
%U https://proceedings.mlr.press/v155/li21a.html
%V 155
%X We present a method for jointly training the estimation of depth, ego-motion, and a dense 3D translation field of objects relative to the scene, with monocular photometric consistency being the sole source of supervision. We show that this apparently heavily underdetermined problem can be regularized by imposing the following prior knowledge about 3D translation fields: they are sparse, since most of the scene is static, and they tend to be piecewise constant for rigid moving objects. We show that this regularization alone is sufficient to train monocular depth prediction models that exceed the accuracy achieved in prior work for dynamic scenes, including methods that require semantic input.
APA
Li, H., Gordon, A., Zhao, H., Casser, V., & Angelova, A. (2021). Unsupervised Monocular Depth Learning in Dynamic Scenes. Proceedings of the 2020 Conference on Robot Learning, in Proceedings of Machine Learning Research 155:1908-1917. Available from https://proceedings.mlr.press/v155/li21a.html.