Revisiting Depth-guided Methods for Monocular 3D Object Detection by Hierarchical Balanced Depth

Yi-Rong Chen, Ching-Yu Tseng, Yi-Syuan Liou, Tsung-Han Wu, Winston H. Hsu
Proceedings of The 7th Conference on Robot Learning, PMLR 229:2995-3009, 2023.

Abstract

Monocular 3D object detection has seen significant advancements with the incorporation of depth information. However, there remains a considerable performance gap compared to LiDAR-based methods, largely due to inaccurate depth estimation. We argue that this issue stems from the commonly used pixel-wise depth map loss, which inherently creates an imbalance of loss weighting between near and distant objects. To address these challenges, we propose MonoHBD (Monocular Hierarchical Balanced Depth), a comprehensive solution with a hierarchical mechanism. We introduce the Hierarchical Depth Map (HDM) structure that incorporates depth bins and depth offsets to enhance the localization accuracy for objects. Leveraging RoIAlign, our Balanced Depth Extractor (BDE) module captures both scene-level depth relationships and object-specific depth characteristics while considering geometric properties through the inclusion of camera calibration parameters. Furthermore, we propose a novel depth map loss that regularizes object-level depth features to mitigate imbalanced loss propagation. Our model reaches state-of-the-art results on the KITTI 3D object detection benchmark while supporting real-time detection. Extensive ablation studies are also conducted to demonstrate the efficacy of our proposed modules.

Cite this Paper


BibTeX
@InProceedings{pmlr-v229-chen23d,
  title = {Revisiting Depth-guided Methods for Monocular 3D Object Detection by Hierarchical Balanced Depth},
  author = {Chen, Yi-Rong and Tseng, Ching-Yu and Liou, Yi-Syuan and Wu, Tsung-Han and Hsu, Winston H.},
  booktitle = {Proceedings of The 7th Conference on Robot Learning},
  pages = {2995--3009},
  year = {2023},
  editor = {Tan, Jie and Toussaint, Marc and Darvish, Kourosh},
  volume = {229},
  series = {Proceedings of Machine Learning Research},
  month = {06--09 Nov},
  publisher = {PMLR},
  pdf = {https://proceedings.mlr.press/v229/chen23d/chen23d.pdf},
  url = {https://proceedings.mlr.press/v229/chen23d.html},
  abstract = {Monocular 3D object detection has seen significant advancements with the incorporation of depth information. However, there remains a considerable performance gap compared to LiDAR-based methods, largely due to inaccurate depth estimation. We argue that this issue stems from the commonly used pixel-wise depth map loss, which inherently creates the imbalance of loss weighting between near and distant objects. To address these challenges, we propose MonoHBD (Monocular Hierarchical Balanced Depth), a comprehensive solution with the hierarchical mechanism. We introduce the Hierarchical Depth Map (HDM) structure that incorporates depth bins and depth offsets to enhance the localization accuracy for objects. Leveraging RoIAlign, our Balanced Depth Extractor (BDE) module captures both scene-level depth relationships and object-specific depth characteristics while considering the geometry properties through the inclusion of camera calibration parameters. Furthermore, we propose a novel depth map loss that regularizes object-level depth features to mitigate imbalanced loss propagation. Our model reaches state-of-the-art results on the KITTI 3D object detection benchmark while supporting real-time detection. Excessive ablation studies are also conducted to prove the efficacy of our proposed modules.}
}
Endnote
%0 Conference Paper
%T Revisiting Depth-guided Methods for Monocular 3D Object Detection by Hierarchical Balanced Depth
%A Yi-Rong Chen
%A Ching-Yu Tseng
%A Yi-Syuan Liou
%A Tsung-Han Wu
%A Winston H. Hsu
%B Proceedings of The 7th Conference on Robot Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Jie Tan
%E Marc Toussaint
%E Kourosh Darvish
%F pmlr-v229-chen23d
%I PMLR
%P 2995--3009
%U https://proceedings.mlr.press/v229/chen23d.html
%V 229
%X Monocular 3D object detection has seen significant advancements with the incorporation of depth information. However, there remains a considerable performance gap compared to LiDAR-based methods, largely due to inaccurate depth estimation. We argue that this issue stems from the commonly used pixel-wise depth map loss, which inherently creates the imbalance of loss weighting between near and distant objects. To address these challenges, we propose MonoHBD (Monocular Hierarchical Balanced Depth), a comprehensive solution with the hierarchical mechanism. We introduce the Hierarchical Depth Map (HDM) structure that incorporates depth bins and depth offsets to enhance the localization accuracy for objects. Leveraging RoIAlign, our Balanced Depth Extractor (BDE) module captures both scene-level depth relationships and object-specific depth characteristics while considering the geometry properties through the inclusion of camera calibration parameters. Furthermore, we propose a novel depth map loss that regularizes object-level depth features to mitigate imbalanced loss propagation. Our model reaches state-of-the-art results on the KITTI 3D object detection benchmark while supporting real-time detection. Excessive ablation studies are also conducted to prove the efficacy of our proposed modules.
APA
Chen, Y., Tseng, C., Liou, Y., Wu, T. & Hsu, W. H. (2023). Revisiting Depth-guided Methods for Monocular 3D Object Detection by Hierarchical Balanced Depth. Proceedings of The 7th Conference on Robot Learning, in Proceedings of Machine Learning Research 229:2995-3009. Available from https://proceedings.mlr.press/v229/chen23d.html.
