Cross-Dataset Sensor Alignment: Making Visual 3D Object Detector Generalizable

Liangtao Zheng, Yicheng Liu, Yue Wang, Hang Zhao
Proceedings of The 7th Conference on Robot Learning, PMLR 229:1903-1929, 2023.

Abstract

While camera-based 3D object detection has evolved rapidly, these models are susceptible to overfitting to specific sensor setups. For example, in autonomous driving, most datasets are collected using a single sensor configuration. This paper evaluates the generalization capability of camera-based 3D object detectors, including adapting detectors from one dataset to another and training detectors with multiple datasets. We observe that merely aggregating datasets yields drastic performance drops, contrary to the expected improvements associated with increased training data. To close the gap, we introduce an efficient technique for aligning disparate sensor configurations — a combination of camera intrinsic synchronization, camera extrinsic correction, and ego frame alignment, which collectively enhance cross-dataset performance remarkably. Compared with single dataset baselines, we achieve 42.3 mAP improvement on KITTI, 23.2 mAP improvement on Lyft, 18.5 mAP improvement on nuScenes, 17.3 mAP improvement on KITTI-360, 8.4 mAP improvement on Argoverse2 and 3.9 mAP improvement on Waymo. We hope this comprehensive study can facilitate research on generalizable 3D object detection and associated tasks.
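To make the first alignment step concrete, below is a minimal Python sketch of the intrinsic-synchronization idea described in the abstract: rescaling each image so that its effective focal length matches a shared target, so that cameras with different intrinsics map onto one virtual camera. This is an illustration under stated assumptions, not the paper's code; the function name, the target focal length of 1000 px, the square-pixel assumption, and the KITTI-like example numbers are all hypothetical.

import numpy as np
import cv2  # opencv-python

def synchronize_intrinsics(image, K, f_target=1000.0):
    # Assumes square pixels (fx == fy); f_target is an illustrative
    # constant, not a value taken from the paper.
    f = K[0, 0]
    scale = f_target / f
    h, w = image.shape[:2]
    # Resizing by `scale` changes the effective focal length to f_target,
    # so objects at the same depth occupy the same pixel extent
    # regardless of which dataset the image came from.
    resized = cv2.resize(image, (int(round(w * scale)), int(round(h * scale))))
    K_new = K.copy()
    K_new[:2, :] *= scale  # fx, fy, cx, cy all scale with the image
    return resized, K_new

# Example: a KITTI-like camera (focal length ~721 px) synchronized to 1000 px.
K = np.array([[721.5, 0.0, 609.6],
              [0.0, 721.5, 172.9],
              [0.0, 0.0, 1.0]])
img = np.zeros((375, 1242, 3), dtype=np.uint8)
img_sync, K_sync = synchronize_intrinsics(img, K)  # roughly 520 x 1722 output

Extrinsic correction and ego frame alignment follow the same spirit: transforming each dataset's camera poses and annotation frames into a common reference convention before training.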

Cite this Paper

BibTeX
@InProceedings{pmlr-v229-zheng23a,
  title     = {Cross-Dataset Sensor Alignment: Making Visual 3D Object Detector Generalizable},
  author    = {Zheng, Liangtao and Liu, Yicheng and Wang, Yue and Zhao, Hang},
  booktitle = {Proceedings of The 7th Conference on Robot Learning},
  pages     = {1903--1929},
  year      = {2023},
  editor    = {Tan, Jie and Toussaint, Marc and Darvish, Kourosh},
  volume    = {229},
  series    = {Proceedings of Machine Learning Research},
  month     = {06--09 Nov},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v229/zheng23a/zheng23a.pdf},
  url       = {https://proceedings.mlr.press/v229/zheng23a.html},
  abstract  = {While camera-based 3D object detection has evolved rapidly, these models are susceptible to overfitting to specific sensor setups. For example, in autonomous driving, most datasets are collected using a single sensor configuration. This paper evaluates the generalization capability of camera-based 3D object detectors, including adapting detectors from one dataset to another and training detectors with multiple datasets. We observe that merely aggregating datasets yields drastic performance drops, contrary to the expected improvements associated with increased training data. To close the gap, we introduce an efficient technique for aligning disparate sensor configurations — a combination of camera intrinsic synchronization, camera extrinsic correction, and ego frame alignment, which collectively enhance cross-dataset performance remarkably. Compared with single dataset baselines, we achieve 42.3 mAP improvement on KITTI, 23.2 mAP improvement on Lyft, 18.5 mAP improvement on nuScenes, 17.3 mAP improvement on KITTI-360, 8.4 mAP improvement on Argoverse2 and 3.9 mAP improvement on Waymo. We hope this comprehensive study can facilitate research on generalizable 3D object detection and associated tasks.}
}
Endnote
%0 Conference Paper
%T Cross-Dataset Sensor Alignment: Making Visual 3D Object Detector Generalizable
%A Liangtao Zheng
%A Yicheng Liu
%A Yue Wang
%A Hang Zhao
%B Proceedings of The 7th Conference on Robot Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Jie Tan
%E Marc Toussaint
%E Kourosh Darvish
%F pmlr-v229-zheng23a
%I PMLR
%P 1903--1929
%U https://proceedings.mlr.press/v229/zheng23a.html
%V 229
%X While camera-based 3D object detection has evolved rapidly, these models are susceptible to overfitting to specific sensor setups. For example, in autonomous driving, most datasets are collected using a single sensor configuration. This paper evaluates the generalization capability of camera-based 3D object detectors, including adapting detectors from one dataset to another and training detectors with multiple datasets. We observe that merely aggregating datasets yields drastic performance drops, contrary to the expected improvements associated with increased training data. To close the gap, we introduce an efficient technique for aligning disparate sensor configurations — a combination of camera intrinsic synchronization, camera extrinsic correction, and ego frame alignment, which collectively enhance cross-dataset performance remarkably. Compared with single dataset baselines, we achieve 42.3 mAP improvement on KITTI, 23.2 mAP improvement on Lyft, 18.5 mAP improvement on nuScenes, 17.3 mAP improvement on KITTI-360, 8.4 mAP improvement on Argoverse2 and 3.9 mAP improvement on Waymo. We hope this comprehensive study can facilitate research on generalizable 3D object detection and associated tasks.
APA
Zheng, L., Liu, Y., Wang, Y. & Zhao, H. (2023). Cross-Dataset Sensor Alignment: Making Visual 3D Object Detector Generalizable. Proceedings of The 7th Conference on Robot Learning, in Proceedings of Machine Learning Research 229:1903-1929. Available from https://proceedings.mlr.press/v229/zheng23a.html.
