Generalizable Multi-Camera 3D Object Detection from a Single Source via Fourier Cross-View Learning

Xue Zhao, Qinying Gu, Xinbing Wang, Chenghu Zhou, Nanyang Ye
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:77521-77538, 2025.

Abstract

Improving the generalization of multi-camera 3D object detection is essential for safe autonomous driving in the real world. In this paper, we consider a realistic yet more challenging scenario: improving generalization when only single-source data is available for training, since gathering data from diverse domains and collecting annotations is time-consuming and labor-intensive. To this end, we propose the Fourier Cross-View Learning (FCVL) framework, which includes Fourier Hierarchical Augmentation (FHiAug), an augmentation strategy in the frequency domain that boosts domain diversity, and a Fourier Cross-View Semantic Consistency Loss that helps the model learn more domain-invariant features from adjacent perspectives. Furthermore, we provide theoretical guarantees via augmentation graph theory. To the best of our knowledge, this is the first study to explore generalizable multi-camera 3D object detection with a single source. Extensive experiments on various testing domains demonstrate that our approach achieves the best performance among various domain generalization methods.

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-zhao25l,
  title = {Generalizable Multi-Camera 3{D} Object Detection from a Single Source via {F}ourier Cross-View Learning},
  author = {Zhao, Xue and Gu, Qinying and Wang, Xinbing and Zhou, Chenghu and Ye, Nanyang},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages = {77521--77538},
  year = {2025},
  editor = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume = {267},
  series = {Proceedings of Machine Learning Research},
  month = {13--19 Jul},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/zhao25l/zhao25l.pdf},
  url = {https://proceedings.mlr.press/v267/zhao25l.html},
  abstract = {Improving the generalization of multi-camera 3D object detection is essential for safe autonomous driving in the real world. In this paper, we consider a realistic yet more challenging scenario, which aims to improve the generalization when only single source data available for training, as gathering diverse domains of data and collecting annotations is time-consuming and labor-intensive. To this end, we propose the Fourier Cross-View Learning (FCVL) framework including Fourier Hierarchical Augmentation (FHiAug), an augmentation strategy in the frequency domain to boost domain diversity, and Fourier Cross-View Semantic Consistency Loss to facilitate the model to learn more domain-invariant features from adjacent perspectives. Furthermore, we provide theoretical guarantees via augmentation graph theory. To the best of our knowledge, this is the first study to explore generalizable multi-camera 3D object detection with a single source. Extensive experiments on various testing domains have demonstrated that our approach achieves the best performance across various domain generalization methods.}
}
Endnote
%0 Conference Paper
%T Generalizable Multi-Camera 3D Object Detection from a Single Source via Fourier Cross-View Learning
%A Xue Zhao
%A Qinying Gu
%A Xinbing Wang
%A Chenghu Zhou
%A Nanyang Ye
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-zhao25l
%I PMLR
%P 77521--77538
%U https://proceedings.mlr.press/v267/zhao25l.html
%V 267
%X Improving the generalization of multi-camera 3D object detection is essential for safe autonomous driving in the real world. In this paper, we consider a realistic yet more challenging scenario, which aims to improve the generalization when only single source data available for training, as gathering diverse domains of data and collecting annotations is time-consuming and labor-intensive. To this end, we propose the Fourier Cross-View Learning (FCVL) framework including Fourier Hierarchical Augmentation (FHiAug), an augmentation strategy in the frequency domain to boost domain diversity, and Fourier Cross-View Semantic Consistency Loss to facilitate the model to learn more domain-invariant features from adjacent perspectives. Furthermore, we provide theoretical guarantees via augmentation graph theory. To the best of our knowledge, this is the first study to explore generalizable multi-camera 3D object detection with a single source. Extensive experiments on various testing domains have demonstrated that our approach achieves the best performance across various domain generalization methods.
APA
Zhao, X., Gu, Q., Wang, X., Zhou, C., & Ye, N. (2025). Generalizable Multi-Camera 3D Object Detection from a Single Source via Fourier Cross-View Learning. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:77521-77538. Available from https://proceedings.mlr.press/v267/zhao25l.html.
