Enhancing Visual Localization with Cross-Domain Image Generation

Yuanze Wang, Yichao Yan, Shiming Song, Songchang Jin, Yilan Huang, Xingdong Sheng, Dianxi Shi
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:65075-65090, 2025.

Abstract

Visual localization aims to predict the absolute camera pose for a single query image. However, predominant methods focus on single-camera images and scenes with limited appearance variations, which limits their applicability to the cross-domain scenes commonly encountered in real-world applications. Furthermore, the long-tail distribution of cross-domain datasets poses additional challenges for visual localization. In this work, we propose a novel cross-domain data generation method to enhance visual localization. To achieve this, we first construct a cross-domain 3D Gaussian Splatting (3DGS) representation to accurately model photometric variations and mitigate the interference of dynamic objects in large-scale scenes. We introduce a text-guided image editing model to enhance data diversity, addressing the long-tail distribution problem, and design an effective fine-tuning strategy for this model. Then, we develop an anchor-based method to generate high-quality datasets for visual localization. Finally, we introduce positional attention to address data ambiguities in cross-camera images. Extensive experiments show that our method achieves state-of-the-art accuracy, outperforming existing cross-domain visual localization methods by an average of 59% across all domains. Project page: https://yzwang-sjtu.github.io/CDG-Loc.
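The positional attention mentioned above is only named in this abstract, so the following PyTorch sketch is a purely hypothetical illustration of the general idea rather than the authors' implementation: image tokens are conditioned on per-token camera geometry (assumed here to be 6-D rays, origin plus direction) so that visually similar tokens seen from different cameras remain distinguishable. The class name, ray parameterization, and head count are all assumptions.

import torch
import torch.nn as nn

class PositionalAttention(nn.Module):
    # Hypothetical sketch: self-attention whose queries and keys are
    # augmented with a projection of per-token camera rays, so attention
    # can disambiguate tokens by viewing geometry, not appearance alone.
    def __init__(self, dim, num_heads=8):
        super().__init__()
        self.pos_proj = nn.Linear(6, dim)  # assumed 6-D ray (origin + direction)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, tokens, rays):
        # tokens: (B, N, dim) image features; rays: (B, N, 6) per-token rays
        pos = self.pos_proj(rays)
        q = k = tokens + pos                # position-aware queries and keys
        out, _ = self.attn(q, k, tokens)    # values remain appearance-only
        return tokens + out                 # residual connection

# Usage on dummy data:
tokens = torch.randn(2, 196, 256)
rays = torch.randn(2, 196, 6)
print(PositionalAttention(256)(tokens, rays).shape)  # torch.Size([2, 196, 256])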

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-wang25dv,
  title     = {Enhancing Visual Localization with Cross-Domain Image Generation},
  author    = {Wang, Yuanze and Yan, Yichao and Song, Shiming and Jin, Songchang and Huang, Yilan and Sheng, Xingdong and Shi, Dianxi},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {65075--65090},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/wang25dv/wang25dv.pdf},
  url       = {https://proceedings.mlr.press/v267/wang25dv.html},
  abstract  = {Visual localization aims to predict the absolute camera pose for a single query image. However, predominant methods focus on single-camera images and scenes with limited appearance variations, limiting their applicability to cross-domain scenes commonly encountered in real-world applications. Furthermore, the long-tail distribution of cross-domain datasets poses additional challenges for visual localization. In this work, we propose a novel cross-domain data generation method to enhance visual localization methods. To achieve this, we first construct a cross-domain 3DGS to accurately model photometric variations and mitigate the interference of dynamic objects in large-scale scenes. We introduce a text-guided image editing model to enhance data diversity for addressing the long-tail distribution problem and design an effective fine-tuning strategy for it. Then, we develop an anchor-based method to generate high-quality datasets for visual localization. Finally, we introduce positional attention to address data ambiguities in cross-camera images. Extensive experiments show that our method achieves state-of-the-art accuracy, outperforming existing cross-domain visual localization methods by an average of 59% across all domains. Project page: https://yzwang-sjtu.github.io/CDG-Loc.}
}
Endnote
%0 Conference Paper
%T Enhancing Visual Localization with Cross-Domain Image Generation
%A Yuanze Wang
%A Yichao Yan
%A Shiming Song
%A Songchang Jin
%A Yilan Huang
%A Xingdong Sheng
%A Dianxi Shi
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-wang25dv
%I PMLR
%P 65075--65090
%U https://proceedings.mlr.press/v267/wang25dv.html
%V 267
%X Visual localization aims to predict the absolute camera pose for a single query image. However, predominant methods focus on single-camera images and scenes with limited appearance variations, limiting their applicability to cross-domain scenes commonly encountered in real-world applications. Furthermore, the long-tail distribution of cross-domain datasets poses additional challenges for visual localization. In this work, we propose a novel cross-domain data generation method to enhance visual localization methods. To achieve this, we first construct a cross-domain 3DGS to accurately model photometric variations and mitigate the interference of dynamic objects in large-scale scenes. We introduce a text-guided image editing model to enhance data diversity for addressing the long-tail distribution problem and design an effective fine-tuning strategy for it. Then, we develop an anchor-based method to generate high-quality datasets for visual localization. Finally, we introduce positional attention to address data ambiguities in cross-camera images. Extensive experiments show that our method achieves state-of-the-art accuracy, outperforming existing cross-domain visual localization methods by an average of 59% across all domains. Project page: https://yzwang-sjtu.github.io/CDG-Loc.
APA
Wang, Y., Yan, Y., Song, S., Jin, S., Huang, Y., Sheng, X. & Shi, D. (2025). Enhancing Visual Localization with Cross-Domain Image Generation. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:65075-65090. Available from https://proceedings.mlr.press/v267/wang25dv.html.
