Better to Teach than to Give: Domain Generalized Semantic Segmentation via Agent Queries with Diffusion Model Guidance

Fan Li; Xuan Wang; Min Qi; Zhaoxiang Zhang; Yuelei Xu

Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:36129-36139, 2025.

Abstract

Domain Generalized Semantic Segmentation (DGSS) trains a model on a labeled source domain to generalize to unseen target domains with consistent contextual distribution and varying visual appearance. Most existing methods rely on domain randomization or data generation but struggle to capture the underlying scene distribution, resulting in the loss of useful semantic information. Inspired by the diffusion model’s capability to generate diverse variations within a given scene context, we consider harnessing its rich prior knowledge of scene distribution to tackle the challenging DGSS task. In this paper, we propose a novel agent Query-driven learning framework based on Diffusion model guidance for DGSS, named QueryDiff. Our recipe comprises three key ingredients: (1) generating agent queries from segmentation features to aggregate semantic information about instances within the scene; (2) learning the inherent semantic distribution of the scene through agent queries guided by diffusion features; (3) refining segmentation features using optimized agent queries for robust mask predictions. Extensive experiments across various settings demonstrate that our method significantly outperforms previous state-of-the-art methods. Notably, it enhances the model’s ability to generalize effectively to extreme domains, such as cubist art styles. Code is available at https://github.com/FanLiHub/QueryDiff.
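The three ingredients listed in the abstract can be pictured with a toy sketch. This is not the QueryDiff implementation (see the repository linked above for that). It assumes, purely for illustration, that aggregation and refinement are plain single-head cross-attention and that "guidance" is a mean-squared alignment between the agent queries and a diffusion-feature summary. The names `cross_attend` and `alignment_loss`, the feature sizes, and the random inputs are all hypothetical.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross_attend(queries, keys_values):
    """Hypothetical helper: each query becomes a softmax-weighted
    average of the rows in keys_values (single-head attention,
    no learned projections)."""
    scale = math.sqrt(len(queries[0]))
    out = []
    for q in queries:
        weights = softmax([dot(q, kv) / scale for kv in keys_values])
        out.append([sum(w * kv[d] for w, kv in zip(weights, keys_values))
                    for d in range(len(q))])
    return out


def alignment_loss(a, b):
    """Hypothetical guidance term: mean squared error between two
    equally shaped lists of vectors."""
    sq = [(x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)]
    return sum(sq) / len(sq)


random.seed(0)
dim, num_pixels, num_queries = 4, 6, 2


def rand_matrix(rows, cols):
    return [[random.gauss(0, 1) for _ in range(cols)] for _ in range(rows)]


seg_feats = rand_matrix(num_pixels, dim)      # stand-in segmentation features
diff_feats = rand_matrix(num_pixels, dim)     # stand-in diffusion features
init_queries = rand_matrix(num_queries, dim)  # stand-in initial queries

# (1) Agent queries aggregate information from segmentation features.
agent_queries = cross_attend(init_queries, seg_feats)

# (2) Assumed guidance: compare the queries with what they would
#     aggregate from diffusion features; training would minimize this.
guided = cross_attend(agent_queries, diff_feats)
loss = alignment_loss(agent_queries, guided)

# (3) Segmentation features are refined by attending to the queries
#     and adding the result back as a residual.
updates = cross_attend(seg_feats, agent_queries)
refined = [[f + u for f, u in zip(feat, upd)]
           for feat, upd in zip(seg_feats, updates)]

print(len(agent_queries), len(refined), loss >= 0.0)
```

In a real model the queries, the attention projections, and the guidance objective would be learned; the sketch only shows how information could flow between the three stages.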

Cite this Paper

BibTeX

@InProceedings{pmlr-v267-li25co,
  title = 	 {Better to Teach than to Give: Domain Generalized Semantic Segmentation via Agent Queries with Diffusion Model Guidance},
  author =       {Li, Fan and Wang, Xuan and Qi, Min and Zhang, Zhaoxiang and Xu, Yuelei},
  booktitle = 	 {Proceedings of the 42nd International Conference on Machine Learning},
  pages = 	 {36129--36139},
  year = 	 {2025},
  editor = 	 {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume = 	 {267},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {13--19 Jul},
  publisher =    {PMLR},
  pdf = 	 {https://raw.githubusercontent.com/mlresearch/v267/main/assets/li25co/li25co.pdf},
  url = 	 {https://proceedings.mlr.press/v267/li25co.html},
  abstract = 	 {Domain Generalized Semantic Segmentation (DGSS) trains a model on a labeled source domain to generalize to unseen target domains with consistent contextual distribution and varying visual appearance. Most existing methods rely on domain randomization or data generation but struggle to capture the underlying scene distribution, resulting in the loss of useful semantic information. Inspired by the diffusion model’s capability to generate diverse variations within a given scene context, we consider harnessing its rich prior knowledge of scene distribution to tackle the challenging DGSS task. In this paper, we propose a novel agent Query-driven learning framework based on Diffusion model guidance for DGSS, named QueryDiff. Our recipe comprises three key ingredients: (1) generating agent queries from segmentation features to aggregate semantic information about instances within the scene; (2) learning the inherent semantic distribution of the scene through agent queries guided by diffusion features; (3) refining segmentation features using optimized agent queries for robust mask predictions. Extensive experiments across various settings demonstrate that our method significantly outperforms previous state-of-the-art methods. Notably, it enhances the model’s ability to generalize effectively to extreme domains, such as cubist art styles. Code is available at https://github.com/FanLiHub/QueryDiff.}
}

Endnote

%0 Conference Paper
%T Better to Teach than to Give: Domain Generalized Semantic Segmentation via Agent Queries with Diffusion Model Guidance
%A Fan Li
%A Xuan Wang
%A Min Qi
%A Zhaoxiang Zhang
%A Yuelei Xu
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-li25co
%I PMLR
%P 36129--36139
%U https://proceedings.mlr.press/v267/li25co.html
%V 267
%X Domain Generalized Semantic Segmentation (DGSS) trains a model on a labeled source domain to generalize to unseen target domains with consistent contextual distribution and varying visual appearance. Most existing methods rely on domain randomization or data generation but struggle to capture the underlying scene distribution, resulting in the loss of useful semantic information. Inspired by the diffusion model’s capability to generate diverse variations within a given scene context, we consider harnessing its rich prior knowledge of scene distribution to tackle the challenging DGSS task. In this paper, we propose a novel agent Query-driven learning framework based on Diffusion model guidance for DGSS, named QueryDiff. Our recipe comprises three key ingredients: (1) generating agent queries from segmentation features to aggregate semantic information about instances within the scene; (2) learning the inherent semantic distribution of the scene through agent queries guided by diffusion features; (3) refining segmentation features using optimized agent queries for robust mask predictions. Extensive experiments across various settings demonstrate that our method significantly outperforms previous state-of-the-art methods. Notably, it enhances the model’s ability to generalize effectively to extreme domains, such as cubist art styles. Code is available at https://github.com/FanLiHub/QueryDiff.

APA

Li, F., Wang, X., Qi, M., Zhang, Z., & Xu, Y. (2025). Better to Teach than to Give: Domain Generalized Semantic Segmentation via Agent Queries with Diffusion Model Guidance. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:36129-36139. Available from https://proceedings.mlr.press/v267/li25co.html.
