Guideline-Informed MLLM Reasoning for Pathology-Aware Postoperative Prostate CTV Segmentation

Yinhao Wu; Hengrui Zhao; Haiqing Li; Wenliang Zhong; Hehuan Ma; Yuzhi Guo; Dan Nguyen; Daniel Yang; Steve Jiang; Junzhou Huang

Proceedings of The 9th International Conference on Medical Imaging with Deep Learning, PMLR 315:1004-1028, 2026.

Abstract

Accurate segmentation of the Clinical Target Volume (CTV) is a critical prerequisite for precise radiotherapy planning, pursuing complete irradiation of microscopic disease while minimizing toxicity to surrounding healthy organs. However, achieving automated CTV segmentation remains highly challenging due to the invisible microscopic disease on planning CT and the necessity of incorporating clinical context into delineation decisions. Unlike previous methods that rely solely on visual features or coarse global text reasoning, we propose ReaCT, a unified framework that reformulates CTV segmentation as a multimodal reasoning task by explicitly integrating pathological information with visual context. Specifically, we introduce a Guideline-Informed Attribute Extractor that follows the information-retrieval workflow of radiation oncologists. By distilling knowledge from clinical guidelines, this module filters and structures lengthy pathology reports into a concise set of clinically determinative pathological attributes, effectively bridging the semantic gap between unstructured clinical records and segmentation networks. Furthermore, we develop an Attribute-Specific MLLM Reasoner built upon a 3D residual U-Net that performs fine-grained spatial reasoning. By leveraging a sequence of attribute-specific query tokens, the model disentangles the distinct target implications of individual pathological attributes, enabling fine-grained anatomical alignment via multi-scale fusion using Two-Way Transformers. Experiments on a postoperative prostate cancer dataset demonstrate that ReaCT achieves state-of-the-art segmentation performance and exhibits strong robustness, with pronounced improvements under limited-annotation settings.

Cite this Paper

BibTeX

@InProceedings{pmlr-v315-wu26b,
  title = 	 {Guideline-Informed MLLM Reasoning for Pathology-Aware Postoperative Prostate CTV Segmentation},
  author =       {Wu, Yinhao and Zhao, Hengrui and Li, Haiqing and Zhong, Wenliang and Ma, Hehuan and Guo, Yuzhi and Nguyen, Dan and Yang, Daniel and Jiang, Steve and Huang, Junzhou},
  booktitle = 	 {Proceedings of The 9th International Conference on Medical Imaging with Deep Learning},
  pages = 	 {1004--1028},
  year = 	 {2026},
  editor = 	 {Huo, Yuankai and Gao, Mingchen and Kuo, Chang-Fu and Jin, Yueming and Deng, Ruining},
  volume = 	 {315},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {08--10 Jul},
  publisher =    {PMLR},
  pdf = 	 {https://raw.githubusercontent.com/mlresearch/v315/main/assets/wu26b/wu26b.pdf},
  url = 	 {https://proceedings.mlr.press/v315/wu26b.html},
  abstract = 	 {Accurate segmentation of the Clinical Target Volume (CTV) is a critical prerequisite for precise radiotherapy planning, pursuing complete irradiation of microscopic disease while minimizing toxicity to surrounding healthy organs. However, achieving automated CTV segmentation remains highly challenging due to the invisible microscopic disease on planning CT and the necessity of incorporating clinical context into delineation decisions. Unlike previous methods that rely solely on visual features or coarse global text reasoning, we propose ReaCT, a unified framework that reformulates CTV segmentation as a multimodal reasoning task by explicitly integrating pathological information with visual context. Specifically, we introduce a Guideline-Informed Attribute Extractor that follows the information-retrieval workflow of radiation oncologists. By distilling knowledge from clinical guidelines, this module filters and structures lengthy pathology reports into a concise set of clinically determinative pathological attributes, effectively bridging the semantic gap between unstructured clinical records and segmentation networks. Furthermore, we develop an Attribute-Specific MLLM Reasoner built upon a 3D residual U-Net that performs fine-grained spatial reasoning. By leveraging a sequence of attribute-specific query tokens, the model disentangles the distinct target implications of individual pathological attributes, enabling fine-grained anatomical alignment via multi-scale fusion using Two-Way Transformers. Experiments on a postoperative prostate cancer dataset demonstrate that ReaCT achieves state-of-the-art segmentation performance and exhibits strong robustness, with pronounced improvements under limited-annotation settings.}
}

Endnote

%0 Conference Paper
%T Guideline-Informed MLLM Reasoning for Pathology-Aware Postoperative Prostate CTV Segmentation
%A Yinhao Wu
%A Hengrui Zhao
%A Haiqing Li
%A Wenliang Zhong
%A Hehuan Ma
%A Yuzhi Guo
%A Dan Nguyen
%A Daniel Yang
%A Steve Jiang
%A Junzhou Huang
%B Proceedings of The 9th International Conference on Medical Imaging with Deep Learning
%C Proceedings of Machine Learning Research
%D 2026
%E Yuankai Huo
%E Mingchen Gao
%E Chang-Fu Kuo
%E Yueming Jin
%E Ruining Deng
%F pmlr-v315-wu26b
%I PMLR
%P 1004--1028
%U https://proceedings.mlr.press/v315/wu26b.html
%V 315
%X Accurate segmentation of the Clinical Target Volume (CTV) is a critical prerequisite for precise radiotherapy planning, pursuing complete irradiation of microscopic disease while minimizing toxicity to surrounding healthy organs. However, achieving automated CTV segmentation remains highly challenging due to the invisible microscopic disease on planning CT and the necessity of incorporating clinical context into delineation decisions. Unlike previous methods that rely solely on visual features or coarse global text reasoning, we propose ReaCT, a unified framework that reformulates CTV segmentation as a multimodal reasoning task by explicitly integrating pathological information with visual context. Specifically, we introduce a Guideline-Informed Attribute Extractor that follows the information-retrieval workflow of radiation oncologists. By distilling knowledge from clinical guidelines, this module filters and structures lengthy pathology reports into a concise set of clinically determinative pathological attributes, effectively bridging the semantic gap between unstructured clinical records and segmentation networks. Furthermore, we develop an Attribute-Specific MLLM Reasoner built upon a 3D residual U-Net that performs fine-grained spatial reasoning. By leveraging a sequence of attribute-specific query tokens, the model disentangles the distinct target implications of individual pathological attributes, enabling fine-grained anatomical alignment via multi-scale fusion using Two-Way Transformers. Experiments on a postoperative prostate cancer dataset demonstrate that ReaCT achieves state-of-the-art segmentation performance and exhibits strong robustness, with pronounced improvements under limited-annotation settings.

APA

Wu, Y., Zhao, H., Li, H., Zhong, W., Ma, H., Guo, Y., Nguyen, D., Yang, D., Jiang, S. & Huang, J. (2026). Guideline-Informed MLLM Reasoning for Pathology-Aware Postoperative Prostate CTV Segmentation. Proceedings of The 9th International Conference on Medical Imaging with Deep Learning, in Proceedings of Machine Learning Research 315:1004-1028. Available from https://proceedings.mlr.press/v315/wu26b.html.
