Histopathology Image Report Generation by Vision Language Model with Multimodal In-Context Learning

Shih-Wen Liu; Hsuan-Yu Fan; Wei-Ta Chu; Fu-En Yang; Yu-Chiang Frank Wang

Proceedings of The 8th International Conference on Medical Imaging with Deep Learning, PMLR 301:1041-1052, 2026.

Abstract

Automating medical report generation from histopathology images is a critical challenge requiring effective visual representations and domain-specific knowledge. Inspired by the common practices of human experts, we propose an in-context learning framework called PathGenIC that integrates context derived from the training set with a multimodal in-context learning (ICL) mechanism. Our method dynamically retrieves semantically similar whole slide image (WSI)-report pairs and incorporates adaptive feedback to enhance contextual relevance and generation quality. Evaluated on the HistGen benchmark, the framework achieves state-of-the-art results, with significant improvements across BLEU, METEOR, and ROUGE-L metrics, and demonstrates robustness across diverse report lengths and disease categories. By maximizing training data utility and bridging vision and language with ICL, our work offers a solution for AI-driven histopathology reporting, setting a strong foundation for future advancements in multimodal clinical applications.

Cite this Paper

BibTeX

@InProceedings{pmlr-v301-liu26a,
  title = 	 {Histopathology Image Report Generation by Vision Language Model with Multimodal In-Context Learning},
  author =       {Liu, Shih-Wen and Fan, Hsuan-Yu and Chu, Wei-Ta and Yang, Fu-En and Wang, Yu-Chiang Frank},
  booktitle = 	 {Proceedings of The 8th International Conference on Medical Imaging with Deep Learning},
  pages = 	 {1041--1052},
  year = 	 {2026},
  editor = 	 {Tasdizen, Tolga and Elhabian, Shireen and Summers, Ronald and Chen, Chen and Koch, Lisa and Zhuang, Yan},
  volume = 	 {301},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {09--11 Jul},
  publisher =    {PMLR},
  pdf = 	 {https://raw.githubusercontent.com/mlresearch/v301/main/assets/liu26a/liu26a.pdf},
  url = 	 {https://proceedings.mlr.press/v301/liu26a.html},
  abstract = 	 {Automating medical report generation from histopathology images is a critical challenge requiring effective visual representations and domain-specific knowledge. Inspired by the common practices of human experts, we propose an in-context learning framework called PathGenIC that integrates context derived from the training set with a multimodal in-context learning (ICL) mechanism. Our method dynamically retrieves semantically similar whole slide image (WSI)-report pairs and incorporates adaptive feedback to enhance contextual relevance and generation quality. Evaluated on the HistGen benchmark, the framework achieves state-of-the-art results, with significant improvements across BLEU, METEOR, and ROUGE-L metrics, and demonstrates robustness across diverse report lengths and disease categories. By maximizing training data utility and bridging vision and language with ICL, our work offers a solution for AI-driven histopathology reporting, setting a strong foundation for future advancements in multimodal clinical applications.}
}

Endnote

%0 Conference Paper
%T Histopathology Image Report Generation by Vision Language Model with Multimodal In-Context Learning
%A Shih-Wen Liu
%A Hsuan-Yu Fan
%A Wei-Ta Chu
%A Fu-En Yang
%A Yu-Chiang Frank Wang
%B Proceedings of The 8th International Conference on Medical Imaging with Deep Learning
%C Proceedings of Machine Learning Research
%D 2026
%E Tolga Tasdizen
%E Shireen Elhabian
%E Ronald Summers
%E Chen Chen
%E Lisa Koch
%E Yan Zhuang
%F pmlr-v301-liu26a
%I PMLR
%P 1041--1052
%U https://proceedings.mlr.press/v301/liu26a.html
%V 301
%X Automating medical report generation from histopathology images is a critical challenge requiring effective visual representations and domain-specific knowledge. Inspired by the common practices of human experts, we propose an in-context learning framework called PathGenIC that integrates context derived from the training set with a multimodal in-context learning (ICL) mechanism. Our method dynamically retrieves semantically similar whole slide image (WSI)-report pairs and incorporates adaptive feedback to enhance contextual relevance and generation quality. Evaluated on the HistGen benchmark, the framework achieves state-of-the-art results, with significant improvements across BLEU, METEOR, and ROUGE-L metrics, and demonstrates robustness across diverse report lengths and disease categories. By maximizing training data utility and bridging vision and language with ICL, our work offers a solution for AI-driven histopathology reporting, setting a strong foundation for future advancements in multimodal clinical applications.

APA

Liu, S., Fan, H., Chu, W., Yang, F. & Wang, Y.F. (2026). Histopathology Image Report Generation by Vision Language Model with Multimodal In-Context Learning. Proceedings of The 8th International Conference on Medical Imaging with Deep Learning, in Proceedings of Machine Learning Research 301:1041-1052. Available from https://proceedings.mlr.press/v301/liu26a.html.
