Histopathology Image Report Generation by Vision Language Model with Multimodal In-Context Learning

Shih-Wen Liu, Hsuan-Yu Fan, Wei-Ta Chu, Fu-En Yang, Yu-Chiang Frank Wang
Proceedings of The 8th International Conference on Medical Imaging with Deep Learning, PMLR 301:1041-1052, 2026.

Abstract

Automating medical report generation from histopathology images is a critical challenge requiring effective visual representations and domain-specific knowledge. Inspired by the common practices of human experts, we propose an in-context learning framework called PathGenIC that integrates context derived from the training set with a multimodal in-context learning (ICL) mechanism. Our method dynamically retrieves semantically similar whole slide image (WSI)-report pairs and incorporates adaptive feedback to enhance contextual relevance and generation quality. Evaluated on the HistGen benchmark, the framework achieves state-of-the-art results, with significant improvements across BLEU, METEOR, and ROUGE-L metrics, and demonstrates robustness across diverse report lengths and disease categories. By maximizing training data utility and bridging vision and language with ICL, our work offers a solution for AI-driven histopathology reporting, setting a strong foundation for future advancements in multimodal clinical applications.

Cite this Paper


BibTeX
@InProceedings{pmlr-v301-liu26a,
  title = {Histopathology Image Report Generation by Vision Language Model with Multimodal In-Context Learning},
  author = {Liu, Shih-Wen and Fan, Hsuan-Yu and Chu, Wei-Ta and Yang, Fu-En and Wang, Yu-Chiang Frank},
  booktitle = {Proceedings of The 8th International Conference on Medical Imaging with Deep Learning},
  pages = {1041--1052},
  year = {2026},
  editor = {Tasdizen, Tolga and Elhabian, Shireen and Summers, Ronald and Chen, Chen and Koch, Lisa and Zhuang, Yan},
  volume = {301},
  series = {Proceedings of Machine Learning Research},
  month = {09--11 Jul},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v301/main/assets/liu26a/liu26a.pdf},
  url = {https://proceedings.mlr.press/v301/liu26a.html},
  abstract = {Automating medical report generation from histopathology images is a critical challenge requiring effective visual representations and domain-specific knowledge. Inspired by the common practices of human experts, we propose an in-context learning framework called PathGenIC that integrates context derived from the training set with a multimodal in-context learning (ICL) mechanism. Our method dynamically retrieves semantically similar whole slide image (WSI)-report pairs and incorporates adaptive feedback to enhance contextual relevance and generation quality. Evaluated on the HistGen benchmark, the framework achieves state-of-the-art results, with significant improvements across BLEU, METEOR, and ROUGE-L metrics, and demonstrates robustness across diverse report lengths and disease categories. By maximizing training data utility and bridging vision and language with ICL, our work offers a solution for AI-driven histopathology reporting, setting a strong foundation for future advancements in multimodal clinical applications.}
}
Endnote
%0 Conference Paper
%T Histopathology Image Report Generation by Vision Language Model with Multimodal In-Context Learning
%A Shih-Wen Liu
%A Hsuan-Yu Fan
%A Wei-Ta Chu
%A Fu-En Yang
%A Yu-Chiang Frank Wang
%B Proceedings of The 8th International Conference on Medical Imaging with Deep Learning
%C Proceedings of Machine Learning Research
%D 2026
%E Tolga Tasdizen
%E Shireen Elhabian
%E Ronald Summers
%E Chen Chen
%E Lisa Koch
%E Yan Zhuang
%F pmlr-v301-liu26a
%I PMLR
%P 1041--1052
%U https://proceedings.mlr.press/v301/liu26a.html
%V 301
%X Automating medical report generation from histopathology images is a critical challenge requiring effective visual representations and domain-specific knowledge. Inspired by the common practices of human experts, we propose an in-context learning framework called PathGenIC that integrates context derived from the training set with a multimodal in-context learning (ICL) mechanism. Our method dynamically retrieves semantically similar whole slide image (WSI)-report pairs and incorporates adaptive feedback to enhance contextual relevance and generation quality. Evaluated on the HistGen benchmark, the framework achieves state-of-the-art results, with significant improvements across BLEU, METEOR, and ROUGE-L metrics, and demonstrates robustness across diverse report lengths and disease categories. By maximizing training data utility and bridging vision and language with ICL, our work offers a solution for AI-driven histopathology reporting, setting a strong foundation for future advancements in multimodal clinical applications.
APA
Liu, S., Fan, H., Chu, W., Yang, F. & Wang, Y.F. (2026). Histopathology Image Report Generation by Vision Language Model with Multimodal In-Context Learning. Proceedings of The 8th International Conference on Medical Imaging with Deep Learning, in Proceedings of Machine Learning Research 301:1041-1052. Available from https://proceedings.mlr.press/v301/liu26a.html.
