Effortless Vision-Language Model Specialization in Histopathology without Annotation

Jingna Qiu, Nishanth Jain, Jonas Ammeling, Marc Aubreville, Katharina Breininger
Proceedings of the MICCAI Workshop on Computational Pathology, PMLR 316:288-300, 2026.

Abstract

Recent advances in Vision-Language Models (VLMs) in histopathology, such as CONCH and QuiltNet, have demonstrated impressive zero-shot classification capabilities across various tasks. However, their general-purpose design may lead to suboptimal performance in specific downstream applications. While supervised fine-tuning methods address this issue, they require manually labeled samples for adaptation. This paper investigates annotation-free adaptation of VLMs through continued pretraining on domain- and task-relevant image-caption pairs extracted from existing databases. Our experiments on two VLMs, CONCH and QuiltNet, across three downstream tasks reveal that these pairs substantially enhance both zero-shot and few-shot performance. Notably, with larger training sizes, continued pretraining matches the performance of few-shot methods while eliminating manual labeling. Its effectiveness, task-agnostic design, and annotation-free workflow make it a promising pathway for adapting VLMs to new histopathology tasks. Code is available at https://github.com/DeepMicroscopy/Annotation-free-VLM-specialization.

Cite this Paper


BibTeX
@InProceedings{pmlr-v316-qiu26a,
  title     = {Effortless Vision-Language Model Specialization in Histopathology without Annotation},
  author    = {Qiu, Jingna and Jain, Nishanth and Ammeling, Jonas and Aubreville, Marc and Breininger, Katharina},
  booktitle = {Proceedings of the MICCAI Workshop on Computational Pathology},
  pages     = {288--300},
  year      = {2026},
  editor    = {Studer, Linda and Ciompi, Francesco and Khalili, Nadieh and Faryna, Khrystyna and Yeong, Joe and Lau, Mai Chan and Chen, Hao and Liu, Ziyi and Brattoli, Biagio},
  volume    = {316},
  series    = {Proceedings of Machine Learning Research},
  month     = {27 Sep},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v316/main/assets/qiu26a/qiu26a.pdf},
  url       = {https://proceedings.mlr.press/v316/qiu26a.html},
  abstract  = {Recent advances in Vision-Language Models (VLMs) in histopathology, such as CONCH and QuiltNet, have demonstrated impressive zero-shot classification capabilities across various tasks. However, their general-purpose design may lead to suboptimal performance in specific downstream applications. While supervised fine-tuning methods address this issue, they require manually labeled samples for adaptation. This paper investigates annotation-free adaptation of VLMs through continued pretraining on domain- and task-relevant image-caption pairs extracted from existing databases. Our experiments on two VLMs, CONCH and QuiltNet, across three downstream tasks reveal that these pairs substantially enhance both zero-shot and few-shot performance. Notably, with larger training sizes, continued pretraining matches the performance of few-shot methods while eliminating manual labeling. Its effectiveness, task-agnostic design, and annotation-free workflow make it a promising pathway for adapting VLMs to new histopathology tasks. Code is available at https://github.com/DeepMicroscopy/Annotation-free-VLM-specialization.}
}
Endnote
%0 Conference Paper
%T Effortless Vision-Language Model Specialization in Histopathology without Annotation
%A Jingna Qiu
%A Nishanth Jain
%A Jonas Ammeling
%A Marc Aubreville
%A Katharina Breininger
%B Proceedings of the MICCAI Workshop on Computational Pathology
%C Proceedings of Machine Learning Research
%D 2026
%E Linda Studer
%E Francesco Ciompi
%E Nadieh Khalili
%E Khrystyna Faryna
%E Joe Yeong
%E Mai Chan Lau
%E Hao Chen
%E Ziyi Liu
%E Biagio Brattoli
%F pmlr-v316-qiu26a
%I PMLR
%P 288--300
%U https://proceedings.mlr.press/v316/qiu26a.html
%V 316
%X Recent advances in Vision-Language Models (VLMs) in histopathology, such as CONCH and QuiltNet, have demonstrated impressive zero-shot classification capabilities across various tasks. However, their general-purpose design may lead to suboptimal performance in specific downstream applications. While supervised fine-tuning methods address this issue, they require manually labeled samples for adaptation. This paper investigates annotation-free adaptation of VLMs through continued pretraining on domain- and task-relevant image-caption pairs extracted from existing databases. Our experiments on two VLMs, CONCH and QuiltNet, across three downstream tasks reveal that these pairs substantially enhance both zero-shot and few-shot performance. Notably, with larger training sizes, continued pretraining matches the performance of few-shot methods while eliminating manual labeling. Its effectiveness, task-agnostic design, and annotation-free workflow make it a promising pathway for adapting VLMs to new histopathology tasks. Code is available at https://github.com/DeepMicroscopy/Annotation-free-VLM-specialization.
APA
Qiu, J., Jain, N., Ammeling, J., Aubreville, M., & Breininger, K. (2026). Effortless Vision-Language Model Specialization in Histopathology without Annotation. Proceedings of the MICCAI Workshop on Computational Pathology, in Proceedings of Machine Learning Research 316:288-300. Available from https://proceedings.mlr.press/v316/qiu26a.html.
