Synthetic Vasculature and Pathology Enhance Vision-Language Model Reasoning

Chenjun Li, Cheng Wan, Laurin Lux, Alexander H. Berger, Richard B. Rosen, Martin J. Menten, Johannes C. Paetzold
Proceedings of The 9th International Conference on Medical Imaging with Deep Learning, PMLR 315:701-725, 2026.

Abstract

Vision-language models (VLMs) offer a promising path toward interpretable medical diagnosis by allowing users to ask for clinical explanations alongside predictions and across different modalities. However, training VLMs for detailed reasoning requires large-scale image-text datasets. In many specialized domains, for example in reading optical coherence tomography angiography (OCTA) images, such precise text with grounded descriptions of pathologies is scarce or even non-existent. To overcome this bottleneck, we introduce synthetic vasculature reasoning (SVR), a framework that controllably synthesizes images together with corresponding text. Specifically, it generates realistic retinal vasculature with diabetic retinopathy (DR) features (capillary dropout, microaneurysms, intraretinal microvascular abnormalities, and tortuosity) while automatically generating granular reasoning texts. Based on this framework, we curate OCTA-100K-SVR, an OCTA image-reasoning dataset with 100,000 pairs. Our experiments show that a general-purpose VLM (Qwen3-VL-8b) trained on the dataset achieves a zero-shot balanced classification accuracy of 86.69% on real OCTA images, demonstrating performance comparable to supervised baselines. Through human expert evaluation, we also demonstrate that training on the dataset significantly enhances explanation quality and pathology localization on clinical data.

Cite this Paper


BibTeX
@InProceedings{pmlr-v315-li26d,
  title = {Synthetic Vasculature and Pathology Enhance Vision-Language Model Reasoning},
  author = {Li, Chenjun and Wan, Cheng and Lux, Laurin and Berger, Alexander H. and Rosen, Richard B. and Menten, Martin J. and Paetzold, Johannes C.},
  booktitle = {Proceedings of The 9th International Conference on Medical Imaging with Deep Learning},
  pages = {701--725},
  year = {2026},
  editor = {Huo, Yuankai and Gao, Mingchen and Kuo, Chang-Fu and Jin, Yueming and Deng, Ruining},
  volume = {315},
  series = {Proceedings of Machine Learning Research},
  month = {08--10 Jul},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v315/main/assets/li26d/li26d.pdf},
  url = {https://proceedings.mlr.press/v315/li26d.html},
  abstract = {Vision-language models (VLMs) offer a promising path toward interpretable medical diagnosis by allowing users to ask about clinical explanations alongside predictions and across different modalities. However, training VLMs for detailed reasoning requires large-scale image-text datasets. In many specialized domains, for example in reading optical coherence tomography angiography (OCTA) images, such precise text with grounded description of pathologies is scarce or even non-existent. To overcome this bottleneck, we introduce synthetic vasculature reasoning (SVR), a framework that controllably synthesizes images and corresponding text, specifically: realistic retinal vasculature with diabetic retinopathy (DR) features: capillary dropout, microaneurysms, intraretinal microvascular abnormalities, and tortuosity, while automatically generating granular reasoning texts. Based on this we curate OCTA-100K-SVR, an OCTA image-reasoning dataset with 100,000 pairs. Our experiments show that a general-purpose VLM (Qwen3-VL-8b) trained on the dataset achieves a zero-shot balanced classification accuracy of 86.69% on real OCTA images, demonstrating performance comparable to supervised baselines. Through human expert evaluation we also demonstrate that it significantly enhances explanation quality and pathology localization on clinical data.}
}
Endnote
%0 Conference Paper
%T Synthetic Vasculature and Pathology Enhance Vision-Language Model Reasoning
%A Chenjun Li
%A Cheng Wan
%A Laurin Lux
%A Alexander H. Berger
%A Richard B. Rosen
%A Martin J. Menten
%A Johannes C. Paetzold
%B Proceedings of The 9th International Conference on Medical Imaging with Deep Learning
%C Proceedings of Machine Learning Research
%D 2026
%E Yuankai Huo
%E Mingchen Gao
%E Chang-Fu Kuo
%E Yueming Jin
%E Ruining Deng
%F pmlr-v315-li26d
%I PMLR
%P 701--725
%U https://proceedings.mlr.press/v315/li26d.html
%V 315
%X Vision-language models (VLMs) offer a promising path toward interpretable medical diagnosis by allowing users to ask about clinical explanations alongside predictions and across different modalities. However, training VLMs for detailed reasoning requires large-scale image-text datasets. In many specialized domains, for example in reading optical coherence tomography angiography (OCTA) images, such precise text with grounded description of pathologies is scarce or even non-existent. To overcome this bottleneck, we introduce synthetic vasculature reasoning (SVR), a framework that controllably synthesizes images and corresponding text, specifically: realistic retinal vasculature with diabetic retinopathy (DR) features: capillary dropout, microaneurysms, intraretinal microvascular abnormalities, and tortuosity, while automatically generating granular reasoning texts. Based on this we curate OCTA-100K-SVR, an OCTA image-reasoning dataset with 100,000 pairs. Our experiments show that a general-purpose VLM (Qwen3-VL-8b) trained on the dataset achieves a zero-shot balanced classification accuracy of 86.69% on real OCTA images, demonstrating performance comparable to supervised baselines. Through human expert evaluation we also demonstrate that it significantly enhances explanation quality and pathology localization on clinical data.
APA
Li, C., Wan, C., Lux, L., Berger, A.H., Rosen, R.B., Menten, M.J. & Paetzold, J.C. (2026). Synthetic Vasculature and Pathology Enhance Vision-Language Model Reasoning. Proceedings of The 9th International Conference on Medical Imaging with Deep Learning, in Proceedings of Machine Learning Research 315:701-725. Available from https://proceedings.mlr.press/v315/li26d.html.
