One VLM, Two Roles: Stage-Wise Routing and Specialty-Level Deployment for Clinical Workflows

Shayan Vassef; Soorya Ram Shimgekar; Abhay Goyal; Christian Poellabauer; Koustuv Saha; Pi Zonooz; Navin Kumar

Proceedings of the Fifth Machine Learning for Health Symposium, PMLR 297:263-274, 2026.

Abstract

Clinical ML workflows are often fragmented and inefficient: triage, task selection, and model deployment are handled by a patchwork of task-specific networks. These pipelines are rarely aligned with data-science practice, reducing efficiency and increasing operational cost. They also lack data-driven model identification (from imaging/tabular inputs) and standardized delivery of model outputs. We present a framework that employs a single vision–language model (VLM) in two complementary, modular roles. First (Solution 1): the VLM acts as an aware model-card matcher that routes an incoming image to the appropriate specialist model via a three-stage workflow (modality → primary abnormality → model-card ID). Reliability is improved by (i) stage-wise prompts enabling early termination via None/Other and (ii) a calibrated top-2 answer selector with a stage-wise cutoff. This raises routing accuracy by +9 and +11 percentage points on the training and held-out splits, respectively, compared with a baseline router, and improves held-out calibration (lower ECE). Second (Solution 2): we fine-tune the same VLM on specialty-specific datasets so that one model per specialty covers multiple downstream tasks, simplifying deployment while maintaining performance. Across gastroenterology, hematology, ophthalmology, pathology, and radiology, this single-model deployment matches or approaches specialized baselines. Together, these solutions reduce data-science effort through more accurate selection, simplify monitoring and maintenance by consolidating task-specific models, and increase transparency via per-stage justifications and calibrated thresholds. Each solution stands alone, and in combination they offer a practical, modular path from triage to deployment.
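
The three-stage routing described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the stage names, the `Scorer` callable standing in for the VLM, the demo confidences, and the fallback rule in `select_top2` (use the runner-up when the top answer falls below the stage cutoff) are all assumptions made for illustration. The abstract only states that a calibrated top-2 selector with a stage-wise cutoff is used and that a None/Other answer terminates routing early.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical stage names mirroring the abstract's workflow:
# modality -> primary abnormality -> model-card ID.
STAGES = ("modality", "primary_abnormality", "model_card_id")
STOP_LABELS = {"None", "Other"}  # answers that end routing early

Scores = Dict[str, float]  # candidate label -> confidence in [0, 1]
# Stand-in for the VLM: (stage, image, answers so far) -> candidate scores.
Scorer = Callable[[str, object, List[str]], Scores]


def select_top2(scores: Scores, cutoff: float) -> str:
    """Pick between the two highest-scoring candidates.

    Assumed rule (not taken from the paper): keep the top answer if its
    confidence reaches the stage cutoff, otherwise use the runner-up.
    """
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:2]
    if not ranked:
        return "Other"  # no candidates at all: treat as unroutable
    best_label, best_conf = ranked[0]
    if best_conf >= cutoff or len(ranked) == 1:
        return best_label
    return ranked[1][0]


def route(
    image: object, scorer: Scorer, cutoffs: Dict[str, float]
) -> Tuple[Optional[str], List[str]]:
    """Run the stages in order; stop as soon as a stage answers None/Other.

    Returns (model_card_id or None, list of per-stage answers).
    """
    path: List[str] = []
    for stage in STAGES:
        choice = select_top2(scorer(stage, image, path), cutoffs[stage])
        path.append(choice)
        if choice in STOP_LABELS:
            return None, path  # early termination
    return path[-1], path


def demo_scorer(stage: str, image: object, path: List[str]) -> Scores:
    """Toy scorer with fixed, made-up confidences (illustration only)."""
    table = {
        "modality": {"X-ray": 0.9, "CT": 0.1},
        "primary_abnormality": {"None": 0.4, "pneumonia": 0.35},
        "model_card_id": {"card-07": 0.8, "card-02": 0.2},
    }
    return table[stage]


if __name__ == "__main__":
    strict = {stage: 0.5 for stage in STAGES}
    print(route("img", demo_scorer, strict))
    # ('card-07', ['X-ray', 'pneumonia', 'card-07'])
    lenient = dict(strict, primary_abnormality=0.3)
    print(route("img", demo_scorer, lenient))
    # (None, ['X-ray', 'None'])
```

In the toy run, lowering the second-stage cutoff lets the "None" answer stand, so routing stops before a model card is chosen. In practice the per-stage cutoffs would be tuned on held-out data rather than set by hand.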

Cite this Paper

BibTeX

@InProceedings{pmlr-v297-vassef26a,
  title = 	 {One {VLM}, Two Roles: Stage-Wise Routing and Specialty-Level Deployment for Clinical Workflows},
  author =       {Vassef, Shayan and Shimgekar, Soorya Ram and Goyal, Abhay and Poellabauer, Christian and Saha, Koustuv and Zonooz, Pi and Kumar, Navin},
  booktitle = 	 {Proceedings of the Fifth Machine Learning for Health Symposium},
  pages = 	 {263--274},
  year = 	 {2026},
  editor = 	 {Argaw, Peniel and Zhang, Haoran and Jabbour, Sarah and Chandak, Payal and Ji, Jerry and Mukherjee, Sumit and Salaudeen, Olawale and Chang, Trenton and Healey, Elizabeth and Gröger, Fabian and Adibi, Amin and Hegselmann, Stefan and Wild, Benjamin and Noori, Ayush},
  volume = 	 {297},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {13--14 Dec},
  publisher =    {PMLR},
  pdf = 	 {https://raw.githubusercontent.com/mlresearch/v297/main/assets/vassef26a/vassef26a.pdf},
  url = 	 {https://proceedings.mlr.press/v297/vassef26a.html},
  abstract = 	 {Clinical {ML} workflows are often fragmented and inefficient: triage, task selection, and model deployment are handled by a patchwork of task-specific networks. These pipelines are rarely aligned with data-science practice, reducing efficiency and increasing operational cost. They also lack data-driven model identification (from imaging/tabular inputs) and standardized delivery of model outputs. We present a framework that employs a single vision–language model ({VLM}) in two complementary, modular roles. First (Solution 1): the {VLM} acts as an aware model-card matcher that routes an incoming image to the appropriate specialist model via a three-stage workflow (modality $\rightarrow$ primary abnormality $\rightarrow$ model-card {ID}). Reliability is improved by (i) stage-wise prompts enabling early termination via None/Other and (ii) a calibrated top-2 answer selector with a stage-wise cutoff. This raises routing accuracy by +9 and +11 percentage points on the training and held-out splits, respectively, compared with a baseline router, and improves held-out calibration (lower {ECE}). Second (Solution 2): we fine-tune the same {VLM} on specialty-specific datasets so that one model per specialty covers multiple downstream tasks, simplifying deployment while maintaining performance. Across gastroenterology, hematology, ophthalmology, pathology, and radiology, this single-model deployment matches or approaches specialized baselines. Together, these solutions reduce data-science effort through more accurate selection, simplify monitoring and maintenance by consolidating task-specific models, and increase transparency via per-stage justifications and calibrated thresholds. Each solution stands alone, and in combination they offer a practical, modular path from triage to deployment.}
}

Endnote

%0 Conference Paper
%T One VLM, Two Roles: Stage-Wise Routing and Specialty-Level Deployment for Clinical Workflows
%A Shayan Vassef
%A Soorya Ram Shimgekar
%A Abhay Goyal
%A Christian Poellabauer
%A Koustuv Saha
%A Pi Zonooz
%A Navin Kumar
%B Proceedings of the Fifth Machine Learning for Health Symposium
%C Proceedings of Machine Learning Research
%D 2026
%E Peniel Argaw
%E Haoran Zhang
%E Sarah Jabbour
%E Payal Chandak
%E Jerry Ji
%E Sumit Mukherjee
%E Olawale Salaudeen
%E Trenton Chang
%E Elizabeth Healey
%E Fabian Gröger
%E Amin Adibi
%E Stefan Hegselmann
%E Benjamin Wild
%E Ayush Noori
%F pmlr-v297-vassef26a
%I PMLR
%P 263--274
%U https://proceedings.mlr.press/v297/vassef26a.html
%V 297
%X Clinical ML workflows are often fragmented and inefficient: triage, task selection, and model deployment are handled by a patchwork of task-specific networks. These pipelines are rarely aligned with data-science practice, reducing efficiency and increasing operational cost. They also lack data-driven model identification (from imaging/tabular inputs) and standardized delivery of model outputs. We present a framework that employs a single vision–language model (VLM) in two complementary, modular roles. First (Solution 1): the VLM acts as an aware model-card matcher that routes an incoming image to the appropriate specialist model via a three-stage workflow (modality → primary abnormality → model-card ID). Reliability is improved by (i) stage-wise prompts enabling early termination via None/Other and (ii) a calibrated top-2 answer selector with a stage-wise cutoff. This raises routing accuracy by +9 and +11 percentage points on the training and held-out splits, respectively, compared with a baseline router, and improves held-out calibration (lower ECE). Second (Solution 2): we fine-tune the same VLM on specialty-specific datasets so that one model per specialty covers multiple downstream tasks, simplifying deployment while maintaining performance. Across gastroenterology, hematology, ophthalmology, pathology, and radiology, this single-model deployment matches or approaches specialized baselines. Together, these solutions reduce data-science effort through more accurate selection, simplify monitoring and maintenance by consolidating task-specific models, and increase transparency via per-stage justifications and calibrated thresholds. Each solution stands alone, and in combination they offer a practical, modular path from triage to deployment.

APA

Vassef, S., Shimgekar, S. R., Goyal, A., Poellabauer, C., Saha, K., Zonooz, P., & Kumar, N. (2026). One VLM, Two Roles: Stage-Wise Routing and Specialty-Level Deployment for Clinical Workflows. Proceedings of the Fifth Machine Learning for Health Symposium, in Proceedings of Machine Learning Research 297:263-274. Available from https://proceedings.mlr.press/v297/vassef26a.html.
