One VLM, Two Roles: Stage-Wise Routing and Specialty-Level Deployment for Clinical Workflows
Proceedings of the Fifth Machine Learning for Health Symposium, PMLR 297:263-274, 2026.
Abstract
Clinical ML workflows are often fragmented and inefficient: triage, task selection, and model deployment are handled by a patchwork of task-specific networks. These pipelines are rarely aligned with data-science practice, which reduces efficiency and increases operational cost. They also lack data-driven model identification (from imaging/tabular inputs) and standardized delivery of model outputs. We present a framework that employs a single vision–language model (VLM) in two complementary, modular roles. First (Solution 1), the VLM acts as a model-card-aware matcher that routes an incoming image to the appropriate specialist model via a three-stage workflow (modality $\rightarrow$ primary abnormality $\rightarrow$ model-card ID). Reliability is improved by (i) stage-wise prompts that enable early termination via None/Other and (ii) a calibrated top-2 answer selector with a stage-wise cutoff. Compared with a baseline router, this raises routing accuracy by 9 and 11 percentage points on the training and held-out splits, respectively, and improves held-out calibration (lower expected calibration error, ECE). Second (Solution 2), we fine-tune the same VLM on specialty-specific datasets so that one model per specialty covers multiple downstream tasks, simplifying deployment while maintaining performance. Across gastroenterology, hematology, ophthalmology, pathology, and radiology, this single-model deployment matches or approaches specialized baselines. Together, these solutions reduce data-science effort through more accurate model selection, simplify monitoring and maintenance by consolidating task-specific models, and increase transparency via per-stage justifications and calibrated thresholds. Each solution stands alone, and in combination they offer a practical, modular path from triage to deployment.
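To make the three-stage routing of Solution 1 concrete, the following is a minimal sketch rather than the authors' implementation. The abstract does not specify how the top-2 selector works, so the rule used here (renormalize the two highest calibrated scores and accept the top answer only if it clears the stage's cutoff) is an assumption, as are the stage names, the cutoff values, the `toy_scorer` stand-in for the VLM, and the labels such as `card-017`.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical stage names and cutoffs; the abstract gives no numeric values.
STAGES = ["modality", "primary_abnormality", "model_card_id"]
CUTOFFS = {"modality": 0.6, "primary_abnormality": 0.5, "model_card_id": 0.5}
STOP_LABELS = {"None", "Other"}  # answers that end routing early

# A scorer maps (stage, answers chosen so far) to calibrated scores per label.
Scorer = Callable[[str, List[str]], Dict[str, float]]


def select_top2(probs: Dict[str, float], cutoff: float) -> Optional[str]:
    """Assumed selection rule: keep the two best-scoring answers, renormalize
    their scores, and return the top answer only if it clears the cutoff."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:2]
    total = sum(p for _, p in ranked)
    if not ranked or total <= 0:
        return None
    best_label, best_p = ranked[0]
    return best_label if best_p / total >= cutoff else None


def route(scorer: Scorer,
          cutoffs: Dict[str, float] = CUTOFFS) -> Tuple[Optional[str], List[str]]:
    """Walk the stages in order; stop early on None/Other or low confidence.
    Returns (model-card ID or None, per-stage trace of decisions)."""
    path: List[str] = []
    for stage in STAGES:
        label = select_top2(scorer(stage, path), cutoffs[stage])
        if label is None:
            return None, path + ["abstain"]
        if label in STOP_LABELS:
            return None, path + [label]
        path.append(label)
    return path[-1], path


def toy_scorer(stage: str, path: List[str]) -> Dict[str, float]:
    """Stand-in for the VLM: fixed scores per stage, for illustration only."""
    table = {
        "modality": {"X-ray": 0.80, "CT": 0.15, "Other": 0.05},
        "primary_abnormality": {"pneumonia": 0.70, "None": 0.20, "nodule": 0.10},
        "model_card_id": {"card-017": 0.90, "card-004": 0.10},
    }
    return table[stage]


if __name__ == "__main__":
    card_id, trace = route(toy_scorer)
    print(card_id, trace)
```

The returned trace records the decision at each stage, which is one simple way to surface the per-stage justifications the abstract mentions; a real system would attach the VLM's textual rationale and use cutoffs tuned on held-out data.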