CancerGUIDE: Cancer Guideline Understanding via Internal Disagreement Estimation
Proceedings of the Fifth Machine Learning for Health Symposium, PMLR 297:275-294, 2026.
Abstract
The National Comprehensive Cancer Network (NCCN) provides evidence-based guidelines for cancer treatment. Translating complex patient presentations into guideline-compliant treatment recommendations is time-intensive, requires specialized expertise, and is prone to error. Advances in large language model (LLM) capabilities promise to reduce the time required to generate treatment recommendations and improve accuracy. We present an LLM agent-based approach to automatically generate guideline-concordant treatment trajectories for patients with non-small cell lung cancer (NSCLC). Our contributions are threefold. First, we construct a novel longitudinal dataset of 121 cases of NSCLC patients that includes clinical encounters, diagnostic results, and medical histories, each expertly annotated with the corresponding NCCN guideline trajectories by board-certified oncologists. Second, we demonstrate that existing LLMs possess domain-specific knowledge that enables high-quality proxy benchmark generation for both model development and evaluation, achieving strong correlation (Spearman coefficient r = 0.88, RMSE = 0.08) with expert-annotated benchmarks. Third, we develop a hybrid approach combining expensive human annotations with model consistency information to create both an agent framework that predicts the relevant guidelines for a patient and a meta-classifier that verifies prediction accuracy with calibrated confidence scores for treatment recommendations (AUROC = 0.800). Calibrated confidence scoring is a critical capability for communicating the accuracy of outputs, custom-tailoring tradeoffs in performance, and supporting regulatory compliance. This work establishes a framework for clinically viable LLM-based guideline adherence systems that balance accuracy, interpretability, and regulatory requirements while reducing annotation costs, providing a scalable pathway toward automated clinical decision support.
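As an illustration of the agreement metrics quoted in the abstract, the following is a minimal sketch of how a Spearman rank correlation and an RMSE between proxy-benchmark scores and expert-annotated scores might be computed. It is not the authors' evaluation code. The function names and the toy numbers are hypothetical, and it assumes each list entry is one model's score on the respective benchmark.

```python
import math

def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))

def rmse(x, y):
    """Root-mean-square error between paired scores."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))

# Hypothetical toy scores, NOT results from the paper.
proxy_scores = [0.42, 0.55, 0.61, 0.70, 0.78]
expert_scores = [0.40, 0.58, 0.57, 0.74, 0.80]

print(f"Spearman r = {spearman(proxy_scores, expert_scores):.2f}")
print(f"RMSE       = {rmse(proxy_scores, expert_scores):.3f}")
```

A high rank correlation with a low RMSE would indicate that the proxy benchmark both orders models like the expert benchmark and lands close to it in absolute terms; the two metrics answer different questions, which is presumably why the abstract reports both.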