Toward Improving Diagnostic Reasoning for Spina Bifida Care: Benchmarking LLM–Patient Interactions
Proceedings of the 7th Conference on Health, Inference, and Learning, PMLR 333:552-571, 2026.
Abstract
Spina Bifida (SB) is a complex neural tube defect that presents multifaceted healthcare challenges requiring multidisciplinary management. While advances in foundation models (FMs) offer promising avenues for enhancing SB care through intelligent, context-aware support, existing models struggle to accurately identify and reason about SB’s diverse symptoms. This study benchmarks eight widely used large language models (LLMs) through qualitative and quantitative evaluations, focusing on their ability to address the unique medical challenges of SB. It also presents an inverse prompting technique aimed at guiding LLMs through a step-by-step diagnostic process. By incorporating a predefined set of symptoms relevant to SB, this approach prevents premature conclusions and enhances diagnostic reasoning, a first step toward addressing the Problem of Inclusion-Exclusion (PIE) as formulated in this study. Our evaluations reveal significant limitations in the LLMs’ abilities to accurately diagnose SB-related conditions, underscoring the need for specialized approaches. Building on these findings, this study proposes a novel framework that integrates a structured, symptom-based knowledge base specific to SB, enhancing the models’ contextual understanding and reasoning capabilities. This work highlights the potential of tailored AI solutions to improve access to care for individuals with SB, particularly in populations where knowledgeable providers remain scarce. By addressing the shortcomings of general-purpose LLMs, the proposed framework aims to streamline SB care and improve patient outcomes, paving the way for more effective AI-assisted healthcare interventions in complex chronic conditions.