Toward Improving Diagnostic Reasoning for Spina Bifida Care: Benchmarking LLM–Patient Interactions

Asfandyar Azhar, Shaurjya Mandal, Zaid Khan, Nidhish Shah, Curtis Langlotz, Brad Dicianno
Proceedings of the 7th Conference on Health, Inference, and Learning, PMLR 333:552-571, 2026.

Abstract

Spina Bifida (SB) is a complex neural tube defect that presents multifaceted healthcare challenges requiring multidisciplinary management. While advances in foundation models (FMs) offer promising avenues for enhancing SB care through intelligent, context-aware support, existing models struggle to accurately identify and reason about SB’s diverse symptoms. This study benchmarks eight widely used large language models (LLMs) through qualitative and quantitative evaluations, focusing on their ability to address the unique medical challenges of SB. This study presents an inverse prompting technique aimed at guiding LLMs through a step-by-step diagnostic process. By incorporating a predefined set of symptoms relevant to SB, this approach prevents premature conclusions and enhances diagnostic reasoning, starting to address the Problem of Inclusion-Exclusion (PIE) as formulated in this study. Our evaluations reveal significant limitations in the LLMs’ abilities to accurately diagnose SB-related conditions, underscoring the need for specialized approaches. Building on these findings, this study proposes a novel framework that integrates a structured, symptom-based knowledge base specific to SB, enhancing the models’ contextual understanding and reasoning capabilities. This work highlights the potential of tailored AI solutions in improving access to care for individuals with SB, particularly in populations where gaps in knowledgeable providers persist. By addressing the shortcomings of general-purpose LLMs, our suggested framework aims to streamline SB care and improve patient outcomes, paving the way for more effective AI-assisted healthcare interventions in complex chronic conditions.

Cite this Paper


BibTeX
@InProceedings{pmlr-v333-azhar26a,
  title = {Toward Improving Diagnostic Reasoning for Spina Bifida Care: Benchmarking LLM–Patient Interactions},
  author = {Azhar, Asfandyar and Mandal, Shaurjya and Khan, Zaid and Shah, Nidhish and Langlotz, Curtis and Dicianno, Brad},
  booktitle = {Proceedings of the 7th Conference on Health, Inference, and Learning},
  pages = {552--571},
  year = {2026},
  editor = {Healey, Elizabeth and Fries, Jason and Pollard, Tom and Tang, Shengpu and Zink, Anna and Hartvigsen, Tom and Agrawal, Monica and Finlayson, Sam and Glicksberg, Benjamin and Beaulieu-Jones, Brett and Wang, Kai and Fontalvo, Daseyra and Sarker, Tasmie and Chen, Irene and Alsentzer, Emily},
  volume = {333},
  series = {Proceedings of Machine Learning Research},
  month = {29--30 Jun},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v333/main/assets/azhar26a/azhar26a.pdf},
  url = {https://proceedings.mlr.press/v333/azhar26a.html},
  abstract = {Spina Bifida (SB) is a complex neural tube defect that presents multifaceted healthcare challenges requiring multidisciplinary management. While advances in foundation models (FMs) offer promising avenues for enhancing SB care through intelligent, context-aware support, existing models struggle to accurately identify and reason about SB’s diverse symptoms. This study benchmarks eight widely used large language models (LLMs) through qualitative and quantitative evaluations, focusing on their ability to address the unique medical challenges of SB. This study presents an \textit{inverse prompting} technique aimed at guiding LLMs through a step-by-step diagnostic process. By incorporating a predefined set of symptoms relevant to SB, this approach prevents premature conclusions and enhances diagnostic reasoning, starting to address the Problem of Inclusion-Exclusion (PIE) as formulated in this study. Our evaluations reveal significant limitations in the LLMs’ abilities to accurately diagnose SB-related conditions, underscoring the need for specialized approaches. Building on these findings, this study proposes a novel framework that integrates a structured, symptom-based knowledge base specific to SB, enhancing the models’ contextual understanding and reasoning capabilities. This work highlights the potential of tailored AI solutions in improving access to care for individuals with SB, particularly in populations where gaps in knowledgeable providers persist. By addressing the shortcomings of general-purpose LLMs, our suggested framework aims to streamline SB care and improve patient outcomes, paving the way for more effective AI-assisted healthcare interventions in complex chronic conditions.}
}
Endnote
%0 Conference Paper
%T Toward Improving Diagnostic Reasoning for Spina Bifida Care: Benchmarking LLM–Patient Interactions
%A Asfandyar Azhar
%A Shaurjya Mandal
%A Zaid Khan
%A Nidhish Shah
%A Curtis Langlotz
%A Brad Dicianno
%B Proceedings of the 7th Conference on Health, Inference, and Learning
%C Proceedings of Machine Learning Research
%D 2026
%E Elizabeth Healey
%E Jason Fries
%E Tom Pollard
%E Shengpu Tang
%E Anna Zink
%E Tom Hartvigsen
%E Monica Agrawal
%E Sam Finlayson
%E Benjamin Glicksberg
%E Brett Beaulieu-Jones
%E Kai Wang
%E Daseyra Fontalvo
%E Tasmie Sarker
%E Irene Chen
%E Emily Alsentzer
%F pmlr-v333-azhar26a
%I PMLR
%P 552--571
%U https://proceedings.mlr.press/v333/azhar26a.html
%V 333
%X Spina Bifida (SB) is a complex neural tube defect that presents multifaceted healthcare challenges requiring multidisciplinary management. While advances in foundation models (FMs) offer promising avenues for enhancing SB care through intelligent, context-aware support, existing models struggle to accurately identify and reason about SB’s diverse symptoms. This study benchmarks eight widely used large language models (LLMs) through qualitative and quantitative evaluations, focusing on their ability to address the unique medical challenges of SB. This study presents an \textit{inverse prompting} technique aimed at guiding LLMs through a step-by-step diagnostic process. By incorporating a predefined set of symptoms relevant to SB, this approach prevents premature conclusions and enhances diagnostic reasoning, starting to address the Problem of Inclusion-Exclusion (PIE) as formulated in this study. Our evaluations reveal significant limitations in the LLMs’ abilities to accurately diagnose SB-related conditions, underscoring the need for specialized approaches. Building on these findings, this study proposes a novel framework that integrates a structured, symptom-based knowledge base specific to SB, enhancing the models’ contextual understanding and reasoning capabilities. This work highlights the potential of tailored AI solutions in improving access to care for individuals with SB, particularly in populations where gaps in knowledgeable providers persist. By addressing the shortcomings of general-purpose LLMs, our suggested framework aims to streamline SB care and improve patient outcomes, paving the way for more effective AI-assisted healthcare interventions in complex chronic conditions.
APA
Azhar, A., Mandal, S., Khan, Z., Shah, N., Langlotz, C., & Dicianno, B. (2026). Toward Improving Diagnostic Reasoning for Spina Bifida Care: Benchmarking LLM–Patient Interactions. Proceedings of the 7th Conference on Health, Inference, and Learning, in Proceedings of Machine Learning Research 333:552-571. Available from https://proceedings.mlr.press/v333/azhar26a.html.
