Visual and Domain Knowledge for Professional-level Graph-of-Thought Medical Reasoning

Rina Bao, Shilong Dong, Zhenfang Chen, Sheng He, Ellen Grant, Yangming Ou
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:2930-2952, 2025.

Abstract

Medical Visual Question Answering (MVQA) requires AI models to answer questions related to medical images, offering significant potential to assist medical professionals in evaluating and diagnosing diseases, thereby improving early interventions. However, existing MVQA datasets primarily focus on basic questions regarding visual perception and pattern recognition, without addressing the more complex questions that are critical in clinical diagnosis and decision-making. This paper introduces a new benchmark designed for professional-level medical reasoning, simulating the decision-making process. We achieve this by collecting MRI and clinical data related to Hypoxic-Ischemic Encephalopathy, enriched with expert annotations and insights. Building on this data, we generate clinical question-answer pairs and MRI interpretations to enable comprehensive diagnosis, interpretation, and prediction of neurocognitive outcomes. Our evaluation of current large vision-language models (LVLMs) shows limited performance on this benchmark, highlighting both the challenges of the task and the importance of this benchmark for advancing medical AI. Furthermore, we propose a novel “Clinical Graph of Thoughts” model, which integrates domain-specific medical knowledge and clinical reasoning processes with the interpretive abilities of LVLMs. The model demonstrates promising results, achieving around 15% absolute gain on the most important neurocognitive outcome task, while the benchmark still reveals substantial opportunities for further research innovation.
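The abstract names the “Clinical Graph of Thoughts” model only at a high level. As a reading aid, below is a minimal, hypothetical Python sketch of what a graph-of-thought reasoning pipeline over clinical steps could look like: nodes hold prompts, edges pass answers from parent steps as context. The node names, prompt wording, graph layout, and the `query_lvlm` stub are illustrative assumptions, not the paper's implementation.

```python
# Hypothetical sketch of a graph-of-thought reasoning pipeline.
# Node names, prompts, and the query_lvlm stub are illustrative
# assumptions, not the paper's actual "Clinical Graph of Thoughts".
from dataclasses import dataclass, field
from typing import Callable, Dict, List


def query_lvlm(prompt: str) -> str:
    """Placeholder for a call to a large vision-language model."""
    return f"[LVLM answer to: {prompt[:60]}...]"


@dataclass
class ThoughtNode:
    name: str
    prompt_template: str                      # may contain {context}
    parents: List[str] = field(default_factory=list)


def run_graph_of_thought(nodes: Dict[str, ThoughtNode],
                         llm: Callable[[str], str] = query_lvlm) -> Dict[str, str]:
    """Resolve nodes in dependency order, passing parent answers as context."""
    answers: Dict[str, str] = {}

    def resolve(name: str) -> str:
        if name in answers:
            return answers[name]
        node = nodes[name]
        context = "\n".join(resolve(p) for p in node.parents)
        answers[name] = llm(node.prompt_template.format(context=context))
        return answers[name]

    for name in nodes:
        resolve(name)
    return answers


# Example graph: MRI findings and domain knowledge feed a diagnosis node,
# which in turn feeds a neurocognitive-outcome prediction node.
graph = {
    "mri_findings": ThoughtNode(
        "mri_findings",
        "Describe abnormal signal regions in this neonatal brain MRI."),
    "domain_knowledge": ThoughtNode(
        "domain_knowledge",
        "Summarize HIE scoring criteria relevant to these regions."),
    "diagnosis": ThoughtNode(
        "diagnosis",
        "Given:\n{context}\nAssess HIE severity.",
        parents=["mri_findings", "domain_knowledge"]),
    "outcome": ThoughtNode(
        "outcome",
        "Given:\n{context}\nPredict the neurocognitive outcome.",
        parents=["diagnosis"]),
}

if __name__ == "__main__":
    for step, answer in run_graph_of_thought(graph).items():
        print(step, "->", answer)
```

The point of the sketch is only the structure: later reasoning steps (diagnosis, outcome prediction) are conditioned on the answers to earlier visual-perception and domain-knowledge steps, rather than answering the final question in a single query.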

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-bao25b,
  title     = {Visual and Domain Knowledge for Professional-level Graph-of-Thought Medical Reasoning},
  author    = {Bao, Rina and Dong, Shilong and Chen, Zhenfang and He, Sheng and Grant, Ellen and Ou, Yangming},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {2930--2952},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/bao25b/bao25b.pdf},
  url       = {https://proceedings.mlr.press/v267/bao25b.html},
  abstract  = {Medical Visual Question Answering (MVQA) requires AI models to answer questions related to medical images, offering significant potential to assist medical professionals in evaluating and diagnosing diseases, thereby improving early interventions. However, existing MVQA datasets primarily focus on basic questions regarding visual perception and pattern recognition, without addressing the more complex questions that are critical in clinical diagnosis and decision-making. This paper introduces a new benchmark designed for professional-level medical reasoning, simulating the decision-making process. We achieve this by collecting MRI and clinical data related to Hypoxic-Ischemic Encephalopathy, enriched with expert annotations and insights. Building on this data, we generate clinical question-answer pairs and MRI interpretations to enable comprehensive diagnosis, interpretation, and prediction of neurocognitive outcomes. Our evaluation of current large vision-language models (LVLMs) shows limited performance on this benchmark, highlighting both the challenges of the task and the importance of this benchmark for advancing medical AI. Furthermore, we propose a novel “Clinical Graph of Thoughts” model, which integrates domain-specific medical knowledge and clinical reasoning processes with the interpretive abilities of LVLMs. The model demonstrates promising results, achieving around 15% absolute gain on the most important neurocognitive outcome task, while the benchmark still reveals substantial opportunities for further research innovation.}
}
Endnote
%0 Conference Paper
%T Visual and Domain Knowledge for Professional-level Graph-of-Thought Medical Reasoning
%A Rina Bao
%A Shilong Dong
%A Zhenfang Chen
%A Sheng He
%A Ellen Grant
%A Yangming Ou
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-bao25b
%I PMLR
%P 2930--2952
%U https://proceedings.mlr.press/v267/bao25b.html
%V 267
%X Medical Visual Question Answering (MVQA) requires AI models to answer questions related to medical images, offering significant potential to assist medical professionals in evaluating and diagnosing diseases, thereby improving early interventions. However, existing MVQA datasets primarily focus on basic questions regarding visual perception and pattern recognition, without addressing the more complex questions that are critical in clinical diagnosis and decision-making. This paper introduces a new benchmark designed for professional-level medical reasoning, simulating the decision-making process. We achieve this by collecting MRI and clinical data related to Hypoxic-Ischemic Encephalopathy, enriched with expert annotations and insights. Building on this data, we generate clinical question-answer pairs and MRI interpretations to enable comprehensive diagnosis, interpretation, and prediction of neurocognitive outcomes. Our evaluation of current large vision-language models (LVLMs) shows limited performance on this benchmark, highlighting both the challenges of the task and the importance of this benchmark for advancing medical AI. Furthermore, we propose a novel “Clinical Graph of Thoughts” model, which integrates domain-specific medical knowledge and clinical reasoning processes with the interpretive abilities of LVLMs. The model demonstrates promising results, achieving around 15% absolute gain on the most important neurocognitive outcome task, while the benchmark still reveals substantial opportunities for further research innovation.
APA
Bao, R., Dong, S., Chen, Z., He, S., Grant, E. & Ou, Y. (2025). Visual and Domain Knowledge for Professional-level Graph-of-Thought Medical Reasoning. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:2930-2952. Available from https://proceedings.mlr.press/v267/bao25b.html.
