CaReAQA: A Cardiac and Respiratory Audio Question Answering Model for Open-Ended Diagnostic Reasoning

Tsai-Ning Wang; Lin-Lin Chen; Neil Zeghidour; Aaqib Saeed

Proceedings of the sixth Conference on Health, Inference, and Learning, PMLR 287:231-246, 2025.

Abstract

Medical audio signals, such as heart and lung sounds, play a crucial role in clinical diagnosis. However, analyzing these signals remains challenging: traditional methods rely on handcrafted features or supervised deep learning models that demand extensive labeled datasets, limiting their scalability and applicability. To address these issues, we propose CaReAQA, an audio-language model that integrates a foundation audio model with the reasoning capabilities of large language models, enabling clinically relevant, open-ended diagnostic responses. Alongside CaReAQA, we introduce CaReSound, a benchmark dataset of annotated medical audio recordings enriched with metadata and paired question-answer examples, intended to drive progress in diagnostic reasoning research. Evaluation results show that CaReAQA achieves 86.2% accuracy on open-ended diagnostic reasoning tasks, outperforming baseline models. It also generalizes well to closed-ended classification tasks, achieving an average accuracy of 56.9% on unseen datasets. These findings highlight the transformative potential of integrating audio analysis with language-based reasoning to address key challenges in medical diagnostics, opening new possibilities for scalable, data-efficient AI systems capable of supporting real-world clinical decision-making.

Cite this Paper

BibTeX

@InProceedings{pmlr-v287-wang25b,
  title     = {CaReAQA: A Cardiac and Respiratory Audio Question Answering Model for Open-Ended Diagnostic Reasoning},
  author    = {Wang, Tsai-Ning and Chen, Lin-Lin and Zeghidour, Neil and Saeed, Aaqib},
  booktitle = {Proceedings of the sixth Conference on Health, Inference, and Learning},
  pages     = {231--246},
  year      = {2025},
  editor    = {Xu, Xuhai Orson and Choi, Edward and Singhal, Pankhuri and Gerych, Walter and Tang, Shengpu and Agrawal, Monica and Subbaswamy, Adarsh and Sizikova, Elena and Dunn, Jessilyn and Daneshjou, Roxana and Sarker, Tasmie and McDermott, Matthew and Chen, Irene},
  volume    = {287},
  series    = {Proceedings of Machine Learning Research},
  month     = {25--27 Jun},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v287/main/assets/wang25b/wang25b.pdf},
  url       = {https://proceedings.mlr.press/v287/wang25b.html},
  abstract  = {Medical audio signals, such as heart and lung sounds, play a crucial role in clinical diagnosis. However, analyzing these signals remains challenging: traditional methods rely on handcrafted features or supervised deep learning models that demand extensive labeled datasets, limiting their scalability and applicability. To address these issues, we propose CaReAQA, an audio-language model that integrates a foundation audio model with the reasoning capabilities of large language models, enabling clinically relevant, open-ended diagnostic responses. Alongside CaReAQA, we introduce CaReSound, a benchmark dataset of annotated medical audio recordings enriched with metadata and paired question-answer examples, intended to drive progress in diagnostic reasoning research. Evaluation results show that CaReAQA achieves $86.2\%$ accuracy on open-ended diagnostic reasoning tasks, outperforming baseline models. It also generalizes well to closed-ended classification tasks, achieving an average accuracy of $56.9\%$ on unseen datasets. These findings highlight the transformative potential of integrating audio analysis with language-based reasoning to address key challenges in medical diagnostics, opening new possibilities for scalable, data-efficient AI systems capable of supporting real-world clinical decision-making.}
}

Endnote

%0 Conference Paper
%T CaReAQA: A Cardiac and Respiratory Audio Question Answering Model for Open-Ended Diagnostic Reasoning
%A Tsai-Ning Wang
%A Lin-Lin Chen
%A Neil Zeghidour
%A Aaqib Saeed
%B Proceedings of the sixth Conference on Health, Inference, and Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Xuhai Orson Xu
%E Edward Choi
%E Pankhuri Singhal
%E Walter Gerych
%E Shengpu Tang
%E Monica Agrawal
%E Adarsh Subbaswamy
%E Elena Sizikova
%E Jessilyn Dunn
%E Roxana Daneshjou
%E Tasmie Sarker
%E Matthew McDermott
%E Irene Chen
%F pmlr-v287-wang25b
%I PMLR
%P 231--246
%U https://proceedings.mlr.press/v287/wang25b.html
%V 287
%X Medical audio signals, such as heart and lung sounds, play a crucial role in clinical diagnosis. However, analyzing these signals remains challenging: traditional methods rely on handcrafted features or supervised deep learning models that demand extensive labeled datasets, limiting their scalability and applicability. To address these issues, we propose CaReAQA, an audio-language model that integrates a foundation audio model with the reasoning capabilities of large language models, enabling clinically relevant, open-ended diagnostic responses. Alongside CaReAQA, we introduce CaReSound, a benchmark dataset of annotated medical audio recordings enriched with metadata and paired question-answer examples, intended to drive progress in diagnostic reasoning research. Evaluation results show that CaReAQA achieves 86.2% accuracy on open-ended diagnostic reasoning tasks, outperforming baseline models. It also generalizes well to closed-ended classification tasks, achieving an average accuracy of 56.9% on unseen datasets. These findings highlight the transformative potential of integrating audio analysis with language-based reasoning to address key challenges in medical diagnostics, opening new possibilities for scalable, data-efficient AI systems capable of supporting real-world clinical decision-making.

APA

Wang, T., Chen, L., Zeghidour, N. & Saeed, A. (2025). CaReAQA: A Cardiac and Respiratory Audio Question Answering Model for Open-Ended Diagnostic Reasoning. Proceedings of the sixth Conference on Health, Inference, and Learning, in Proceedings of Machine Learning Research 287:231-246. Available from https://proceedings.mlr.press/v287/wang25b.html.
