AudioRAG: A Challenging Benchmark for Audio Reasoning and Information Retrieval

Jingru Lin, Chen Zhang, Tianrui Wang, Haizhou Li
Proceedings of the AAAI 2026 Workshop on Audio-Centric AI: Towards Real-World Multimodal Reasoning and Application Use Cases (Audio-AAAI), PMLR 312:113-125, 2026.

Abstract

Recent advancements in Large Audio-Language Models (LALMs), which demonstrate remarkable performance across a range of sound-, speech-, and music-related tasks, have spurred growing interest in benchmarks to assess these models. Existing benchmarks generally focus only on reasoning with internal knowledge, neglecting real-world scenarios that require external information grounding. To bridge this gap, we introduce AudioRAG, a novel benchmark designed to evaluate audio-based reasoning augmented by information retrieval in realistic web environments. The benchmark comprises both LLM-generated and manually curated question-answer pairs. Our evaluations reveal that even state-of-the-art LALMs struggle to answer these questions. We therefore propose an agentic pipeline that integrates audio reasoning with retrieval-augmented generation, providing a stronger baseline for future research.

Cite this Paper


BibTeX
@InProceedings{pmlr-v312-lin26a,
  title     = {{AudioRAG}: A Challenging Benchmark for Audio Reasoning and Information Retrieval},
  author    = {Lin, Jingru and Zhang, Chen and Wang, Tianrui and Li, Haizhou},
  booktitle = {Proceedings of the AAAI 2026 Workshop on Audio-Centric AI: Towards Real-World Multimodal Reasoning and Application Use Cases (Audio-AAAI)},
  pages     = {113--125},
  year      = {2026},
  editor    = {Komatsu, Tatsuya and Imoto, Keisuke and Gao, Xiaoxue and Ono, Nobutaka and Chen, Nancy F.},
  volume    = {312},
  series    = {Proceedings of Machine Learning Research},
  month     = {26 Jan},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v312/main/assets/lin26a/lin26a.pdf},
  url       = {https://proceedings.mlr.press/v312/lin26a.html},
  abstract  = {Recent advancements in Large Audio-Language Models (LALMs), which demonstrate remarkable performance across a range of sound-, speech-, and music-related tasks, have spurred growing interest in benchmarks to assess these models. Existing benchmarks generally focus only on reasoning with internal knowledge, neglecting real-world scenarios that require external information grounding. To bridge this gap, we introduce AudioRAG, a novel benchmark designed to evaluate audio-based reasoning augmented by information retrieval in realistic web environments. The benchmark comprises both LLM-generated and manually curated question-answer pairs. Our evaluations reveal that even state-of-the-art LALMs struggle to answer these questions. We therefore propose an agentic pipeline that integrates audio reasoning with retrieval-augmented generation, providing a stronger baseline for future research.}
}
Endnote
%0 Conference Paper
%T AudioRAG: A Challenging Benchmark for Audio Reasoning and Information Retrieval
%A Jingru Lin
%A Chen Zhang
%A Tianrui Wang
%A Haizhou Li
%B Proceedings of the AAAI 2026 Workshop on Audio-Centric AI: Towards Real-World Multimodal Reasoning and Application Use Cases (Audio-AAAI)
%C Proceedings of Machine Learning Research
%D 2026
%E Tatsuya Komatsu
%E Keisuke Imoto
%E Xiaoxue Gao
%E Nobutaka Ono
%E Nancy F. Chen
%F pmlr-v312-lin26a
%I PMLR
%P 113--125
%U https://proceedings.mlr.press/v312/lin26a.html
%V 312
%X Recent advancements in Large Audio-Language Models (LALMs), which demonstrate remarkable performance across a range of sound-, speech-, and music-related tasks, have spurred growing interest in benchmarks to assess these models. Existing benchmarks generally focus only on reasoning with internal knowledge, neglecting real-world scenarios that require external information grounding. To bridge this gap, we introduce AudioRAG, a novel benchmark designed to evaluate audio-based reasoning augmented by information retrieval in realistic web environments. The benchmark comprises both LLM-generated and manually curated question-answer pairs. Our evaluations reveal that even state-of-the-art LALMs struggle to answer these questions. We therefore propose an agentic pipeline that integrates audio reasoning with retrieval-augmented generation, providing a stronger baseline for future research.
APA
Lin, J., Zhang, C., Wang, T. & Li, H. (2026). AudioRAG: A Challenging Benchmark for Audio Reasoning and Information Retrieval. Proceedings of the AAAI 2026 Workshop on Audio-Centric AI: Towards Real-World Multimodal Reasoning and Application Use Cases (Audio-AAAI), in Proceedings of Machine Learning Research 312:113-125. Available from https://proceedings.mlr.press/v312/lin26a.html.