Cluster-Aware Retrieval-Augmented Generation with Hybrid Retrieval for Faithful Medical Report Summarization

Kiarash Torabizadeh; Rachid Hedjam; Mebarka Allaoui; Bessam Abdulrazak

Proceedings of the 39th Canadian Conference on Artificial Intelligence, PMLR 318:1162-1168, 2026.

Abstract

Large Language Models (LLMs) can generate fluent medical summaries but may hallucinate facts not supported by source clinical text, limiting safe clinical adoption. In our prior work, we improved relevance through embedding-based patient clustering and cluster-wise GPT-4.0 summarization; however, summaries could still include unsupported claims due to a lack of explicit evidence grounding. This paper extends that pipeline with a cluster-aware Retrieval-Augmented Generation (RAG) layer to ground summaries in retrieved evidence. For each cluster, we construct two evidence artifacts: (i) a Cluster Profile aggregating clinical statistics (e.g., means, ranges, abnormality rates), and (ii) a Snippet Bank of patient report excerpts. Evidence is retrieved via a hybrid retriever that combines TF-IDF and dense-embedding similarity with weighted scoring. We enforce citation-constrained prompting, requiring each major claim to cite retrieved evidence or be marked as “insufficient evidence”. We evaluate cluster-wise RAG summaries using metrics for faithfulness (supported-claim rate), completeness (coverage of key abnormal indicators), and safety and overreach (diagnostic, medication, and absolute claims). Experiments on a synthetic hypertension dataset (150 patients stratified into low-, average-, and high-risk) show that our approach reduces hallucinations while preserving the personalization benefits of clustering.
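The weighted hybrid scoring mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the mixing weight `alpha`, the smoothed IDF formula, and the toy embeddings are all assumptions made for illustration, and the dense vectors are expected to come from whatever embedding model is in use.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive lowercase whitespace tokenizer (an assumption; real pipelines vary).
    return text.lower().replace(",", " ").replace(".", " ").split()


def tfidf_vectors(docs):
    # Build sparse TF-IDF vectors (dicts) with a smoothed IDF term.
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    vecs = [{t: tf * idf[t] for t, tf in Counter(doc).items()} for doc in docs]
    return vecs, idf


def cosine_sparse(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def cosine_dense(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return dot / (nu * nv) if nu and nv else 0.0


def hybrid_rank(query, snippets, query_emb, snippet_embs, alpha=0.5):
    # Score = alpha * TF-IDF cosine + (1 - alpha) * dense cosine.
    # alpha is a hypothetical weight; the paper's value is not given here.
    docs = [tokenize(s) for s in snippets]
    vecs, idf = tfidf_vectors(docs)
    q_counts = Counter(tokenize(query))
    q_vec = {t: tf * idf[t] for t, tf in q_counts.items() if t in idf}
    scored = []
    for i in range(len(snippets)):
        sparse = cosine_sparse(q_vec, vecs[i])
        dense = cosine_dense(query_emb, snippet_embs[i])
        scored.append((alpha * sparse + (1 - alpha) * dense, i))
    # Highest combined score first; each item is (score, snippet_index).
    return sorted(scored, reverse=True)
```

Calling `hybrid_rank(query, snippets, query_emb, snippet_embs)` returns `(score, index)` pairs sorted from most to least relevant. Setting `alpha` closer to 1 favors exact term overlap, while values closer to 0 favor embedding similarity.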

Cite this Paper

BibTeX

@InProceedings{pmlr-v318-torabizadeh26a,
  title = 	 {Cluster-Aware Retrieval-Augmented Generation with Hybrid Retrieval for Faithful Medical Report Summarization},
  author =       {Torabizadeh, Kiarash and Hedjam, Rachid and Allaoui, Mebarka and Abdulrazak, Bessam},
  booktitle = 	 {Proceedings of the 39th Canadian Conference on Artificial Intelligence},
  pages = 	 {1162--1168},
  year = 	 {2026},
  editor = 	 {Bouzar-Benlabiod, Lydia and Leung, Carson},
  volume = 	 {318},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {25--29 May},
  publisher =    {PMLR},
  pdf = 	 {https://raw.githubusercontent.com/mlresearch/v318/main/assets/torabizadeh26a/torabizadeh26a.pdf},
  url = 	 {https://proceedings.mlr.press/v318/torabizadeh26a.html},
  abstract = 	 {Large Language Models (LLMs) can generate fluent medical summaries but may hallucinate facts not supported by source clinical text, limiting safe clinical adoption. In our prior work, we improved relevance through embedding-based patient clustering and cluster-wise GPT-4.0 summarization; however, summaries could still include unsupported claims due to a lack of explicit evidence grounding. This paper extends that pipeline with a cluster-aware Retrieval-Augmented Generation (RAG) layer to ground summaries in retrieved evidence. For each cluster, we construct two evidence artifacts: (i) a Cluster Profile aggregating clinical statistics (e.g., means, ranges, abnormality rates), and (ii) a Snippet Bank of patient report excerpts. Evidence is retrieved via a hybrid retriever that combines TF-IDF and dense-embedding similarity with weighted scoring. We enforce citation-constrained prompting, requiring each major claim to cite retrieved evidence or be marked as “insufficient evidence”. We evaluate cluster-wise RAG summaries using metrics for faithfulness (supported-claim rate), completeness (coverage of key abnormal indicators), and safety and overreach (diagnostic, medication, and absolute claims). Experiments on a synthetic hypertension dataset (150 patients stratified into low-, average-, and high-risk) show that our approach reduces hallucinations while preserving the personalization benefits of clustering.}
}

Endnote

%0 Conference Paper
%T Cluster-Aware Retrieval-Augmented Generation with Hybrid Retrieval for Faithful Medical Report Summarization
%A Kiarash Torabizadeh
%A Rachid Hedjam
%A Mebarka Allaoui
%A Bessam Abdulrazak
%B Proceedings of the 39th Canadian Conference on Artificial Intelligence
%C Proceedings of Machine Learning Research
%D 2026
%E Lydia Bouzar-Benlabiod
%E Carson Leung
%F pmlr-v318-torabizadeh26a
%I PMLR
%P 1162--1168
%U https://proceedings.mlr.press/v318/torabizadeh26a.html
%V 318
%X Large Language Models (LLMs) can generate fluent medical summaries but may hallucinate facts not supported by source clinical text, limiting safe clinical adoption. In our prior work, we improved relevance through embedding-based patient clustering and cluster-wise GPT-4.0 summarization; however, summaries could still include unsupported claims due to a lack of explicit evidence grounding. This paper extends that pipeline with a cluster-aware Retrieval-Augmented Generation (RAG) layer to ground summaries in retrieved evidence. For each cluster, we construct two evidence artifacts: (i) a Cluster Profile aggregating clinical statistics (e.g., means, ranges, abnormality rates), and (ii) a Snippet Bank of patient report excerpts. Evidence is retrieved via a hybrid retriever that combines TF-IDF and dense-embedding similarity with weighted scoring. We enforce citation-constrained prompting, requiring each major claim to cite retrieved evidence or be marked as “insufficient evidence”. We evaluate cluster-wise RAG summaries using metrics for faithfulness (supported-claim rate), completeness (coverage of key abnormal indicators), and safety and overreach (diagnostic, medication, and absolute claims). Experiments on a synthetic hypertension dataset (150 patients stratified into low-, average-, and high-risk) show that our approach reduces hallucinations while preserving the personalization benefits of clustering.

APA

Torabizadeh, K., Hedjam, R., Allaoui, M., & Abdulrazak, B. (2026). Cluster-Aware Retrieval-Augmented Generation with Hybrid Retrieval for Faithful Medical Report Summarization. Proceedings of the 39th Canadian Conference on Artificial Intelligence, in Proceedings of Machine Learning Research 318:1162-1168. Available from https://proceedings.mlr.press/v318/torabizadeh26a.html.
