Diagnosing and Repairing Factual Errors in RAG under Budget Constraints

Soroush Hashemifar, Havva Alizadeh Noughabi, Fattane Zarrinkalam, Ali Dehghantanha
Proceedings of the 39th Canadian Conference on Artificial Intelligence, PMLR 318:924-931, 2026.

Abstract

Retrieval-Augmented Generation (RAG) improves the factuality of large language models by grounding responses in external evidence, yet real-world deployments remain fragile. Failures often stem from missing or weakly relevant evidence, as well as from generation that does not faithfully reflect the retrieved context. Many existing approaches rely on fine-tuning, privileged access to internal model signals, or resource-insensitive escalation strategies, which limits their practicality in black-box and budget-constrained settings. We propose D2R-RAG (Diagnose-to-Repair RAG), a model-agnostic and resource-aware framework that combines lightweight failure diagnosis with adaptive repair. D2R-RAG derives interpretable failure signatures from observable signals in the query, retrieved evidence, and generated response, and then selects from a small set of corrective actions under explicit latency and VRAM constraints. Experiments on FEVER and HotpotQA show that D2R-RAG improves reliability over recent baselines and achieves better accuracy–efficiency trade-offs across multiple compute budgets. The code is available at https://github.com/CyberScienceLab/D2R-RAG/.

Cite this Paper


BibTeX
@InProceedings{pmlr-v318-hashemifar26a,
  title = {Diagnosing and Repairing Factual Errors in RAG under Budget Constraints},
  author = {Hashemifar, Soroush and Noughabi, Havva Alizadeh and Zarrinkalam, Fattane and Dehghantanha, Ali},
  booktitle = {Proceedings of the 39th Canadian Conference on Artificial Intelligence},
  pages = {924--931},
  year = {2026},
  editor = {Bouzar-Benlabiod, Lydia and Leung, Carson},
  volume = {318},
  series = {Proceedings of Machine Learning Research},
  month = {25--29 May},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v318/main/assets/hashemifar26a/hashemifar26a.pdf},
  url = {https://proceedings.mlr.press/v318/hashemifar26a.html},
  abstract = {Retrieval-Augmented Generation (RAG) improves the factuality of large language models by grounding responses in external evidence, yet real-world deployments remain fragile. Failures often stem from missing or weakly relevant evidence, as well as from generation that does not faithfully reflect the retrieved context. Many existing approaches rely on fine-tuning, privileged access to internal model signals, or resource-insensitive escalation strategies, which limits their practicality in black-box and budget-constrained settings. We propose D2R-RAG (Diagnose-to-Repair RAG), a model-agnostic and resource-aware framework that combines lightweight failure diagnosis with adaptive repair. D2R-RAG derives interpretable failure signatures from observable signals in the query, retrieved evidence, and generated response, and then selects from a small set of corrective actions under explicit latency and VRAM constraints. Experiments on FEVER and HotpotQA show that D2R-RAG improves reliability over recent baselines and achieves better accuracy–efficiency trade-offs across multiple compute budgets. The code is available at https://github.com/CyberScienceLab/D2R-RAG/.}
}
Endnote
%0 Conference Paper
%T Diagnosing and Repairing Factual Errors in RAG under Budget Constraints
%A Soroush Hashemifar
%A Havva Alizadeh Noughabi
%A Fattane Zarrinkalam
%A Ali Dehghantanha
%B Proceedings of the 39th Canadian Conference on Artificial Intelligence
%C Proceedings of Machine Learning Research
%D 2026
%E Lydia Bouzar-Benlabiod
%E Carson Leung
%F pmlr-v318-hashemifar26a
%I PMLR
%P 924--931
%U https://proceedings.mlr.press/v318/hashemifar26a.html
%V 318
%X Retrieval-Augmented Generation (RAG) improves the factuality of large language models by grounding responses in external evidence, yet real-world deployments remain fragile. Failures often stem from missing or weakly relevant evidence, as well as from generation that does not faithfully reflect the retrieved context. Many existing approaches rely on fine-tuning, privileged access to internal model signals, or resource-insensitive escalation strategies, which limits their practicality in black-box and budget-constrained settings. We propose D2R-RAG (Diagnose-to-Repair RAG), a model-agnostic and resource-aware framework that combines lightweight failure diagnosis with adaptive repair. D2R-RAG derives interpretable failure signatures from observable signals in the query, retrieved evidence, and generated response, and then selects from a small set of corrective actions under explicit latency and VRAM constraints. Experiments on FEVER and HotpotQA show that D2R-RAG improves reliability over recent baselines and achieves better accuracy–efficiency trade-offs across multiple compute budgets. The code is available at https://github.com/CyberScienceLab/D2R-RAG/.
APA
Hashemifar, S., Noughabi, H.A., Zarrinkalam, F., & Dehghantanha, A. (2026). Diagnosing and Repairing Factual Errors in RAG under Budget Constraints. Proceedings of the 39th Canadian Conference on Artificial Intelligence, in Proceedings of Machine Learning Research 318:924-931. Available from https://proceedings.mlr.press/v318/hashemifar26a.html.
