Offline Surgical QA with Decomposed Retrieval and Synthesis for Resource-Constrained Settings

Kiran Bhattacharyya
Proceedings of the Fifth Machine Learning for Health Symposium, PMLR 297:449-472, 2026.

Abstract

Digital access to critical medical knowledge in resource-limited settings is often hindered by a lack of internet connectivity and the computational demands of AI systems. This paper introduces the Surgical Information Assistant, a fully deployable, large language model (LLM)-driven multi-agent system designed to provide reliable surgical information in offline, resource-constrained environments. Our system is powered by a workflow that orchestrates question decomposition, information retrieval, grounded generation, and information synthesis to perform complex reasoning on consumer-grade hardware. Grounded in the Open Manual of Surgery for Resource-Limited Settings, we evaluate DeRetSyn (decomposed retrieval and synthesis) on a new question-answer (QA) dataset of over 14,000 surgical QA pairs. We compare our system to alternatives, perform ablation experiments on components of the agentic system, and interrogate sensitivity to retrieval parameters. The results show that our agentic orchestration enables a compact 3B Llama model to achieve 63% top-1 accuracy, significantly outperforming both a baseline GPT-4o (42.5%) and a larger 8B Llama model with conventional RAG (53%). We further test whether this performance enhancement from agentic orchestration for information retrieval generalizes to the PubMedQA dataset. Additionally, the entire system consumes less than 3.5 GB of RAM and generates responses within 8–15 seconds on a consumer laptop. Our work serves as a practical blueprint for how agent-based systems can empower small, efficient models for medical domain information retrieval and synthesis, offering a tangible application of AI technology that could help advance health equity. We will release our dataset, code base, and prompts to foster further research in deployable and responsible clinical AI.
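
The four stages named in the abstract (question decomposition, information retrieval, grounded generation, and information synthesis) can be illustrated with a minimal Python sketch. This is not the paper's implementation: every function name is hypothetical, the corpus is invented placeholder text rather than content from the Open Manual of Surgery, and simple string splitting and word-overlap ranking stand in for the LLM agents and the retriever.

```python
# Toy illustration of a decompose -> retrieve -> ground -> synthesize loop.
# All names and passages are assumptions made for this sketch.

# Placeholder passages; NOT taken from the Open Manual of Surgery.
CORPUS = [
    "A tourniquet can limit bleeding from a limb.",
    "Hand washing before surgery reduces infection risk.",
    "Clean water is needed to sterilize instruments.",
]


def decompose(question: str) -> list[str]:
    """Split a compound question on ' and ' (stand-in for an LLM decomposer)."""
    parts = [p.strip(" ?") for p in question.split(" and ")]
    return [p + "?" for p in parts if p]


def _tokens(text: str) -> set[str]:
    """Lowercase word set with basic punctuation stripped."""
    return {w.strip(".,?").lower() for w in text.split()}


def retrieve(query: str, corpus: list[str], k: int = 1) -> list[str]:
    """Rank passages by word overlap with the query (stand-in for a retriever)."""
    q = _tokens(query)
    ranked = sorted(corpus, key=lambda p: len(q & _tokens(p)), reverse=True)
    return ranked[:k]


def grounded_answer(sub_question: str, passages: list[str]) -> str:
    """Return only retrieved text (stand-in for grounded LLM generation)."""
    return " ".join(passages)


def synthesize(answers: list[str]) -> str:
    """Merge sub-answers, dropping duplicates while preserving order."""
    return " ".join(dict.fromkeys(answers))


def answer(question: str, corpus: list[str] = CORPUS) -> str:
    """Run the four stages in sequence for one question."""
    subs = decompose(question)
    partial = [grounded_answer(s, retrieve(s, corpus)) for s in subs]
    return synthesize(partial)


if __name__ == "__main__":
    print(answer("What can limit bleeding and what reduces infection risk?"))
```

In this sketch the `k` argument of `retrieve` plays the role of a retrieval parameter of the kind the abstract says was varied; the actual parameters studied in the paper are not specified here.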

Cite this Paper


BibTeX
@InProceedings{pmlr-v297-bhattacharyya26a,
  title = {Offline Surgical QA with Decomposed Retrieval and Synthesis for Resource-Constrained Settings},
  author = {Bhattacharyya, Kiran},
  booktitle = {Proceedings of the Fifth Machine Learning for Health Symposium},
  pages = {449--472},
  year = {2026},
  editor = {Argaw, Peniel and Zhang, Haoran and Jabbour, Sarah and Chandak, Payal and Ji, Jerry and Mukherjee, Sumit and Salaudeen, Olawale and Chang, Trenton and Healey, Elizabeth and Gröger, Fabian and Adibi, Amin and Hegselmann, Stefan and Wild, Benjamin and Noori, Ayush},
  volume = {297},
  series = {Proceedings of Machine Learning Research},
  month = {13--14 Dec},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v297/main/assets/bhattacharyya26a/bhattacharyya26a.pdf},
  url = {https://proceedings.mlr.press/v297/bhattacharyya26a.html},
  abstract = {Digital access to critical medical knowledge in resource-limited settings is often hindered by a lack of internet connectivity and the computational demands of {AI} systems. This paper introduces the Surgical Information Assistant, a fully deployable, large language model ({LLM})-driven multi-agent system designed to provide reliable surgical information in offline, resource-constrained environments. Our system is powered by a workflow that orchestrates question decomposition, information retrieval, grounded generation, and information synthesis to perform complex reasoning on consumer-grade hardware. Grounded in the Open Manual of Surgery for Resource-Limited Settings, we evaluated DeRetSyn on a new question-answer ({QA}) dataset of over 14,000 surgical question-answer pairs. We compare our system to other alternatives, perform ablation experiments on components of the agentic system, and interrogate sensitivity to retrieval parameters. The results show that our agentic orchestration enables a compact 3B Llama model to achieve 63% top-1 accuracy, significantly outperforming both a baseline GPT-4o (42.5%) and a larger 8B Llama model with conventional {RAG} (53%). We further test whether this performance enhancement from agentic orchestration for information retrieval generalizes to the PubMedQA dataset. Additionally, the entire system consumes <3.5 GB of RAM and generates responses within 8–15 seconds working on a consumer laptop. Our work serves as a practical blueprint for how agent-based systems can empower small, efficient models for medical domain information retrieval and synthesis, offering a tangible application of {AI} technology that could help advance health equity. We will release our dataset, code base, and prompts to foster further research in deployable and responsible clinical {AI}.}
}
Endnote
%0 Conference Paper
%T Offline Surgical QA with Decomposed Retrieval and Synthesis for Resource-Constrained Settings
%A Kiran Bhattacharyya
%B Proceedings of the Fifth Machine Learning for Health Symposium
%C Proceedings of Machine Learning Research
%D 2026
%E Peniel Argaw
%E Haoran Zhang
%E Sarah Jabbour
%E Payal Chandak
%E Jerry Ji
%E Sumit Mukherjee
%E Olawale Salaudeen
%E Trenton Chang
%E Elizabeth Healey
%E Fabian Gröger
%E Amin Adibi
%E Stefan Hegselmann
%E Benjamin Wild
%E Ayush Noori
%F pmlr-v297-bhattacharyya26a
%I PMLR
%P 449--472
%U https://proceedings.mlr.press/v297/bhattacharyya26a.html
%V 297
%X Digital access to critical medical knowledge in resource-limited settings is often hindered by a lack of internet connectivity and the computational demands of AI systems. This paper introduces the Surgical Information Assistant, a fully deployable, large language model (LLM)-driven multi-agent system designed to provide reliable surgical information in offline, resource-constrained environments. Our system is powered by a workflow that orchestrates question decomposition, information retrieval, grounded generation, and information synthesis to perform complex reasoning on consumer-grade hardware. Grounded in the Open Manual of Surgery for Resource-Limited Settings, we evaluated DeRetSyn on a new question-answer (QA) dataset of over 14,000 surgical question-answer pairs. We compare our system to other alternatives, perform ablation experiments on components of the agentic system, and interrogate sensitivity to retrieval parameters. The results show that our agentic orchestration enables a compact 3B Llama model to achieve 63% top-1 accuracy, significantly outperforming both a baseline GPT-4o (42.5%) and a larger 8B Llama model with conventional RAG (53%). We further test whether this performance enhancement from agentic orchestration for information retrieval generalizes to the PubMedQA dataset. Additionally, the entire system consumes <3.5 GB of RAM and generates responses within 8–15 seconds working on a consumer laptop. Our work serves as a practical blueprint for how agent-based systems can empower small, efficient models for medical domain information retrieval and synthesis, offering a tangible application of AI technology that could help advance health equity. We will release our dataset, code base, and prompts to foster further research in deployable and responsible clinical AI.
APA
Bhattacharyya, K. (2026). Offline Surgical QA with Decomposed Retrieval and Synthesis for Resource-Constrained Settings. Proceedings of the Fifth Machine Learning for Health Symposium, in Proceedings of Machine Learning Research 297:449-472. Available from https://proceedings.mlr.press/v297/bhattacharyya26a.html.
