PhenoRAG: Retrieval-Augmented Generation for Efficient Zero-Shot Phenotype Identification in Clinical Reports

Marc Berndt, Andrea Agostini, Beatrice Stocker, Maria Padrutt, Silvio Daniel Brugger, D Sean Froese, Daphné Chopard, Julia E Vogt
Proceedings of the 10th Machine Learning for Healthcare Conference, PMLR 298, 2025.

Abstract

Accurate extraction of phenotypic information from clinical narratives is essential in diagnostic medicine, yet mapping free-text reports to structured Human Phenotype Ontology (HPO) terms remains challenging. While encoder-only transformer models and small decoder-only generative models are attractive for clinical deployment due to their efficiency and low resource requirements, the former often fail to capture the rich context of clinical texts, and the latter struggle to process lengthy reports effectively. In contrast, larger language models excel at contextual understanding but are impractical for clinical use due to their size, propensity to hallucinate, and privacy concerns associated with non-local inference. To overcome these challenges, we introduce PhenoRAG, a novel retrieval-augmented generation framework that leverages a synthetic database of contextually enriched sentences to augment a lightweight decoder-only model for accurate zero-shot phenotype identification. We demonstrate the capacity of PhenoRAG to capture nuanced contextual clues by 1) evaluating its ability to perform two clinically relevant tasks—guide rare disease diagnosis and facilitate urinary tract infection detection—and 2) validating its performance on a synthetic dataset designed to mimic the challenges of real clinical narratives. Experimental results demonstrate that our lightweight PhenoRAG framework achieves a higher F1-score than both encoder-only transformers and standalone small language models, driven primarily by its high recall. These findings underscore the potential of PhenoRAG as a ready-to-use clinical tool for phenotype identification.

Cite this Paper


BibTeX
@InProceedings{pmlr-v298-berndt25a,
  title = {Pheno{RAG}: Retrieval-Augmented Generation for Efficient Zero-Shot Phenotype Identification in Clinical Reports},
  author = {Berndt, Marc and Agostini, Andrea and Stocker, Beatrice and Padrutt, Maria and Brugger, Silvio Daniel and Froese, D Sean and Chopard, Daphn\'e and Vogt, Julia E},
  booktitle = {Proceedings of the 10th Machine Learning for Healthcare Conference},
  year = {2025},
  editor = {Agrawal, Monica and Deshpande, Kaivalya and Engelhard, Matthew and Joshi, Shalmali and Tang, Shengpu and Urteaga, Iñigo},
  volume = {298},
  series = {Proceedings of Machine Learning Research},
  month = {15--16 Aug},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v298/main/assets/berndt25a/berndt25a.pdf},
  url = {https://proceedings.mlr.press/v298/berndt25a.html},
  abstract = {Accurate extraction of phenotypic information from clinical narratives is essential in diagnostic medicine, yet mapping free-text reports to structured Human Phenotype Ontology (HPO) terms remains challenging. While encoder-only transformer models and small decoder-only generative models are attractive for clinical deployment due to their efficiency and low resource requirements, the former often fail to capture the rich context of clinical texts, and the latter struggle to process lengthy reports effectively. In contrast, larger language models excel at contextual understanding but are impractical for clinical use due to their size, propensity to hallucinate, and privacy concerns associated with non-local inference. To overcome these challenges, we introduce PhenoRAG, a novel retrieval-augmented generation framework that leverages a synthetic database of contextually enriched sentences to augment a lightweight decoder-only model for accurate zero-shot phenotype identification. We demonstrate the capacity of PhenoRAG to capture nuanced contextual clues by 1) evaluating its ability to perform two clinically relevant tasks---guide rare disease diagnosis and facilitate urinary tract infection detection---and 2) validating its performance on a synthetic dataset designed to mimic the challenges of real clinical narratives. Experimental results demonstrate that our lightweight PhenoRAG framework achieves a higher F1-score than both encoder-only transformers and standalone small language models, driven primarily by its high recall. These findings underscore the potential of PhenoRAG as a ready-to-use clinical tool for phenotype identification.}
}
Endnote
%0 Conference Paper
%T PhenoRAG: Retrieval-Augmented Generation for Efficient Zero-Shot Phenotype Identification in Clinical Reports
%A Marc Berndt
%A Andrea Agostini
%A Beatrice Stocker
%A Maria Padrutt
%A Silvio Daniel Brugger
%A D Sean Froese
%A Daphné Chopard
%A Julia E Vogt
%B Proceedings of the 10th Machine Learning for Healthcare Conference
%C Proceedings of Machine Learning Research
%D 2025
%E Monica Agrawal
%E Kaivalya Deshpande
%E Matthew Engelhard
%E Shalmali Joshi
%E Shengpu Tang
%E Iñigo Urteaga
%F pmlr-v298-berndt25a
%I PMLR
%U https://proceedings.mlr.press/v298/berndt25a.html
%V 298
%X Accurate extraction of phenotypic information from clinical narratives is essential in diagnostic medicine, yet mapping free-text reports to structured Human Phenotype Ontology (HPO) terms remains challenging. While encoder-only transformer models and small decoder-only generative models are attractive for clinical deployment due to their efficiency and low resource requirements, the former often fail to capture the rich context of clinical texts, and the latter struggle to process lengthy reports effectively. In contrast, larger language models excel at contextual understanding but are impractical for clinical use due to their size, propensity to hallucinate, and privacy concerns associated with non-local inference. To overcome these challenges, we introduce PhenoRAG, a novel retrieval-augmented generation framework that leverages a synthetic database of contextually enriched sentences to augment a lightweight decoder-only model for accurate zero-shot phenotype identification. We demonstrate the capacity of PhenoRAG to capture nuanced contextual clues by 1) evaluating its ability to perform two clinically relevant tasks (guide rare disease diagnosis and facilitate urinary tract infection detection) and 2) validating its performance on a synthetic dataset designed to mimic the challenges of real clinical narratives. Experimental results demonstrate that our lightweight PhenoRAG framework achieves a higher F1-score than both encoder-only transformers and standalone small language models, driven primarily by its high recall. These findings underscore the potential of PhenoRAG as a ready-to-use clinical tool for phenotype identification.
APA
Berndt, M., Agostini, A., Stocker, B., Padrutt, M., Brugger, S.D., Froese, D.S., Chopard, D. & Vogt, J.E. (2025). PhenoRAG: Retrieval-Augmented Generation for Efficient Zero-Shot Phenotype Identification in Clinical Reports. Proceedings of the 10th Machine Learning for Healthcare Conference, in Proceedings of Machine Learning Research 298. Available from https://proceedings.mlr.press/v298/berndt25a.html.
