Investigating RAG-based Approaches in Clinical Trial and Patient Matching
Proceedings of the Fifth Machine Learning for Health Symposium, PMLR 297:76-87, 2026.
Abstract
The task of matching clinical trials and patients involves predicting whether a patient meets the eligibility criteria of a clinical trial, using evidence from patient records such as clinical notes. Given that both the trial eligibility criteria and the clinical notes of patients are unstructured text, Large Language Models (LLMs) hold the potential to improve performance on this task. Nevertheless, LLMs come with their own challenges of transparency and accountability. Current methods use Retrieval-Augmented Generation (RAG) to predict patient eligibility. In this work, we systematically investigate three aspects of these RAG-based approaches: (i) the complexity of the task, (ii) data retrieval for longitudinal records, and (iii) the effect of abstention on prediction quality. We show that criteria complexity, model abstention, and the chunking of longitudinal patient records have noticeable effects on model performance. We also show that the choice of embedding models and ranking methods has little effect on the evidence retrieved from patient history. We hope that the findings of our study encourage research on improving the transparency and accountability of RAG approaches in clinical decision-making tasks.