Clinical Trial Recommendation with LLM-Based Query Generation and Graph-Based Pairwise Re-ranking

Mehrnaz Senobari Vayghan, Emad A. Mohammed, Behrouz H. Far
Proceedings of the 39th Canadian Conference on Artificial Intelligence, PMLR 318:1224-1229, 2026.

Abstract

Clinical trials are essential for drug development and advancing medical treatments. However, many fail due to the challenges of patient recruitment, as identifying suitable participants is both expensive and time-consuming. Recent advances in large language models have demonstrated strong potential in healthcare settings, offering a promising way to automate this process. In this study, we propose an LLM-based recommendation pipeline that suggests a ranked list of clinical trials based on patient characteristics. In the first stage, a medical LLM generates focused search queries from patient notes via chain-of-thought prompting. These queries are used to retrieve a candidate set from a large-scale clinical trial corpus via dense semantic search. In the second stage, candidates are re-ranked via pairwise re-ranking with graph aggregation. We evaluate our pipeline on the TREC Clinical Trials 2021 and 2022 benchmarks. Query-generated retrieval achieves significant improvements over raw retrieval, with Recall@1000 improving by 33.8% and 46.1% on TREC 2021 and 2022, respectively. Pairwise re-ranking with graph aggregation further improves nDCG@10 by 12.8% and 11.9%, P@10 by 11.2% and 10.1%, and MRR by 18.2% and 12.3% on TREC 2021 and 2022, respectively. All results are obtained using only an open-source 8B-parameter model, without task-specific fine-tuning or closed-source API dependence.
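
The abstract does not say how pairwise judgments are aggregated over the graph, so the following is only a minimal sketch of one common approach, not the authors' method. It assumes each pairwise comparison yields a single winner, draws a directed edge from the loser to the winner, and scores candidates with a PageRank-style power iteration. The function name `rank_by_pairwise_graph`, the `prefer` callback (a stand-in for an LLM judge), and the `damping` and `iterations` parameters are all hypothetical.

```python
from itertools import combinations
from typing import Callable, Dict, List


def rank_by_pairwise_graph(
    candidates: List[str],
    prefer: Callable[[str, str], str],
    damping: float = 0.85,
    iterations: int = 50,
) -> List[str]:
    """Rank candidates by aggregating pairwise preferences on a directed graph.

    `prefer(a, b)` is a hypothetical judge that returns whichever of the two
    candidates it considers more relevant. Each comparison adds an edge from
    the loser to the winner; scores then flow along edges, PageRank-style.
    """
    # loser -> list of candidates that beat it (the loser's outgoing edges)
    beaten_by: Dict[str, List[str]] = {c: [] for c in candidates}
    for a, b in combinations(candidates, 2):
        winner = prefer(a, b)
        loser = b if winner == a else a
        beaten_by[loser].append(winner)

    n = len(candidates)
    score = {c: 1.0 / n for c in candidates}
    for _ in range(iterations):
        # Every node receives the uniform "teleport" share first.
        updated = {c: (1.0 - damping) / n for c in candidates}
        for loser, winners in beaten_by.items():
            if winners:
                # Split the loser's score evenly among candidates that beat it.
                share = damping * score[loser] / len(winners)
                for w in winners:
                    updated[w] += share
            else:
                # A never-beaten candidate has no outgoing edges; spread its
                # mass uniformly so the scores keep summing to one.
                share = damping * score[loser] / n
                for c in candidates:
                    updated[c] += share
        score = updated

    return sorted(candidates, key=lambda c: score[c], reverse=True)


if __name__ == "__main__":
    # Toy relevance grades standing in for an LLM judge (illustrative only).
    toy_relevance = {"trial_a": 3, "trial_b": 1, "trial_c": 2, "trial_d": 0}

    def toy_prefer(a: str, b: str) -> str:
        return a if toy_relevance[a] >= toy_relevance[b] else b

    print(rank_by_pairwise_graph(list(toy_relevance), toy_prefer))
```

With a consistent judge, as in the toy example, the ordering matches the underlying relevance grades. The graph formulation matters mainly when judgments are noisy or cyclic, because a candidate's score then depends on the strength of the candidates it beats and not only on its raw win count. Comparing all pairs costs a quadratic number of judge calls, so a practical system would likely compare only a sampled or top-k subset.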

Cite this Paper


BibTeX
@InProceedings{pmlr-v318-vayghan26a,
  title = {Clinical Trial Recommendation with LLM-Based Query Generation and Graph-Based Pairwise Re-ranking},
  author = {Vayghan, Mehrnaz Senobari and Mohammed, Emad A. and Far, Behrouz H.},
  booktitle = {Proceedings of the 39th Canadian Conference on Artificial Intelligence},
  pages = {1224--1229},
  year = {2026},
  editor = {Bouzar-Benlabiod, Lydia and Leung, Carson},
  volume = {318},
  series = {Proceedings of Machine Learning Research},
  month = {25--29 May},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v318/main/assets/vayghan26a/vayghan26a.pdf},
  url = {https://proceedings.mlr.press/v318/vayghan26a.html},
  abstract = {Clinical trials are essential for drug development and advancing medical treatments. However, many fail due to the challenges of patient recruitment, as identifying suitable participants is both expensive and time-consuming. Recent advances in large language models have demonstrated strong potential in healthcare settings, offering a promising way to automate this process. In this study, we propose an LLM-based recommendation pipeline that suggests a ranked list of clinical trials based on patient characteristics. In the first stage, a medical LLM generates focused search queries from patient notes via chain-of-thought prompting. These queries are used to retrieve a candidate set from a large-scale clinical trial corpus via dense semantic search. In the second stage, candidates are re-ranked via pairwise re-ranking with graph aggregation. We evaluate our pipeline on the TREC Clinical Trials 2021 and 2022 benchmarks. Query-generated retrieval achieves significant improvements over raw retrieval, with Recall@1000 improving by 33.8% and 46.1% on TREC 2021 and 2022, respectively. Pairwise re-ranking with graph aggregation further improves nDCG@10 by 12.8% and 11.9%, P@10 by 11.2% and 10.1%, and MRR by 18.2% and 12.3% on TREC 2021 and 2022, respectively. All results are obtained using only an open-source 8B-parameter model, without task-specific fine-tuning or closed-source API dependence.}
}
Endnote
%0 Conference Paper
%T Clinical Trial Recommendation with LLM-Based Query Generation and Graph-Based Pairwise Re-ranking
%A Mehrnaz Senobari Vayghan
%A Emad A. Mohammed
%A Behrouz H. Far
%B Proceedings of the 39th Canadian Conference on Artificial Intelligence
%C Proceedings of Machine Learning Research
%D 2026
%E Lydia Bouzar-Benlabiod
%E Carson Leung
%F pmlr-v318-vayghan26a
%I PMLR
%P 1224--1229
%U https://proceedings.mlr.press/v318/vayghan26a.html
%V 318
%X Clinical trials are essential for drug development and advancing medical treatments. However, many fail due to the challenges of patient recruitment, as identifying suitable participants is both expensive and time-consuming. Recent advances in large language models have demonstrated strong potential in healthcare settings, offering a promising way to automate this process. In this study, we propose an LLM-based recommendation pipeline that suggests a ranked list of clinical trials based on patient characteristics. In the first stage, a medical LLM generates focused search queries from patient notes via chain-of-thought prompting. These queries are used to retrieve a candidate set from a large-scale clinical trial corpus via dense semantic search. In the second stage, candidates are re-ranked via pairwise re-ranking with graph aggregation. We evaluate our pipeline on the TREC Clinical Trials 2021 and 2022 benchmarks. Query-generated retrieval achieves significant improvements over raw retrieval, with Recall@1000 improving by 33.8% and 46.1% on TREC 2021 and 2022, respectively. Pairwise re-ranking with graph aggregation further improves nDCG@10 by 12.8% and 11.9%, P@10 by 11.2% and 10.1%, and MRR by 18.2% and 12.3% on TREC 2021 and 2022, respectively. All results are obtained using only an open-source 8B-parameter model, without task-specific fine-tuning or closed-source API dependence.
APA
Vayghan, M.S., Mohammed, E.A. & Far, B.H. (2026). Clinical Trial Recommendation with LLM-Based Query Generation and Graph-Based Pairwise Re-ranking. Proceedings of the 39th Canadian Conference on Artificial Intelligence, in Proceedings of Machine Learning Research 318:1224-1229. Available from https://proceedings.mlr.press/v318/vayghan26a.html.
