Early Detection of Pancreatic Cancer Using Multimodal Learning on Electronic Health Records

Mosbah Aouad, Anirudh Choudhary, Awais Farooq, Steven W Nevers, Lusine Demirkhanyan, Bhrandon Harris, Suguna Pappu, Christopher S Gondi, Ravi Iyer
Proceedings of the 10th Machine Learning for Healthcare Conference, PMLR 298, 2025.

Abstract

Pancreatic ductal adenocarcinoma (PDAC) is one of the deadliest cancers, and early detection remains a major clinical challenge due to the absence of specific symptoms and reliable biomarkers. In this work, we propose a new multimodal approach that integrates longitudinal diagnosis code histories and routinely collected laboratory measurements from electronic health records to detect PDAC up to one year prior to clinical diagnosis. Our method combines Neural Controlled Differential Equations to model irregular lab time series, pretrained language models and recurrent networks to learn diagnosis code trajectory representations, and cross-attention mechanisms to capture interactions between the two modalities. We develop and evaluate our approach on a real-world dataset of nearly 4,700 patients and achieve significant improvements in AUC ranging from 6.5% to 15.5% over state-of-the-art methods. Furthermore, our model identifies diagnosis codes and laboratory panels associated with elevated PDAC risk, including both established and new biomarkers.
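The abstract says Neural Controlled Differential Equations (Neural CDEs) handle the irregularly sampled lab values. The Python below is a minimal sketch of that general technique. It is not the authors' implementation.

It treats a patient's lab draws as a path X(t), builds X by linear interpolation with the timestamp added as an extra channel, and integrates dz = f(z) dX with fixed-step explicit Euler. The following are all assumptions for illustration:

- the vector field `make_vector_field` and its random, untrained weights
- the linear interpolation
- the Euler solver
- the example patient data

Practical implementations typically use learned weights, spline interpolation and adaptive solvers, for example through the torchcde library.

```python
import math
import random
from typing import Callable, List, Sequence

Matrix = List[List[float]]
VectorField = Callable[[Sequence[float]], Matrix]


def make_vector_field(hidden: int, channels: int, seed: int = 0) -> VectorField:
    """Hypothetical one-layer tanh vector field f(z) -> (hidden x channels) matrix.

    The weights are random and untrained; a real model would learn them.
    """
    rng = random.Random(seed)
    # For each (hidden unit h, input channel c) pair: a weight vector over z.
    w = [[[rng.gauss(0.0, 0.3) for _ in range(hidden)] for _ in range(channels)]
         for _ in range(hidden)]
    b = [[rng.gauss(0.0, 0.1) for _ in range(channels)] for _ in range(hidden)]

    def f(z: Sequence[float]) -> Matrix:
        return [[math.tanh(sum(wi * zi for wi, zi in zip(w[h][c], z)) + b[h][c])
                 for c in range(channels)]
                for h in range(hidden)]

    return f


def neural_cde(times: Sequence[float], values: Sequence[Sequence[float]],
               f: VectorField, z0: Sequence[float], substeps: int = 10) -> List[float]:
    """Solve dz = f(z) dX with fixed-step explicit Euler.

    X is the piecewise-linear interpolation of the observations, with the
    timestamp prepended as an extra channel so that irregular gaps between
    lab draws influence the hidden state.
    """
    path = [[t] + list(v) for t, v in zip(times, values)]
    z = list(z0)
    for k in range(len(path) - 1):
        # X is linear on [t_k, t_{k+1}], so dX is the same for every substep.
        dx = [(b - a) / substeps for a, b in zip(path[k], path[k + 1])]
        for _ in range(substeps):
            fz = f(z)
            # Euler update: z <- z + f(z) @ dX (matrix-vector product).
            z = [z_h + sum(fz[h][c] * dx[c] for c in range(len(dx)))
                 for h, z_h in enumerate(z)]
    return z


if __name__ == "__main__":
    # Hypothetical patient: two standardized lab values, times in years.
    times = [0.0, 0.04, 0.05, 0.33]
    labs = [[0.1, -0.3], [0.4, -0.2], [0.5, 0.0], [1.2, 0.6]]
    f = make_vector_field(hidden=4, channels=1 + 2)  # time + 2 lab channels
    z_T = neural_cde(times, labs, f, z0=[0.0] * 4)
    print(len(z_T))  # final hidden state size: 4
```

The update depends only on increments of X between consecutive observations. A long gap between lab draws therefore contributes a large time increment, so the hidden state can respond to the sampling pattern itself without resampling onto a fixed grid. In a full model, the final state z_T would serve as the lab-modality representation. The paper's cross-attention fusion with diagnosis-code embeddings is not shown here.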

Cite this Paper


BibTeX
@InProceedings{pmlr-v298-aouad25a,
  title = {Early Detection of Pancreatic Cancer Using Multimodal Learning on Electronic Health Records},
  author = {Aouad, Mosbah and Choudhary, Anirudh and Farooq, Awais and Nevers, Steven W and Demirkhanyan, Lusine and Harris, Bhrandon and Pappu, Suguna and Gondi, Christopher S and Iyer, Ravi},
  booktitle = {Proceedings of the 10th Machine Learning for Healthcare Conference},
  year = {2025},
  editor = {Agrawal, Monica and Deshpande, Kaivalya and Engelhard, Matthew and Joshi, Shalmali and Tang, Shengpu and Urteaga, Iñigo},
  volume = {298},
  series = {Proceedings of Machine Learning Research},
  month = {15--16 Aug},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v298/main/assets/aouad25a/aouad25a.pdf},
  url = {https://proceedings.mlr.press/v298/aouad25a.html},
  abstract = {Pancreatic ductal adenocarcinoma (PDAC) is one of the deadliest cancers, and early detection remains a major clinical challenge due to the absence of specific symptoms and reliable biomarkers. In this work, we propose a new multimodal approach that integrates longitudinal diagnosis code histories and routinely collected laboratory measurements from electronic health records to detect PDAC up to one year prior to clinical diagnosis. Our method combines Neural Controlled Differential Equations to model irregular lab time series, pretrained language models and recurrent networks to learn diagnosis code trajectory representations, and cross-attention mechanisms to capture interactions between the two modalities. We develop and evaluate our approach on a real-world dataset of nearly 4,700 patients and achieve significant improvements in AUC ranging from 6.5% to 15.5% over state-of-the-art methods. Furthermore, our model identifies diagnosis codes and laboratory panels associated with elevated PDAC risk, including both established and new biomarkers.}
}
Endnote
%0 Conference Paper
%T Early Detection of Pancreatic Cancer Using Multimodal Learning on Electronic Health Records
%A Mosbah Aouad
%A Anirudh Choudhary
%A Awais Farooq
%A Steven W Nevers
%A Lusine Demirkhanyan
%A Bhrandon Harris
%A Suguna Pappu
%A Christopher S Gondi
%A Ravi Iyer
%B Proceedings of the 10th Machine Learning for Healthcare Conference
%C Proceedings of Machine Learning Research
%D 2025
%E Monica Agrawal
%E Kaivalya Deshpande
%E Matthew Engelhard
%E Shalmali Joshi
%E Shengpu Tang
%E Iñigo Urteaga
%F pmlr-v298-aouad25a
%I PMLR
%U https://proceedings.mlr.press/v298/aouad25a.html
%V 298
%X Pancreatic ductal adenocarcinoma (PDAC) is one of the deadliest cancers, and early detection remains a major clinical challenge due to the absence of specific symptoms and reliable biomarkers. In this work, we propose a new multimodal approach that integrates longitudinal diagnosis code histories and routinely collected laboratory measurements from electronic health records to detect PDAC up to one year prior to clinical diagnosis. Our method combines Neural Controlled Differential Equations to model irregular lab time series, pretrained language models and recurrent networks to learn diagnosis code trajectory representations, and cross-attention mechanisms to capture interactions between the two modalities. We develop and evaluate our approach on a real-world dataset of nearly 4,700 patients and achieve significant improvements in AUC ranging from 6.5% to 15.5% over state-of-the-art methods. Furthermore, our model identifies diagnosis codes and laboratory panels associated with elevated PDAC risk, including both established and new biomarkers.
APA
Aouad, M., Choudhary, A., Farooq, A., Nevers, S.W., Demirkhanyan, L., Harris, B., Pappu, S., Gondi, C.S. & Iyer, R. (2025). Early Detection of Pancreatic Cancer Using Multimodal Learning on Electronic Health Records. Proceedings of the 10th Machine Learning for Healthcare Conference, in Proceedings of Machine Learning Research 298. Available from https://proceedings.mlr.press/v298/aouad25a.html.
