KEEP: Integrating Medical Ontologies with Clinical Data for Robust Code Embeddings

Ahmed Elhussein, Paul Meddeb, Abigail Newbury, Jeanne Mirone, Martin Stoll, Gamze Gursoy
Proceedings of the sixth Conference on Health, Inference, and Learning, PMLR 287:43-62, 2025.

Abstract

Machine learning in healthcare requires effective representation of structured medical codes, but current methods face a trade-off: knowledge graph-based approaches capture formal relationships but miss real-world patterns, while data-driven methods learn empirical associations but often overlook structured knowledge in medical terminologies. We present KEEP (Knowledge-preserving and Empirically-refined Embedding Process), an efficient framework that bridges this gap by combining knowledge graph embeddings with adaptive learning from clinical data. KEEP first generates embeddings from knowledge graphs, then employs regularized training on patient records to adaptively integrate empirical patterns while preserving ontological relationships. Evaluations on structured EHR from UK Biobank demonstrate that KEEP outperforms both traditional and LLM-based approaches in capturing semantic relationships and predicting clinical outcomes. Moreover, KEEP’s minimal computational requirements make it particularly suitable for resource-constrained environments.
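The two-stage idea in the abstract (start from knowledge-graph embeddings, then refine them on patient data under a regularizer that keeps them close to the starting point) can be illustrated with a toy sketch. This is not the authors' objective or code: the squared-error co-occurrence loss, the L2 anchor penalty, the function names `refine` and `drift`, and all numbers below are assumptions made purely for illustration.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def refine(init, cooc, lam, lr=0.05, steps=200):
    """Gradient descent on a hypothetical objective:
        sum_{(i, j)} (e_i . e_j - cooc[i, j])^2  +  lam * sum_i ||e_i - init_i||^2
    The first term fits empirical co-occurrence targets; the second term
    anchors each embedding to its initial (ontology-derived) value.
    """
    emb = [row[:] for row in init]
    for _ in range(steps):
        # Gradient of the anchor penalty.
        grad = [[2 * lam * (e - e0) for e, e0 in zip(emb[i], init[i])]
                for i in range(len(emb))]
        # Gradient of the co-occurrence fit (pairs are assumed to have i != j).
        for (i, j), target in cooc.items():
            err = dot(emb[i], emb[j]) - target
            for k in range(len(emb[i])):
                grad[i][k] += 2 * err * emb[j][k]
                grad[j][k] += 2 * err * emb[i][k]
        for i in range(len(emb)):
            for k in range(len(emb[i])):
                emb[i][k] -= lr * grad[i][k]
    return emb


def drift(emb, init):
    """Total squared distance between refined and initial embeddings."""
    return sum((a - b) ** 2 for e, e0 in zip(emb, init) for a, b in zip(e, e0))


# Toy stand-ins: three codes with 2-d "knowledge graph" embeddings.
init = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
# Hypothetical co-occurrence targets: codes 0 and 2 co-occur in the data
# even though the initial embeddings place them far apart.
cooc = {(0, 1): 0.9, (0, 2): 0.8}

weak = refine(init, cooc, lam=0.1)    # data-driven: fits co-occurrence closely
strong = refine(init, cooc, lam=5.0)  # knowledge-preserving: stays near init

print("drift (lam=0.1):", drift(weak, init))
print("drift (lam=5.0):", drift(strong, init))
```

In this sketch the regularization weight `lam` controls the trade-off the abstract describes: a small value lets the embeddings follow the data, while a large value preserves the initial structure.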

Cite this Paper


BibTeX
@InProceedings{pmlr-v287-elhussein25a,
  title = {KEEP: Integrating Medical Ontologies with Clinical Data for Robust Code Embeddings},
  author = {Elhussein, Ahmed and Meddeb, Paul and Newbury, Abigail and Mirone, Jeanne and Stoll, Martin and Gursoy, Gamze},
  booktitle = {Proceedings of the sixth Conference on Health, Inference, and Learning},
  pages = {43--62},
  year = {2025},
  editor = {Xu, Xuhai Orson and Choi, Edward and Singhal, Pankhuri and Gerych, Walter and Tang, Shengpu and Agrawal, Monica and Subbaswamy, Adarsh and Sizikova, Elena and Dunn, Jessilyn and Daneshjou, Roxana and Sarker, Tasmie and McDermott, Matthew and Chen, Irene},
  volume = {287},
  series = {Proceedings of Machine Learning Research},
  month = {25--27 Jun},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v287/main/assets/elhussein25a/elhussein25a.pdf},
  url = {https://proceedings.mlr.press/v287/elhussein25a.html},
  abstract = {Machine learning in healthcare requires effective representation of structured medical codes, but current methods face a trade-off: knowledge graph-based approaches capture formal relationships but miss real-world patterns, while data-driven methods learn empirical associations but often overlook structured knowledge in medical terminologies. We present KEEP (Knowledge-preserving and Empirically-refined Embedding Process), an efficient framework that bridges this gap by combining knowledge graph embeddings with adaptive learning from clinical data. KEEP first generates embeddings from knowledge graphs, then employs regularized training on patient records to adaptively integrate empirical patterns while preserving ontological relationships. Evaluations on structured EHR from UK Biobank demonstrate that KEEP outperforms both traditional and LLM-based approaches in capturing semantic relationships and predicting clinical outcomes. Moreover, KEEP’s minimal computational requirements make it particularly suitable for resource-constrained environments.}
}
Endnote
%0 Conference Paper
%T KEEP: Integrating Medical Ontologies with Clinical Data for Robust Code Embeddings
%A Ahmed Elhussein
%A Paul Meddeb
%A Abigail Newbury
%A Jeanne Mirone
%A Martin Stoll
%A Gamze Gursoy
%B Proceedings of the sixth Conference on Health, Inference, and Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Xuhai Orson Xu
%E Edward Choi
%E Pankhuri Singhal
%E Walter Gerych
%E Shengpu Tang
%E Monica Agrawal
%E Adarsh Subbaswamy
%E Elena Sizikova
%E Jessilyn Dunn
%E Roxana Daneshjou
%E Tasmie Sarker
%E Matthew McDermott
%E Irene Chen
%F pmlr-v287-elhussein25a
%I PMLR
%P 43--62
%U https://proceedings.mlr.press/v287/elhussein25a.html
%V 287
%X Machine learning in healthcare requires effective representation of structured medical codes, but current methods face a trade-off: knowledge graph-based approaches capture formal relationships but miss real-world patterns, while data-driven methods learn empirical associations but often overlook structured knowledge in medical terminologies. We present KEEP (Knowledge-preserving and Empirically-refined Embedding Process), an efficient framework that bridges this gap by combining knowledge graph embeddings with adaptive learning from clinical data. KEEP first generates embeddings from knowledge graphs, then employs regularized training on patient records to adaptively integrate empirical patterns while preserving ontological relationships. Evaluations on structured EHR from UK Biobank demonstrate that KEEP outperforms both traditional and LLM-based approaches in capturing semantic relationships and predicting clinical outcomes. Moreover, KEEP’s minimal computational requirements make it particularly suitable for resource-constrained environments.
APA
Elhussein, A., Meddeb, P., Newbury, A., Mirone, J., Stoll, M. & Gursoy, G. (2025). KEEP: Integrating Medical Ontologies with Clinical Data for Robust Code Embeddings. Proceedings of the sixth Conference on Health, Inference, and Learning, in Proceedings of Machine Learning Research 287:43-62. Available from https://proceedings.mlr.press/v287/elhussein25a.html.
