DK-BEHRT: Teaching Language Models International Classification of Disease (ICD) Codes using Known Disease Descriptions

Ulzee An, Simon A. Lee, Moonseong Jeong, Aditya Gorla, Jeffrey N. Chiang, Sriram Sankararaman
Proceedings of The First AAAI Bridge Program on AI for Medicine and Healthcare, PMLR 281:133-143, 2025.

Abstract

The widespread digitization of healthcare and patient data has created new opportunities to explore machine learning techniques for improving patient care. The sheer scale of this data has particularly motivated the use of deep learning methods like BERT, which can learn robust representations of medical concepts from patient data without direct supervision. Simultaneously, recent research has shown that language models (LMs) trained on scientific literature can capture strong domain-specific knowledge, including concepts highly relevant to healthcare. In this paper, we leverage two complementary sources of information (patient medical records and descriptive clinical text) to learn complex clinical concepts, such as diagnostic codes, more effectively. Although significant strides have been made in using language models with each data type individually, few studies have explored whether the domain expertise acquired from scientific text can provide a beneficial inductive bias when applied to learning from patient records. To address this gap, we propose the Domain Knowledge BEHRT (DK-BEHRT), a model that integrates disease description embeddings from domain-specific language models, such as BioGPT, into the attention mechanisms of a BERT-based architecture. By incorporating these “knowledge” embeddings, we aim to enhance the model’s ability to understand clinical concepts (e.g., ICD codes) and to predict clinical outcomes with higher accuracy. We validate this approach on the MIMIC-IV dataset and find that incorporating specialized embeddings consistently improves predictive accuracy for clinical outcomes compared to using generic embeddings or training the base model from scratch.
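This page carries no reference implementation, but the mechanism the abstract describes (injecting frozen disease-description embeddings into the attention of a BERT-style model) is concrete enough to sketch. Below is a minimal, hypothetical PyTorch illustration, not the authors' code: the class name KnowledgeAttention, the know_proj projection, and the choice to add the knowledge vector to the key and value projections are all assumptions made for this example.

import math
import torch
import torch.nn as nn

class KnowledgeAttention(nn.Module):
    """Self-attention in which each ICD-code token is paired with a frozen
    'knowledge' embedding of its textual description (e.g., pooled BioGPT
    hidden states over the description), which biases the keys and values.
    Illustrative only; not the published DK-BEHRT implementation."""
    def __init__(self, hidden_dim, know_dim, n_heads, knowledge_table):
        super().__init__()
        assert hidden_dim % n_heads == 0
        self.n_heads, self.head_dim = n_heads, hidden_dim // n_heads
        # One row per ICD code; stored as a buffer so it stays frozen.
        self.register_buffer("knowledge", knowledge_table)  # (vocab, know_dim)
        self.q = nn.Linear(hidden_dim, hidden_dim)
        self.k = nn.Linear(hidden_dim, hidden_dim)
        self.v = nn.Linear(hidden_dim, hidden_dim)
        # Maps description embeddings into the model's hidden space.
        self.know_proj = nn.Linear(know_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, hidden_dim)

    def forward(self, hidden, code_ids):
        # hidden: (B, T, H) token states; code_ids: (B, T) ICD vocabulary ids.
        B, T, H = hidden.shape
        know = self.know_proj(self.knowledge[code_ids])      # (B, T, H)
        split = lambda x: x.view(B, T, self.n_heads, self.head_dim).transpose(1, 2)
        q = split(self.q(hidden))
        k = split(self.k(hidden) + know)  # knowledge biases the keys...
        v = split(self.v(hidden) + know)  # ...and the values
        scores = q @ k.transpose(-2, -1) / math.sqrt(self.head_dim)
        ctx = scores.softmax(dim=-1) @ v                     # (B, heads, T, head_dim)
        return self.out(ctx.transpose(1, 2).reshape(B, T, H))

# Toy usage: 500 codes with 768-dim description embeddings (random stand-ins
# for embeddings a frozen domain LM would produce).
table = torch.randn(500, 768)
layer = KnowledgeAttention(hidden_dim=256, know_dim=768, n_heads=4, knowledge_table=table)
codes = torch.randint(0, 500, (2, 16))   # a batch of two visit sequences
states = torch.randn(2, 16, 256)
print(layer(states, codes).shape)        # torch.Size([2, 16, 256])

Adding the projected description vector to the keys and values is only one possible integration point; gating it, concatenating it, or attending over it with cross-attention would be equally consistent with the abstract's description.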

Cite this Paper


BibTeX
@InProceedings{pmlr-v281-an25a,
  title     = {DK-BEHRT: Teaching Language Models International Classification of Disease (ICD) Codes using Known Disease Descriptions},
  author    = {An, Ulzee and Lee, Simon A. and Jeong, Moonseong and Gorla, Aditya and Chiang, Jeffrey N. and Sankararaman, Sriram},
  booktitle = {Proceedings of The First AAAI Bridge Program on AI for Medicine and Healthcare},
  pages     = {133--143},
  year      = {2025},
  editor    = {Wu, Junde and Zhu, Jiayuan and Xu, Min and Jin, Yueming},
  volume    = {281},
  series    = {Proceedings of Machine Learning Research},
  month     = {25 Feb},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v281/main/assets/an25a/an25a.pdf},
  url       = {https://proceedings.mlr.press/v281/an25a.html},
  abstract  = {The widespread digitization of healthcare and patient data has created new opportunities to explore machine learning techniques for improving patient care. The sheer scale of this data has particularly motivated the use of deep learning methods like BERT, which can learn robust representations of medical concepts from patient data without direct supervision. Simultaneously, recent research has shown that language models (LMs) trained on scientific literature can capture strong domain-specific knowledge, including concepts highly relevant to healthcare. In this paper, we leverage two complementary sources of information (patient medical records and descriptive clinical text) to learn complex clinical concepts, such as diagnostic codes, more effectively. Although significant strides have been made in using language models with each data type individually, few studies have explored whether the domain expertise acquired from scientific text can provide a beneficial inductive bias when applied to learning from patient records. To address this gap, we propose the Domain Knowledge BEHRT (DK-BEHRT), a model that integrates disease description embeddings from domain-specific language models, such as BioGPT, into the attention mechanisms of a BERT-based architecture. By incorporating these “knowledge” embeddings, we aim to enhance the model’s ability to understand clinical concepts (e.g., ICD codes) and to predict clinical outcomes with higher accuracy. We validate this approach on the MIMIC-IV dataset and find that incorporating specialized embeddings consistently improves predictive accuracy for clinical outcomes compared to using generic embeddings or training the base model from scratch.}
}
Endnote
%0 Conference Paper
%T DK-BEHRT: Teaching Language Models International Classification of Disease (ICD) Codes using Known Disease Descriptions
%A Ulzee An
%A Simon A. Lee
%A Moonseong Jeong
%A Aditya Gorla
%A Jeffrey N. Chiang
%A Sriram Sankararaman
%B Proceedings of The First AAAI Bridge Program on AI for Medicine and Healthcare
%C Proceedings of Machine Learning Research
%D 2025
%E Junde Wu
%E Jiayuan Zhu
%E Min Xu
%E Yueming Jin
%F pmlr-v281-an25a
%I PMLR
%P 133--143
%U https://proceedings.mlr.press/v281/an25a.html
%V 281
%X The widespread digitization of healthcare and patient data has created new opportunities to explore machine learning techniques for improving patient care. The sheer scale of this data has particularly motivated the use of deep learning methods like BERT, which can learn robust representations of medical concepts from patient data without direct supervision. Simultaneously, recent research has shown that language models (LMs) trained on scientific literature can capture strong domain-specific knowledge, including concepts highly relevant to healthcare. In this paper, we leverage two complementary sources of information (patient medical records and descriptive clinical text) to learn complex clinical concepts, such as diagnostic codes, more effectively. Although significant strides have been made in using language models with each data type individually, few studies have explored whether the domain expertise acquired from scientific text can provide a beneficial inductive bias when applied to learning from patient records. To address this gap, we propose the Domain Knowledge BEHRT (DK-BEHRT), a model that integrates disease description embeddings from domain-specific language models, such as BioGPT, into the attention mechanisms of a BERT-based architecture. By incorporating these “knowledge” embeddings, we aim to enhance the model’s ability to understand clinical concepts (e.g., ICD codes) and to predict clinical outcomes with higher accuracy. We validate this approach on the MIMIC-IV dataset and find that incorporating specialized embeddings consistently improves predictive accuracy for clinical outcomes compared to using generic embeddings or training the base model from scratch.
APA
An, U., Lee, S.A., Jeong, M., Gorla, A., Chiang, J.N. & Sankararaman, S. (2025). DK-BEHRT: Teaching Language Models International Classification of Disease (ICD) Codes using Known Disease Descriptions. Proceedings of The First AAAI Bridge Program on AI for Medicine and Healthcare, in Proceedings of Machine Learning Research 281:133-143. Available from https://proceedings.mlr.press/v281/an25a.html.