NoteContrast: Contrastive Language-Diagnostic Pretraining for Medical Text

Prajwal Kailas; Max Homilius; Rahul C. Deo; Calum A. MacRae

Proceedings of the 3rd Machine Learning for Health Symposium, PMLR 225:201-216, 2023.

Abstract

Accurate diagnostic coding of medical notes is crucial for enhancing patient care, medical research, and error-free billing in healthcare organizations. Manual coding is a time-consuming task for providers, and diagnostic codes often exhibit low sensitivity and specificity, whereas the free text in medical notes can be a more precise description of a patient’s status. Thus, accurate automated diagnostic coding of medical notes has become critical for a learning healthcare system. Recent developments in long-document transformer architectures have enabled attention-based deep-learning models to adjudicate medical notes. In addition, contrastive loss functions have been used to jointly pre-train large language and image models with noisy labels. To further improve the automated adjudication of medical notes, we developed an approach based on i) models for ICD-10 diagnostic code sequences using a large real-world data set, ii) large language models for medical notes, and iii) contrastive pre-training to build an integrated model of both ICD-10 diagnostic codes and corresponding medical text. We demonstrate that a contrastive approach for pre-training improves performance over prior state-of-the-art models for the MIMIC-III-50, MIMIC-III-rare50, and MIMIC-III-full diagnostic coding tasks.

Cite this Paper

BibTeX


@InProceedings{pmlr-v225-kailas23a,
  title = 	 {NoteContrast: Contrastive Language-Diagnostic Pretraining for Medical Text},
  author =       {Kailas, Prajwal and Homilius, Max and Deo, Rahul C. and MacRae, Calum A.},
  booktitle = 	 {Proceedings of the 3rd Machine Learning for Health Symposium},
  pages = 	 {201--216},
  year = 	 {2023},
  editor = 	 {Hegselmann, Stefan and Parziale, Antonio and Shanmugam, Divya and Tang, Shengpu and Asiedu, Mercy Nyamewaa and Chang, Serina and Hartvigsen, Tom and Singh, Harvineet},
  volume = 	 {225},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {10 Dec},
  publisher =    {PMLR},
  pdf = 	 {https://proceedings.mlr.press/v225/kailas23a/kailas23a.pdf},
  url = 	 {https://proceedings.mlr.press/v225/kailas23a.html},
  abstract = 	 {Accurate diagnostic coding of medical notes is crucial for enhancing patient care, medical research, and error-free billing in healthcare organizations. Manual coding is a time-consuming task for providers, and diagnostic codes often exhibit low sensitivity and specificity, whereas the free text in medical notes can be a more precise description of a patient’s status. Thus, accurate automated diagnostic coding of medical notes has become critical for a learning healthcare system. Recent developments in long-document transformer architectures have enabled attention-based deep-learning models to adjudicate medical notes. In addition, contrastive loss functions have been used to jointly pre-train large language and image models with noisy labels. To further improve the automated adjudication of medical notes, we developed an approach based on i) models for ICD-10 diagnostic code sequences using a large real-world data set, ii) large language models for medical notes, and iii) contrastive pre-training to build an integrated model of both ICD-10 diagnostic codes and corresponding medical text. We demonstrate that a contrastive approach for pre-training improves performance over prior state-of-the-art models for the MIMIC-III-50, MIMIC-III-rare50, and MIMIC-III-full diagnostic coding tasks.}
}

Endnote

%0 Conference Paper
%T NoteContrast: Contrastive Language-Diagnostic Pretraining for Medical Text
%A Prajwal Kailas
%A Max Homilius
%A Rahul C. Deo
%A Calum A. MacRae
%B Proceedings of the 3rd Machine Learning for Health Symposium
%C Proceedings of Machine Learning Research
%D 2023
%E Stefan Hegselmann
%E Antonio Parziale
%E Divya Shanmugam
%E Shengpu Tang
%E Mercy Nyamewaa Asiedu
%E Serina Chang
%E Tom Hartvigsen
%E Harvineet Singh
%F pmlr-v225-kailas23a
%I PMLR
%P 201--216
%U https://proceedings.mlr.press/v225/kailas23a.html
%V 225
%X Accurate diagnostic coding of medical notes is crucial for enhancing patient care, medical research, and error-free billing in healthcare organizations. Manual coding is a time-consuming task for providers, and diagnostic codes often exhibit low sensitivity and specificity, whereas the free text in medical notes can be a more precise description of a patient’s status. Thus, accurate automated diagnostic coding of medical notes has become critical for a learning healthcare system. Recent developments in long-document transformer architectures have enabled attention-based deep-learning models to adjudicate medical notes. In addition, contrastive loss functions have been used to jointly pre-train large language and image models with noisy labels. To further improve the automated adjudication of medical notes, we developed an approach based on i) models for ICD-10 diagnostic code sequences using a large real-world data set, ii) large language models for medical notes, and iii) contrastive pre-training to build an integrated model of both ICD-10 diagnostic codes and corresponding medical text. We demonstrate that a contrastive approach for pre-training improves performance over prior state-of-the-art models for the MIMIC-III-50, MIMIC-III-rare50, and MIMIC-III-full diagnostic coding tasks.

APA


Kailas, P., Homilius, M., Deo, R.C. & MacRae, C.A. (2023). NoteContrast: Contrastive Language-Diagnostic Pretraining for Medical Text. Proceedings of the 3rd Machine Learning for Health Symposium, in Proceedings of Machine Learning Research 225:201-216. Available from https://proceedings.mlr.press/v225/kailas23a.html.
