NoteContrast: Contrastive Language-Diagnostic Pretraining for Medical Text

Prajwal Kailas, Max Homilius, Rahul C. Deo, Calum A. MacRae
Proceedings of the 3rd Machine Learning for Health Symposium, PMLR 225:201-216, 2023.

Abstract

Accurate diagnostic coding of medical notes is crucial for enhancing patient care, medical research, and error-free billing in healthcare organizations. Manual coding is a time-consuming task for providers, and diagnostic codes often exhibit low sensitivity and specificity, whereas the free text in medical notes can be a more precise description of a patient’s status. Thus, accurate automated diagnostic coding of medical notes has become critical for a learning healthcare system. Recent developments in long-document transformer architectures have enabled attention-based deep-learning models to adjudicate medical notes. In addition, contrastive loss functions have been used to jointly pre-train large language and image models with noisy labels. To further improve the automated adjudication of medical notes, we developed an approach based on i) models for ICD-10 diagnostic code sequences using a large real-world data set, ii) large language models for medical notes, and iii) contrastive pre-training to build an integrated model of both ICD-10 diagnostic codes and corresponding medical text. We demonstrate that a contrastive approach for pre-training improves performance over prior state-of-the-art models for the MIMIC-III-50, MIMIC-III-rare50, and MIMIC-III-full diagnostic coding tasks.
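The pre-training objective described in the abstract pairs a medical-note encoder with an ICD-10 code-sequence encoder using a CLIP-style contrastive loss. The following is a minimal sketch of such a symmetric contrastive (InfoNCE) loss in PyTorch; it is not the authors' code, and the embedding dimension, batch size, and temperature value are illustrative assumptions rather than details taken from the paper.

import torch
import torch.nn.functional as F

def contrastive_loss(note_emb: torch.Tensor,
                     code_emb: torch.Tensor,
                     temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE loss over a batch of (note, ICD-10 sequence) embedding pairs."""
    # L2-normalize so the dot product is a cosine similarity.
    note_emb = F.normalize(note_emb, dim=-1)
    code_emb = F.normalize(code_emb, dim=-1)

    # Pairwise similarities: logits[i, j] compares note i with code sequence j.
    logits = note_emb @ code_emb.t() / temperature

    # The matching pair for each note lies on the diagonal.
    targets = torch.arange(note_emb.size(0), device=note_emb.device)
    loss_note_to_code = F.cross_entropy(logits, targets)
    loss_code_to_note = F.cross_entropy(logits.t(), targets)
    return 0.5 * (loss_note_to_code + loss_code_to_note)

# Usage with dummy embeddings (batch of 8 pairs, 256-dim projections):
notes = torch.randn(8, 256)
codes = torch.randn(8, 256)
print(contrastive_loss(notes, codes))

In this formulation each note is pulled toward the embedding of its own diagnostic-code sequence and pushed away from the other code sequences in the batch, mirroring how image and text encoders are jointly pre-trained with noisy pairings.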

Cite this Paper

BibTeX
@InProceedings{pmlr-v225-kailas23a,
  title     = {NoteContrast: Contrastive Language-Diagnostic Pretraining for Medical Text},
  author    = {Kailas, Prajwal and Homilius, Max and Deo, Rahul C. and MacRae, Calum A.},
  booktitle = {Proceedings of the 3rd Machine Learning for Health Symposium},
  pages     = {201--216},
  year      = {2023},
  editor    = {Hegselmann, Stefan and Parziale, Antonio and Shanmugam, Divya and Tang, Shengpu and Asiedu, Mercy Nyamewaa and Chang, Serina and Hartvigsen, Tom and Singh, Harvineet},
  volume    = {225},
  series    = {Proceedings of Machine Learning Research},
  month     = {10 Dec},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v225/kailas23a/kailas23a.pdf},
  url       = {https://proceedings.mlr.press/v225/kailas23a.html}
}
Endnote
%0 Conference Paper
%T NoteContrast: Contrastive Language-Diagnostic Pretraining for Medical Text
%A Prajwal Kailas
%A Max Homilius
%A Rahul C. Deo
%A Calum A. MacRae
%B Proceedings of the 3rd Machine Learning for Health Symposium
%C Proceedings of Machine Learning Research
%D 2023
%E Stefan Hegselmann
%E Antonio Parziale
%E Divya Shanmugam
%E Shengpu Tang
%E Mercy Nyamewaa Asiedu
%E Serina Chang
%E Tom Hartvigsen
%E Harvineet Singh
%F pmlr-v225-kailas23a
%I PMLR
%P 201--216
%U https://proceedings.mlr.press/v225/kailas23a.html
%V 225
APA
Kailas, P., Homilius, M., Deo, R.C., & MacRae, C.A. (2023). NoteContrast: Contrastive Language-Diagnostic Pretraining for Medical Text. Proceedings of the 3rd Machine Learning for Health Symposium, in Proceedings of Machine Learning Research 225:201-216. Available from https://proceedings.mlr.press/v225/kailas23a.html.