Graph-Text Multi-Modal Pre-training for Medical Representation Learning

Sungjin Park, Seongsu Bae, Jiho Kim, Tackeun Kim, Edward Choi
Proceedings of the Conference on Health, Inference, and Learning, PMLR 174:261-281, 2022.

Abstract

As the volume of Electronic Health Records (EHR) sharply grows, there has been growing interest in learning the representation of EHR for healthcare applications. Representation learning of EHR requires appropriate modeling of the two dominant modalities in EHR: structured data and unstructured text. In this paper, we present MedGTX, a pre-trained model for multi-modal representation learning of structured and textual EHR data. MedGTX uses a novel graph encoder to exploit the graphical nature of structured EHR data, a text encoder to handle unstructured text, and a cross-modal encoder to learn a joint representation space. We pre-train our model through four proxy tasks on MIMIC-III, an open-source EHR dataset, and evaluate our model on two clinical benchmarks and three novel downstream tasks that tackle real-world problems in EHR data. The results consistently show the effectiveness of pre-training the model for joint representation of both structured and unstructured information from EHR. Given the promising performance of MedGTX, we believe this work opens a new door to jointly understanding the two fundamental modalities of EHR data.
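
The architecture described in the abstract can be pictured as three cooperating encoders. Below is a minimal, hypothetical PyTorch sketch of that layout: a graph encoder over structured EHR entries, a text encoder over clinical notes, and a cross-modal encoder that fuses the two streams with cross-attention. Every detail here (module and class names, vanilla Transformer layers standing in for the paper's graph encoder, dimensions, layer counts, and the pooled output) is an illustrative assumption, not the authors' implementation or the released MedGTX code.

# Hypothetical sketch of a MedGTX-style graph-text architecture.
# Not the authors' code; dimensions, layer counts, and names are assumptions.
import torch
import torch.nn as nn


class CrossModalLayer(nn.Module):
    """One fusion layer: each modality attends to the other, with residual + norm."""

    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.graph_to_text = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.text_to_graph = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm_g = nn.LayerNorm(dim)
        self.norm_t = nn.LayerNorm(dim)

    def forward(self, g, t):
        # Graph tokens query the text tokens, and vice versa.
        g2, _ = self.graph_to_text(g, t, t)
        t2, _ = self.text_to_graph(t, g, g)
        return self.norm_g(g + g2), self.norm_t(t + t2)


class MedGTXSketch(nn.Module):
    def __init__(self, graph_vocab: int, text_vocab: int, dim: int = 256):
        super().__init__()
        # Graph encoder: node embeddings plus plain Transformer layers stand in
        # for the paper's graph encoder over structured EHR entries.
        self.graph_embed = nn.Embedding(graph_vocab, dim)
        self.graph_enc = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True), num_layers=2
        )
        # Text encoder: in practice this would likely be a pre-trained clinical LM.
        self.text_embed = nn.Embedding(text_vocab, dim)
        self.text_enc = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True), num_layers=2
        )
        # Cross-modal encoder: a small stack of fusion layers.
        self.cross = nn.ModuleList([CrossModalLayer(dim) for _ in range(2)])

    def forward(self, graph_ids, text_ids):
        g = self.graph_enc(self.graph_embed(graph_ids))
        t = self.text_enc(self.text_embed(text_ids))
        for layer in self.cross:
            g, t = layer(g, t)
        # Pooled joint representation; a stand-in for whatever the proxy
        # pre-training tasks and downstream heads would consume.
        return torch.cat([g.mean(dim=1), t.mean(dim=1)], dim=-1)


# Usage with toy inputs.
model = MedGTXSketch(graph_vocab=1000, text_vocab=30000)
joint = model(torch.randint(0, 1000, (2, 16)), torch.randint(0, 30000, (2, 64)))
print(joint.shape)  # torch.Size([2, 512])

In the paper's framing, the joint representation space learned by the cross-modal encoder is what the four proxy pre-training tasks and the downstream evaluations build on; the pooled concatenation above merely stands in for that output.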

Cite this Paper


BibTeX
@InProceedings{pmlr-v174-park22a,
  title     = {Graph-Text Multi-Modal Pre-training for Medical Representation Learning},
  author    = {Park, Sungjin and Bae, Seongsu and Kim, Jiho and Kim, Tackeun and Choi, Edward},
  booktitle = {Proceedings of the Conference on Health, Inference, and Learning},
  pages     = {261--281},
  year      = {2022},
  editor    = {Flores, Gerardo and Chen, George H and Pollard, Tom and Ho, Joyce C and Naumann, Tristan},
  volume    = {174},
  series    = {Proceedings of Machine Learning Research},
  month     = {07--08 Apr},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v174/park22a/park22a.pdf},
  url       = {https://proceedings.mlr.press/v174/park22a.html},
  abstract  = {As the volume of Electronic Health Records (EHR) sharply grows, there has been emerging interest in learning the representation of EHR for healthcare applications. Representation learning of EHR requires appropriate modeling of the two dominant modalities in EHR: structured data and unstructured text. In this paper, we present MedGTX, a pre-trained model for multi-modal representation learning of the structured and textual EHR data. MedGTX uses a novel graph encoder to exploit the graphical nature of structured EHR data, and a text encoder to handle unstructured text, and a cross-modal encoder to learn a joint representation space. We pre-train our model through four proxy tasks on MIMIC-III, an open-source EHR data, and evaluate our model on two clinical benchmarks and three novel downstream tasks which tackle real-world problems in EHR data. The results consistently show the effectiveness of pre-training the model for joint representation of both structured and unstructured information from EHR. Given the promising performance of MedGTX, we believe this work opens a new door to jointly understanding the two fundamental modalities of EHR data.}
}
Endnote
%0 Conference Paper
%T Graph-Text Multi-Modal Pre-training for Medical Representation Learning
%A Sungjin Park
%A Seongsu Bae
%A Jiho Kim
%A Tackeun Kim
%A Edward Choi
%B Proceedings of the Conference on Health, Inference, and Learning
%C Proceedings of Machine Learning Research
%D 2022
%E Gerardo Flores
%E George H Chen
%E Tom Pollard
%E Joyce C Ho
%E Tristan Naumann
%F pmlr-v174-park22a
%I PMLR
%P 261--281
%U https://proceedings.mlr.press/v174/park22a.html
%V 174
%X As the volume of Electronic Health Records (EHR) sharply grows, there has been emerging interest in learning the representation of EHR for healthcare applications. Representation learning of EHR requires appropriate modeling of the two dominant modalities in EHR: structured data and unstructured text. In this paper, we present MedGTX, a pre-trained model for multi-modal representation learning of the structured and textual EHR data. MedGTX uses a novel graph encoder to exploit the graphical nature of structured EHR data, and a text encoder to handle unstructured text, and a cross-modal encoder to learn a joint representation space. We pre-train our model through four proxy tasks on MIMIC-III, an open-source EHR data, and evaluate our model on two clinical benchmarks and three novel downstream tasks which tackle real-world problems in EHR data. The results consistently show the effectiveness of pre-training the model for joint representation of both structured and unstructured information from EHR. Given the promising performance of MedGTX, we believe this work opens a new door to jointly understanding the two fundamental modalities of EHR data.
APA
Park, S., Bae, S., Kim, J., Kim, T. & Choi, E. (2022). Graph-Text Multi-Modal Pre-training for Medical Representation Learning. Proceedings of the Conference on Health, Inference, and Learning, in Proceedings of Machine Learning Research 174:261-281. Available from https://proceedings.mlr.press/v174/park22a.html.
