Unifying Heterogeneous Electronic Health Records Systems via Text-Based Code Embedding

Kyunghoon Hur; Jiyoung Lee; Jungwoo Oh; Wesley Price; Younghak Kim; Edward Choi

Proceedings of the Conference on Health, Inference, and Learning, PMLR 174:183-203, 2022.

Abstract

The increasing use of Electronic Health Records (EHRs) has facilitated advances in predictive healthcare. However, EHR systems lack a unified code system for representing medical concepts, and these heterogeneous EHR formats present a barrier to training and deploying state-of-the-art deep learning models at scale. To overcome this problem, we introduce Description-based Embedding (DescEmb), a code-agnostic, description-based representation learning framework for predictive modeling on EHR. DescEmb takes advantage of the flexibility of neural language models while remaining neutral enough to be combined with prior frameworks for task-specific representation learning or predictive modeling. We evaluate the model on various experiments, including prediction tasks, transfer learning, and pooled learning. DescEmb shows higher overall performance than the code-based approach across these experiments, opening the door to a text-based approach in predictive healthcare research that is constrained by neither EHR structure nor special domain knowledge.
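The core idea in the abstract, representing a medical code by its text description rather than by a system-specific ID, can be illustrated with a toy sketch. It assumes hypothetical codes from two EHR systems (`sysA`, `sysB`) and uses seeded random word vectors with mean pooling as a stand-in for the neural language model DescEmb relies on. It is not the authors' implementation; it only shows why codes from different systems can share a representation once they are embedded through their descriptions.

```python
import math
import random

# Hypothetical codes from two EHR systems that name the same concepts differently.
# A code-based approach gives each ID its own embedding row, so nothing is shared
# across systems. Codes and descriptions here are illustrative, not from real EHRs.
CODE_DESCRIPTIONS = {
    "sysA:1001": "glucose blood",
    "sysA:1002": "heart rate",
    "sysB:glc": "blood glucose",
    "sysB:hr": "heart rate",
}

DIM = 8


def build_word_vocab(descriptions):
    """Collect the word vocabulary shared by all description texts."""
    words = sorted({w for text in descriptions for w in text.lower().split()})
    return {w: i for i, w in enumerate(words)}


def init_embeddings(vocab, dim, seed=0):
    """Seeded random word vectors, standing in for a trained text encoder (assumption)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in vocab]


def embed_code(code, descriptions, vocab, table):
    """Embed a code by mean-pooling the vectors of the words in its description."""
    words = descriptions[code].lower().split()
    vecs = [table[vocab[w]] for w in words]
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


vocab = build_word_vocab(CODE_DESCRIPTIONS.values())
table = init_embeddings(vocab, DIM)

glucose_a = embed_code("sysA:1001", CODE_DESCRIPTIONS, vocab, table)
glucose_b = embed_code("sysB:glc", CODE_DESCRIPTIONS, vocab, table)
hr_a = embed_code("sysA:1002", CODE_DESCRIPTIONS, vocab, table)
hr_b = embed_code("sysB:hr", CODE_DESCRIPTIONS, vocab, table)

print(len(vocab))                    # 4: words shared across both systems
print(cosine(glucose_a, glucose_b))  # close to 1.0: same words, different order
print(cosine(hr_a, hr_b))            # close to 1.0: identical description
```

Mean pooling is chosen here only for brevity; it ignores word order, which a sequence encoder such as the neural language models mentioned in the abstract would not.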

Cite this Paper

BibTeX

@InProceedings{pmlr-v174-hur22a,
  title     = {Unifying Heterogeneous Electronic Health Records Systems via Text-Based Code Embedding},
  author    = {Hur, Kyunghoon and Lee, Jiyoung and Oh, Jungwoo and Price, Wesley and Kim, Younghak and Choi, Edward},
  booktitle = {Proceedings of the Conference on Health, Inference, and Learning},
  pages     = {183--203},
  year      = {2022},
  editor    = {Flores, Gerardo and Chen, George H and Pollard, Tom and Ho, Joyce C and Naumann, Tristan},
  volume    = {174},
  series    = {Proceedings of Machine Learning Research},
  month     = {07--08 Apr},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v174/hur22a/hur22a.pdf},
  url       = {https://proceedings.mlr.press/v174/hur22a.html},
  abstract  = {Increase in the use of Electronic Health Records (EHRs) has facilitated advances in predictive healthcare. However, EHR systems lack a unified code system for representing medical concepts. Heterogeneous formats of EHR present a barrier for the training and deployment of state-of-the-art deep learning models at scale. To overcome this problem, we introduce Description-based Embedding, DescEmb, a code-agnostic description-based representation learning framework for predictive modeling on EHR. DescEmb takes advantage of the flexibility of neural language models while maintaining a neutral approach that can be combined with prior frameworks for task-specific representation learning or predictive modeling. We test our model’s capacity on various experiments including prediction tasks, transfer learning and pooled learning. DescEmb shows higher performance in overall experiments compared to the code-based approach, opening the door to a text-based approach in predictive healthcare research that is not constrained by EHR structure nor special domain knowledge.}
}

Endnote

%0 Conference Paper
%T Unifying Heterogeneous Electronic Health Records Systems via Text-Based Code Embedding
%A Kyunghoon Hur
%A Jiyoung Lee
%A Jungwoo Oh
%A Wesley Price
%A Younghak Kim
%A Edward Choi
%B Proceedings of the Conference on Health, Inference, and Learning
%C Proceedings of Machine Learning Research
%D 2022
%E Gerardo Flores
%E George H Chen
%E Tom Pollard
%E Joyce C Ho
%E Tristan Naumann
%F pmlr-v174-hur22a
%I PMLR
%P 183--203
%U https://proceedings.mlr.press/v174/hur22a.html
%V 174
%X Increase in the use of Electronic Health Records (EHRs) has facilitated advances in predictive healthcare. However, EHR systems lack a unified code system for representing medical concepts. Heterogeneous formats of EHR present a barrier for the training and deployment of state-of-the-art deep learning models at scale. To overcome this problem, we introduce Description-based Embedding, DescEmb, a code-agnostic description-based representation learning framework for predictive modeling on EHR. DescEmb takes advantage of the flexibility of neural language models while maintaining a neutral approach that can be combined with prior frameworks for task-specific representation learning or predictive modeling. We test our model’s capacity on various experiments including prediction tasks, transfer learning and pooled learning. DescEmb shows higher performance in overall experiments compared to the code-based approach, opening the door to a text-based approach in predictive healthcare research that is not constrained by EHR structure nor special domain knowledge.

APA

Hur, K., Lee, J., Oh, J., Price, W., Kim, Y. & Choi, E. (2022). Unifying Heterogeneous Electronic Health Records Systems via Text-Based Code Embedding. Proceedings of the Conference on Health, Inference, and Learning, in Proceedings of Machine Learning Research 174:183-203. Available from https://proceedings.mlr.press/v174/hur22a.html.