Rediscovery of CNN’s Versatility for Text-based Encoding of Raw Electronic Health Records

Eunbyeol Cho, Minjae Lee, Kyunghoon Hur, Jiyoun Kim, Jinsung Yoon, Edward Choi
Proceedings of the Conference on Health, Inference, and Learning, PMLR 209:294-313, 2023.

Abstract

Making the most of the abundant information in electronic health records (EHR) is rapidly becoming an important topic in the medical domain. Recent work presented a promising framework that embeds all features in raw EHR data regardless of their form and medical code standards. The framework, however, focuses only on encoding EHR with minimal preprocessing and does not consider how to learn efficient EHR representations in terms of computation and memory usage. In this paper, we search for a versatile encoder that not only reduces large data to a manageable size but also preserves the core information of patients well enough to perform diverse clinical tasks. We found that a hierarchically structured Convolutional Neural Network (CNN) often outperforms the state-of-the-art model on diverse tasks such as reconstruction, prediction, and generation, even with fewer parameters and less training time. Moreover, it turns out that exploiting the inherent hierarchy of EHR data can boost the performance of any kind of backbone model and clinical task. Through extensive experiments, we present concrete evidence for generalizing our research findings to real-world practice. We give clear guidelines on building the encoder based on the findings captured while exploring numerous settings.
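The abstract describes the encoder only at a high level. The following is a hypothetical, minimal illustration of what "hierarchically structured" CNN encoding of text-based EHR can mean. It assumes each clinical event has already been serialized to a short text string.

- An event-level CNN max-pools each event's tokens into one vector.
- A patient-level CNN max-pools the sequence of event vectors into one patient vector.

Several parts of the sketch are illustrative assumptions, not the paper's architecture, hyperparameters, or data:

- the class name `HierarchicalCNNEncoder`
- the whitespace tokenizer
- the random untrained weights
- the kernel size
- the example event strings

```python
import random

# Hypothetical toy sketch: untrained random weights, pure Python, no framework.

def make_conv(rng, d_in, n_out, k):
    """Random filters of shape [n_out][k][d_in] plus zero biases (assumed init)."""
    filters = [[[rng.uniform(-0.5, 0.5) for _ in range(d_in)]
                for _ in range(k)] for _ in range(n_out)]
    return filters, [0.0] * n_out


def conv1d_same(seq, filters, biases):
    """1-D convolution with zero 'same' padding, followed by ReLU.

    seq is a list of T vectors; the result is a list of T vectors of size n_out.
    """
    k = len(filters[0])
    zero = [0.0] * len(filters[0][0])
    padded = [zero] * ((k - 1) // 2) + list(seq) + [zero] * (k // 2)
    out = []
    for t in range(len(seq)):
        window = padded[t:t + k]
        row = []
        for filt, b in zip(filters, biases):
            s = b + sum(w * x for w_vec, x_vec in zip(filt, window)
                        for w, x in zip(w_vec, x_vec))
            row.append(max(0.0, s))  # ReLU
        out.append(row)
    return out


def max_pool(seq):
    """Global max-pool over time: T vectors -> one vector."""
    return [max(col) for col in zip(*seq)]


class HierarchicalCNNEncoder:
    """Hypothetical two-level encoder: tokens -> event vector -> patient vector."""

    def __init__(self, dim=8, kernel=3, seed=0):
        self.rng = random.Random(seed)
        self.dim = dim
        self.table = {}  # lazily created random token embeddings
        self.event_conv = make_conv(self.rng, dim, dim, kernel)
        self.patient_conv = make_conv(self.rng, dim, dim, kernel)

    def embed(self, tokens):
        for tok in tokens:
            if tok not in self.table:
                self.table[tok] = [self.rng.uniform(-1.0, 1.0) for _ in range(self.dim)]
        return [self.table[tok] for tok in tokens]

    def encode_event(self, text):
        # Level 1: whitespace tokens of one event -> one fixed-size vector.
        tokens = text.lower().split() or ["[empty]"]
        return max_pool(conv1d_same(self.embed(tokens), *self.event_conv))

    def encode_patient(self, events):
        # Level 2: sequence of event vectors -> one fixed-size patient vector.
        event_vecs = [self.encode_event(e) for e in events]
        return max_pool(conv1d_same(event_vecs, *self.patient_conv))


# Hypothetical events, each serialized as text.
patient = [
    "lab creatinine 1.2 mg/dL",
    "medication vancomycin 1 g IV",
    "diagnosis sepsis",
]
enc = HierarchicalCNNEncoder(dim=8, kernel=3, seed=0)
vec = enc.encode_patient(patient)
print(len(vec))  # 8
```

Each convolution in this toy only ever sees one event's tokens, or one vector per event. The sequences each layer processes therefore stay short even when a patient's full record contains many tokens. A flat encoder would instead run over all tokens concatenated. This is the general intuition behind hierarchical encoding, offered here as an assumption rather than a reproduction of the paper's measurements.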

Cite this Paper


BibTeX
@InProceedings{pmlr-v209-lee23a,
  title = {Rediscovery of CNN’s Versatility for Text-based Encoding of Raw Electronic Health Records},
  author = {Cho, Eunbyeol and Lee, Minjae and Hur, Kyunghoon and Kim, Jiyoun and Yoon, Jinsung and Choi, Edward},
  booktitle = {Proceedings of the Conference on Health, Inference, and Learning},
  pages = {294--313},
  year = {2023},
  editor = {Mortazavi, Bobak J. and Sarker, Tasmie and Beam, Andrew and Ho, Joyce C.},
  volume = {209},
  series = {Proceedings of Machine Learning Research},
  month = {22 Jun--24 Jun},
  publisher = {PMLR},
  pdf = {https://proceedings.mlr.press/v209/lee23a/lee23a.pdf},
  url = {https://proceedings.mlr.press/v209/lee23a.html},
  abstract = {Making the most use of abundant information in electronic health records (EHR) is rapidly becoming an important topic in the medical domain. Recent work presented a promising framework that embeds entire features in raw EHR data regardless of its form and medical code standards. The framework, however, only focuses on encoding EHR with minimal preprocessing and fails to consider how to learn efficient EHR representation in terms of computation and memory usage. In this paper, we search for a versatile encoder not only reducing the large data into a manageable size but also well preserving the core information of patients to perform diverse clinical tasks. We found that hierarchically structured Convolutional Neural Network (CNN) often outperforms the state-of-the-art model on diverse tasks such as reconstruction, prediction, and generation, even with fewer parameters and less training time. Moreover, it turns out that making use of the inherent hierarchy of EHR data can boost the performance of any kind of backbone models and clinical tasks performed. Through extensive experiments, we present concrete evidence to generalize our research findings into real-world practice. We give a clear guideline on building the encoder based on the research findings captured while exploring numerous settings.}
}
Endnote
%0 Conference Paper
%T Rediscovery of CNN’s Versatility for Text-based Encoding of Raw Electronic Health Records
%A Eunbyeol Cho
%A Minjae Lee
%A Kyunghoon Hur
%A Jiyoun Kim
%A Jinsung Yoon
%A Edward Choi
%B Proceedings of the Conference on Health, Inference, and Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Bobak J. Mortazavi
%E Tasmie Sarker
%E Andrew Beam
%E Joyce C. Ho
%F pmlr-v209-lee23a
%I PMLR
%P 294--313
%U https://proceedings.mlr.press/v209/lee23a.html
%V 209
%X Making the most use of abundant information in electronic health records (EHR) is rapidly becoming an important topic in the medical domain. Recent work presented a promising framework that embeds entire features in raw EHR data regardless of its form and medical code standards. The framework, however, only focuses on encoding EHR with minimal preprocessing and fails to consider how to learn efficient EHR representation in terms of computation and memory usage. In this paper, we search for a versatile encoder not only reducing the large data into a manageable size but also well preserving the core information of patients to perform diverse clinical tasks. We found that hierarchically structured Convolutional Neural Network (CNN) often outperforms the state-of-the-art model on diverse tasks such as reconstruction, prediction, and generation, even with fewer parameters and less training time. Moreover, it turns out that making use of the inherent hierarchy of EHR data can boost the performance of any kind of backbone models and clinical tasks performed. Through extensive experiments, we present concrete evidence to generalize our research findings into real-world practice. We give a clear guideline on building the encoder based on the research findings captured while exploring numerous settings.
APA
Cho, E., Lee, M., Hur, K., Kim, J., Yoon, J., & Choi, E. (2023). Rediscovery of CNN’s Versatility for Text-based Encoding of Raw Electronic Health Records. Proceedings of the Conference on Health, Inference, and Learning, in Proceedings of Machine Learning Research 209:294-313. Available from https://proceedings.mlr.press/v209/lee23a.html.
