Learning to Summarize Electronic Health Records Using Cross-Modality Correspondences
Proceedings of the 3rd Machine Learning for Healthcare Conference, PMLR 85:551-570, 2018.
Electronic Health Records (EHRs) contain an overwhelming amount of information about each patient, making it difficult for clinicians to quickly find the most salient information. Accurate, concise summarization of relevant data can help alleviate this cognitive burden. In practice, clinical narrative notes serve this purpose during the course of care, but they are updated only intermittently and sometimes omit information. We address this problem by learning to generate the topics that a summary of structured health record data should contain at any point during a stay. We use detailed, high-dimensional structured data to predict the topics of existing clinical notes. Our model can generate topics from structured health record data even when no real note exists. We demonstrate that, using structured data alone, we can generate note topics with performance comparable to that obtained using prior notes alone. Our method can also generate the first note in a stay, for which no prior note is available. We demonstrate that our predicted topic distributions are meaningful through the downstream task of predicting in-hospital mortality: our generated note topic vectors perform comparably to, and in some cases better than, topics from the actual notes.