Improving the Calibration of Long Term Predictions of Heart Failure Rehospitalizations using Medical Concept Embedding
Proceedings of AAAI Spring Symposium on Survival Prediction - Algorithms, Challenges, and Applications 2021, PMLR 146:70-82, 2021.
‘Medical concept embedding’ aims to provide vector representations of International Statistical Classification of Diseases (ICD) codes such that the relationship between two vectors mirrors the conceptual relationship between the two diagnoses or clinical interventions. Despite the growing interest in vector representations of clinical information in electronic health records (EHR), the utility of embedding methods has not been examined in the context of predicting individualized survival distributions (ISD). In this study, we apply ISD methods, specifically Cox-Proportional Hazards with Kalbfleisch-Prentice extension (CoxPH-KP) and Multi-task Logistic Regression (MTLR), to the task of predicting the probability of Heart Failure (HF) rehospitalization or mortality in a population-level database of 40,568 HF hospitalizations spanning 8 years. Further, we compare the performance of these ISD models with and without code embeddings, which were learned from a temporally disjoint dataset of 229,359 all-cause hospitalizations. All our models show good discrimination in the validation dataset of 8,114 HF hospitalizations, with time-based concordance greater than 70% at every monthly interval up to 8 years. Finally, we demonstrate that medical concept embedding does not always lead to improved model discrimination, but does improve model calibration, particularly over longer time scales.
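To make the idea of time-based concordance concrete, the following is a minimal sketch of one simplified way to score discrimination at a single time horizon. It is an illustration under stated assumptions, not necessarily the metric used in the paper: the function name `concordance_at_horizon`, the pairing rule (a patient with an observed event by the horizon versus a patient still under follow-up after it), and the toy data are all hypothetical, and no censoring weights are applied.

```python
import math


def concordance_at_horizon(times, events, surv_at_t, horizon):
    """Simplified concordance at one time horizon (illustrative assumption).

    times     : observed follow-up time per patient
    events    : 1 if the event was observed, 0 if censored
    surv_at_t : predicted probability of being event-free at `horizon`
    horizon   : evaluation time, in the same units as `times`

    A pair (i, j) is comparable when patient i had an observed event by the
    horizon and patient j was still under follow-up after it. The pair is
    concordant when the model gave i the lower event-free probability.
    Ties in the prediction count as half. No censoring weights are used.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        # Patient i must have an observed event at or before the horizon.
        if not (events[i] and times[i] <= horizon):
            continue
        for j in range(n):
            # Patient j must be known to be event-free past the horizon.
            if times[j] <= horizon:
                continue
            comparable += 1
            if surv_at_t[i] < surv_at_t[j]:
                concordant += 1.0
            elif surv_at_t[i] == surv_at_t[j]:
                concordant += 0.5
    if comparable == 0:
        return float("nan")  # undefined when no comparable pairs exist
    return concordant / comparable


# Hypothetical toy data: follow-up in months, not taken from the study.
toy_times = [2, 5, 8, 12]
toy_events = [1, 0, 1, 1]
toy_surv_at_6 = [0.3, 0.6, 0.7, 0.9]  # made-up predictions at month 6

score = concordance_at_horizon(toy_times, toy_events, toy_surv_at_6, horizon=6)
print(f"concordance at month 6: {score:.2f}")
```

Repeating such a calculation at each monthly horizon would give a discrimination curve over time. A calibration check is a separate question: it asks whether the predicted probabilities at each horizon match the observed event frequencies, which a ranking score like this one cannot reveal.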