Proceedings of Machine Learning Research
Proceedings of AAAI Spring Symposium on Survival Prediction - Algorithms, Challenges, and Applications 2021
Held at Stanford University, Palo Alto (CA), USA, on 22-24 March 2021
Published as Volume 146 by the Proceedings of Machine Learning Research on 11 May 2021.
Volume Edited by:
Russell Greiner
Neeraj Kumar
Thomas Alexander Gerds
Mihaela van der Schaar
Series Editors:
Neil D. Lawrence
http://proceedings.mlr.press/v146/

Survival Trees for Current Status Data
Current status data arise when the exact time of an event of interest is not known and the only available information is whether that time is beyond a single assessment time. When interest lies in prediction based on such data, we define observed-data loss functions through censoring unbiased transformations and pseudo-observations to construct unbiased estimates of complete-data loss functions, and we use these to fit regression trees and make predictions using current status data. Trees grown with these methods are found empirically to have good properties in terms of recovery of the true tree structure and event time prediction.
http://proceedings.mlr.press/v146/yang21a.html

Wavelet Reconstruction Networks for Marked Point Processes
Timestamped sequences of events, pervasive in domains with data logs (e.g., health records), are often modeled as point processes or rate functions over time. Leading classical methods for risk scores, such as Cox and Hawkes processes, use such data but make strong assumptions about the shape and form of multivariate influences, resulting in time-to-event distributions that do not reflect many real-world processes. Methods based on point processes and recurrent neural networks capably model rate functions, but their complexity may make interpretation, use, and reuse challenging. Our work develops a high-performing and interrogable yet simple model. We introduce wavelet reconstruction networks, a multivariate point process with a sparse wavelet reconstruction kernel, to model rate functions from marked, timestamped data. We show that these simple models achieve improved performance when applied to forecasting complications and care visits in patients with diabetes.
http://proceedings.mlr.press/v146/weiss21a.html

Kullback-Leibler-Based Discrete Relative Risk Models for Integration of Published Prediction Models with New Dataset
The existing literature on prediction of time-to-event data has primarily focused on risk factors from a single individual-level dataset. However, such analyses may suffer from small sample sizes, high dimensionality, and low signal-to-noise ratios. To improve prediction stability and better understand risk factors associated with outcomes of interest, we propose a Kullback-Leibler-based discrete relative risk modeling procedure that borrows information from existing models. Simulations and a real data analysis show the advantage of the proposed method over approaches based solely on data from the current study or on prior information.
http://proceedings.mlr.press/v146/wang21b.html

Harmonic-Mean Cox Models: A Ruler for Equal Attention to Risk
Survival analysis models are necessary for clinical forecasting with censored data. Implicitly, existing works focus on individuals at higher risk, while lower-risk individuals are poorly characterized. Developing survival models that represent individuals at different risk levels equally is challenging but of great importance for providing accurate risk assessments across the full range of risk. Here, we characterize this problem and propose an adjusted log-likelihood formulation as a new objective for survival prognostication. We then propose several models based on this objective function; they produce risks that count individuals "equally" on risk ratios, providing representative attention to individuals of varying risk. Extensive experiments on multiple real-world datasets demonstrate the benefits of the proposed approach.
http://proceedings.mlr.press/v146/wang21a.html

Survival Prediction Using Deep Learning
In many biomedical applications, the outcome is measured as a time-to-event (e.g., time to disease progression or death). The Cox proportional hazards (CoxPH) model has been widely used to assess the association between a patient's baseline characteristics and this outcome. Meanwhile, in therapeutic areas such as oncology, clinical imaging (e.g., computerized tomography (CT) scans) is widely used for detection and diagnosis of disease and for monitoring progression and treatment effect. We are interested in using such images with neural networks to build predictive models for survival data. However, standard methodologies cannot be applied to imaging data with time-to-event outcomes due to challenges such as memory constraints. In this work, we develop a simple methodology for combining images with survival data. Our proposed methodology is a modified version of the CoxPH model that is amenable to SGD, allowing us to overcome the existing challenges. We present a neural network architecture for survival prediction using images; the architecture can leverage new advances in network topology.
http://proceedings.mlr.press/v146/tarkhan21a.html

Exploring the Wasserstein metric for time-to-event analysis
Survival analysis is a type of semi-supervised task where the target output (the survival time) is often right-censored. Utilizing this information is a challenge because it is not obvious how to correctly incorporate censored examples into a model. We study how three categories of loss functions can take advantage of this information: partial likelihood methods, rank methods, and our own classification method based on a Wasserstein metric (WM) and the non-parametric Kaplan-Meier (KM) estimate of the probability density, used to impute the labels of censored examples. The proposed method predicts the probability distribution of an event, letting us compute survival curves and expected survival times that are easier to interpret than a rank. We also demonstrate that this approach directly optimizes the expected C-index, the most common evaluation metric for survival models.
http://proceedings.mlr.press/v146/sylvain21a.html
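The Kaplan-Meier estimate that the abstract uses to impute labels for censored examples can be computed in a few lines. A sketch (illustrative code, not from the paper):

```python
import numpy as np

def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.

    `events` is 1 for an observed event, 0 for censoring. Returns the
    distinct event times and S(t) evaluated just after each of them,
    via the product-limit update S *= 1 - d/n at every event time.
    """
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=int)
    event_times = np.unique(times[events == 1])
    surv = []
    s = 1.0
    for t in event_times:
        at_risk = np.sum(times >= t)              # still under observation at t
        d = np.sum((times == t) & (events == 1))  # events exactly at t
        s *= 1.0 - d / at_risk
        surv.append(s)
    return event_times, np.array(surv)
```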

Empirical Comparison of Continuous and Discrete-time Representations for Survival Prediction
Survival prediction aims to predict the time of occurrence of a particular event of interest, such as the time until a patient dies. The main challenge in survival prediction is the presence of incomplete observations due to censoring. The classical formulation treats the survival time as a continuous outcome, which leads to a censored regression problem. Recent work has reformulated survival prediction by discretizing time into a finite number of bins and then applying multi-task binary classification. While the discrete-time formulation is convenient and potentially requires fewer assumptions than the continuous-time approach, it also loses information by discretizing time. In this paper, we empirically investigate continuous and discrete-time representations for survival prediction to quantify the trade-offs between the two formulations. We find that discretizing time does not necessarily decrease prediction accuracy; discrete-time models can even be more accurate than continuous-time models. However, the number of time bins used for discretization has a significant effect on accuracy and should therefore be tuned as a hyperparameter rather than specified for convenience.
http://proceedings.mlr.press/v146/sloma21a.html
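The discrete-time reformulation above turns one censored regression target into K binary classification targets. A sketch of the label construction under a common convention (our own helper, not the paper's code; bins whose outcome is unknown after censoring are masked out of the loss):

```python
import numpy as np

def make_discrete_targets(times, events, bin_edges):
    """Convert (time, event) pairs into multi-task binary targets.

    For K bins defined by bin_edges, labels[i, k] = 1 if subject i is
    known to have experienced the event by the end of bin k, and
    mask[i, k] = 1 when the outcome for that bin is known. Bins after
    a censoring time are masked out (mask = 0) since the outcome
    there is unobserved.
    """
    K = len(bin_edges) - 1
    n = len(times)
    labels = np.zeros((n, K), dtype=float)
    mask = np.zeros((n, K), dtype=float)
    for i, (t, e) in enumerate(zip(times, events)):
        for k in range(K):
            end = bin_edges[k + 1]
            if e == 1 and t <= end:
                labels[i, k] = 1.0   # event observed by end of bin k
                mask[i, k] = 1.0
            elif t > end:
                mask[i, k] = 1.0     # known event-free through bin k
            # else: censored before bin end -> unknown, mask stays 0
    return labels, mask
```

Each of the K columns can then be fed to a binary classifier (or one multi-task network head), with the mask zeroing the loss on unknown bins.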

Dynamic Survival Analysis with Individualized Truncated Parametric Distributions
Dynamic survival analysis is a variant of traditional survival analysis in which time-to-event predictions are updated as new information arrives about an individual over time. In this paper we propose a new approach to dynamic survival analysis based on learning a global parametric distribution and then individualizing it by truncating and renormalizing that distribution at different locations over time. We combine this approach with a likelihood-based loss that includes predictions at every time step within an individual's history, rather than just one term per individual. The combination of this loss and model yields an interpretable approach to dynamic survival that requires less fine-tuning than existing methods while still achieving good predictive performance. We evaluate the approach on the problem of predicting hospital mortality for a dataset with over 6900 COVID-19 patients.
http://proceedings.mlr.press/v146/putzel21a.html
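The truncate-and-renormalize step in the abstract is conditioning on survival so far: S(t | T > t0) = S(t) / S(t0) for t >= t0. A sketch with an illustrative global exponential distribution (our own code; the rate value is a stand-in, not from the paper):

```python
import math

def truncated_survival(s_global, t0):
    """Individualize a global survival function by truncation at t0.

    Given a global S(t), condition on survival up to the current
    time t0: S(t | T > t0) = S(t) / S(t0) for t >= t0 (and 1 before
    t0, where survival is already known).
    """
    denom = s_global(t0)
    return lambda t: s_global(max(t, t0)) / denom

# Illustrative global exponential distribution with a stand-in rate.
rate = 0.1
S = lambda t: math.exp(-rate * t)
S_cond = truncated_survival(S, 5.0)  # updated curve for a patient alive at t=5
```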

Deep Parametric Time-to-Event Regression with Time-Varying Covariates
Time-to-event regression in healthcare and other domains, such as predictive maintenance, requires working with time-series (or time-varying) data such as continuously monitored vital signs, electronic health records, or sensor readings. In such scenarios, the event-time distribution may have temporal dependencies at different time scales that are not easily captured by classical survival models, which assume training data points to be independent. In this paper, we describe a fully parametric approach to modeling censored time-to-event outcomes with time-varying covariates. It involves learning representations of the input temporal data using recurrent neural networks such as LSTMs and GRUs, followed by describing the conditional event distribution as a fixed mixture of parametric distributions. The recurrent neural networks allow the learned representations to model long-term dependencies in the input data while jointly estimating the time to event. We benchmark our approach on MIMIC III, a large, publicly available dataset collected from intensive care unit (ICU) patients, focusing on predicting the duration of their ICU stays and their short-term life expectancy, and we demonstrate competitive performance compared to established time-to-event regression models.
http://proceedings.mlr.press/v146/nagpal21a.html

Theory and software for boosted nonparametric hazard estimation
Nonparametric approaches for analyzing survival data in the presence of time-dependent covariates are a timely topic, given the availability of high-frequency data capture systems in healthcare and beyond. We present a theoretically justified gradient-boosted hazard estimator for this setting, and describe a tree-based implementation called BoXHED (pronounced 'box-head') that is available from GitHub: www.github.com/BoXHED. Our numerical study demonstrates that there is a place in the machine learning toolbox for a nonparametric method like BoXHED that can flexibly handle time-dependent covariates. The results presented here are distilled from the recent works of Lee et al. (2021) and Wang et al. (2020).
http://proceedings.mlr.press/v146/lee21a.html

Finding Relevant Features for Different Times in Survival Prediction by Discrete Hazard Bayesian Network
When predicting the survival time of a patient, different covariates may be important at different times. We introduce a survival prediction model, the "discrete hazard Bayesian network", that provides individual survival curves and also identifies which features are relevant for each time interval. This model encodes the discrete hazard function as a sequence of (possibly different) Bayesian networks, one for each time interval. Each such network includes a "Death" node, which is true iff the person dies in that interval; the features relevant for a given time interval are the nodes in the Markov blanket around that interval's "Death" node. We also apply a "discrete hazard computation correction" based on the effective sample size, a correction that avoids biased survival curves. We first show that our model is effective by demonstrating that it can identify the time-varying relevance of features on a synthetic dataset. We then provide two real-world examples by analyzing the relevant features for different times on the North Alberta cancer dataset and the Norway/Stanford breast cancer dataset.
http://proceedings.mlr.press/v146/kuan21a.html

Semi-Structured Deep Piecewise Exponential Models
We propose a versatile framework for survival analysis that combines advanced concepts from statistics with deep learning. The framework is based on piecewise exponential models and thereby supports various survival tasks, such as competing risks and multi-state modeling, and further allows for the estimation of time-varying effects and time-varying features. To also include multiple data sources and higher-order interaction effects in the model, we embed the model class in a neural network, enabling the simultaneous estimation of both inherently interpretable structured regression inputs and deep neural network components that can process additional unstructured data sources. A proof of concept is provided by using the framework to predict Alzheimer's disease progression based on tabular and 3D point cloud data and by applying it to synthetic data.
http://proceedings.mlr.press/v146/kopper21a.html

Deep-CR MTLR: a Multi-Modal Approach for Cancer Survival Prediction with Competing Risks
Accurate survival prediction is crucial for the development of precision cancer medicine, creating the need for new sources of prognostic information. Recently, there has been significant interest in exploiting routinely collected clinical and medical imaging data to discover new prognostic markers in multiple cancer types. However, most previous studies focus on individual data modalities alone and do not make use of recent advances in machine learning for survival prediction. We present Deep-CR MTLR, a novel machine learning approach for accurate cancer survival prediction from multi-modal clinical and imaging data in the presence of competing risks, based on neural networks and an extension of the multi-task logistic regression framework. We demonstrate improved prognostic performance of the multi-modal approach over single-modality predictors in a cohort of 2552 head and neck cancer patients, particularly for cancer-specific survival, where our approach achieves a 2-year AUROC of 0.774 and a C-index of 0.788.
http://proceedings.mlr.press/v146/kim21a.html

Improving the Calibration of Long Term Predictions of Heart Failure Rehospitalizations using Medical Concept Embedding
"Medical concept embedding" aims to provide vector representations of International Statistical Classification of Diseases (ICD) codes such that the relationship between two vectors mirrors the conceptual relationship between the two diagnoses or clinical interventions. Despite growing interest in vector representations of clinical information in electronic health records (EHR), the utility of embedding methods has not been examined in the context of predicting individualized survival distributions (ISD). In this study, we apply ISD methods, specifically Cox proportional hazards with the Kalbfleisch-Prentice extension (CoxPH-KP) and multi-task logistic regression (MTLR), to the task of predicting the probability of heart failure (HF) rehospitalization or mortality in a population-level database of 40,568 HF hospitalizations over a span of 8 years. Further, we compare the performance of these ISD models with and without code embeddings learned from a temporally disjoint dataset of 229,359 all-cause hospitalizations. All our models show good discrimination in the validation dataset of 8,114 HF hospitalizations, with time-based concordance greater than 70% for every monthly interval up to 8 years. Finally, we demonstrate that medical concept embedding does not always lead to improved model discrimination, but does improve model calibration, particularly over longer time scales.
http://proceedings.mlr.press/v146/kalmady21a.html

Beta Survival Models
Survival modeling is an important area of study and has been used widely in many applications, including clinical research, online advertising, and manufacturing. There are many methods to consider when analyzing survival problems; however, these techniques generally focus on estimating the uncertainty of different risk factors (Cox proportional hazards, etc.), predicting the time to event non-parametrically (e.g., tree-based methods), or forecasting survival beyond an observed horizon (parametric techniques such as the exponential). In this work, we introduce efficient estimation methods for linear, tree, and neural network versions of the Beta-Logistic model, a classical extension of the logistic function to the discrete survival setting. The Beta-Logistic allows for recovery of the underlying beta distribution and has the advantages of non-linear or tree-based techniques while still allowing projection beyond an observed horizon. Empirical results on simulated data as well as large-scale datasets across three use cases (online conversions, retention modeling in a subscription service, and survival of democracies and dictatorships) demonstrate the competitiveness of the method at these tasks. The simplicity of the method and its ability to capture skew in the data make it a viable alternative to standard techniques, particularly when we are interested in forecasting time to event beyond the observed horizon and when the underlying probabilities are heterogeneous.
http://proceedings.mlr.press/v146/hubbard21a.html
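The discrete survival setting above assumes each subject has a constant per-period event probability drawn from a beta distribution. Marginalizing over that heterogeneity yields the classical shifted-beta-geometric survival recursion, sketched below (a closely related textbook model, not the paper's covariate-dependent Beta-Logistic):

```python
def beta_geometric_survival(alpha, beta, horizon):
    """Survival curve of the shifted-beta-geometric model.

    Each subject's per-period event probability theta is drawn from
    Beta(alpha, beta); integrating theta out gives the recursion
        S(t) = S(t-1) * (beta + t - 1) / (alpha + beta + t - 1),
    with S(0) = 1. Returns [S(0), S(1), ..., S(horizon)], which can be
    evaluated beyond the observed horizon.
    """
    surv = [1.0]
    for t in range(1, horizon + 1):
        surv.append(surv[-1] * (beta + t - 1) / (alpha + beta + t - 1))
    return surv
```

With alpha = beta = 1 (uniform heterogeneity), the recursion gives S(t) = 1/(t+1), matching the direct integral of (1-theta)^t over theta.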

Transformer-Based Deep Survival Analysis
In this work, we propose a new Transformer-based survival model that estimates patient-specific survival distributions. Our contributions are twofold. First, to the best of our knowledge, existing deep survival models use either fully connected or recurrent networks, and we are the first to apply the Transformer in survival analysis. In addition, we use ordinal regression to optimize the survival probabilities over time and penalize randomized discordant pairs. Second, many survival models are evaluated using only ranking metrics such as the concordance index. We propose to also use the absolute error metric, which evaluates the precision of duration predictions on observed subjects. We demonstrate our model on two publicly available real-world datasets and show that our mean absolute error results are significantly better than those of current models, while it remains challenging to determine the best model under the concordance index.
http://proceedings.mlr.press/v146/hu21a.html
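The concordance index mentioned above can be computed directly from comparable pairs. A minimal O(n^2) sketch of Harrell's version (our own code, assuming at least one comparable pair):

```python
def concordance_index(times, events, predicted_risks):
    """Harrell's concordance index for right-censored data.

    A pair (i, j) is comparable when the earlier of the two times is an
    observed event; it is concordant when the subject with the earlier
    event time has the higher predicted risk. Ties in predicted risk
    count as half-concordant.
    """
    concordant, comparable = 0.0, 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if predicted_risks[i] > predicted_risks[j]:
                    concordant += 1.0
                elif predicted_risks[i] == predicted_risks[j]:
                    concordant += 0.5
    return concordant / comparable
```

A value of 1.0 means every comparable pair is ranked correctly, 0.5 is chance level; this is the ranking-only view the abstract argues should be complemented with absolute error on observed durations.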

WRSE - a non-parametric weighted-resolution ensemble for predicting individual survival distributions in the ICU
Dynamic assessment of mortality risk in the intensive care unit (ICU) can be used to stratify patients, inform about treatment effectiveness, or serve as part of an early-warning system. Static risk scores, such as APACHE or SAPS, have been supplemented with data-driven approaches that track dynamic mortality risk over time. Recent works have focused on further enhancing the information delivered to clinicians by producing full survival distributions instead of point predictions or fixed-horizon risks. In this work, we propose a non-parametric ensemble model, the Weighted Resolution Survival Ensemble (WRSE), tailored to estimate such dynamic individual survival distributions. Inspired by the simplicity and robustness of ensemble methods, the proposed approach combines a set of binary classifiers spaced according to a decay function reflecting the relevance of short-term predictions. Models and baselines are evaluated under weighted calibration and discrimination metrics for individual survival distributions, which closely reflect the utility of a model in ICU practice. We show competitive results with state-of-the-art probabilistic models, while reducing training time by factors of 2-9x.
http://proceedings.mlr.press/v146/heitz21a.html

The Safe Logrank Test: Error Control under Optional Stopping, Continuation and Prior Misspecification
We introduce the safe logrank test, a version of the logrank test that retains Type I error guarantees under optional stopping and continuation. It allows for effortless combination of data from different trials on different sub-populations while keeping Type I error guarantees, and it can be extended to define always-valid confidence intervals. Prior knowledge can be accounted for via prior distributions on the hazard ratio in the alternative, but even under 'bad' priors the Type I error bounds are guaranteed. The test is an instance of the recently developed martingale tests based on e-values. Initial experiments show that the safe logrank test performs well in terms of both the maximal and the expected number of events needed to obtain a desired power.
http://proceedings.mlr.press/v146/grunwald21a.html

Preface: AAAI Spring Symposium on Survival Prediction - Algorithms, Challenges, and Applications 2021
Presentation of this volume.
http://proceedings.mlr.press/v146/greiner21a.html

Multi-ethnic Survival Analysis: Transfer Learning with Cox Neural Networks
Extensive collections of personal omics data from large clinical cohorts provide an unprecedented opportunity to develop high-performance machine learning systems for precision medicine. However, most clinical omics data have been collected from individuals of European ancestry. Such ancestrally imbalanced data may lead to inaccurate machine learning models for data-disadvantaged ethnic groups and thus generate new health care disparities. In this work, we develop a transfer learning scheme for survival analysis with multi-ethnic data. Machine learning experiments on real and synthetic clinical omics datasets show that transfer learning can improve the prognostic accuracy of Cox neural network models for data-disadvantaged ethnic groups.
http://proceedings.mlr.press/v146/gao21a.html

IDNetwork: A deep Illness-Death Network based on multi-states event history process for versatile disease prognostication
Multi-state models can capture different patterns of disease evolution. In particular, the illness-death model follows disease progression from a healthy state to an intermediate state and on to a death-related final state. We aim to use these models to adapt treatment decisions according to the evolution of the disease. In state-of-the-art methods, the transition risks are modeled via (semi-)Markov processes and transition-specific Cox proportional hazards (PH) models. We propose a neural network architecture called IDNetwork (Illness-Death Network) that relaxes the linear Cox PH assumption and integrates a large number of patient characteristics. Our method significantly improves predictive performance compared to state-of-the-art methods on a simulated dataset, on two clinical trials for patients with colon cancer, and on a real-world dataset in breast cancer.
http://proceedings.mlr.press/v146/cottin21a.html

Risk and Survival Analysis from COVID Outbreak Data: Lessons from India
The present analysis is an attempt to provide data-backed evidence about mortality due to COVID-19 in the Indian context. We describe the prevailing COVID-19 conditions in India by means of succinct visualization via a dynamic dashboard and cluster analysis. Building on this, we performed survival analysis on COVID-19 patients from the state of Karnataka, stratifying the data by age and gender, and we report the findings in this paper. To our knowledge, this is the largest retrospective cohort-based survival analysis in the Indian context.
http://proceedings.mlr.press/v146/bankar21a.html