Finding Relevant Features for Different Times in Survival Prediction by Discrete Hazard Bayesian Network
Proceedings of AAAI Spring Symposium on Survival Prediction - Algorithms, Challenges, and Applications 2021, PMLR 146:240-251, 2021.
When predicting the survival time of a patient, different covariates may be important at different times. We introduce a survival prediction model, “discrete hazard Bayesian network", that can provide individual survival curves and also identify which features are relevant for each time interval. This model encodes the discrete hazard function as a sequence of (possibly different) Bayesian networks, one for each time interval. Note each such network includes a “Death” node, which is True iff the person dies in that interval. A set of features relevant for each time interval are the nodes in the Markov blanket around that “Death" node for that interval. We also apply a “discrete hazard computation correction" based on the effective sample size – a correction that avoids biased survival curves. We first show that our model is effective by demonstrating that it can identify the time-varying relevance of the features, using the synthetic dataset. We then provide two real-world examples by analyzing the relevant features for different times on the North Alberta cancer dataset and the Norway/Stanford breast cancer dataset.