Finding Relevant Features for Different Times in Survival Prediction by Discrete Hazard Bayesian Network

Li-Hao Kuan, Russell Greiner
Proceedings of AAAI Spring Symposium on Survival Prediction - Algorithms, Challenges, and Applications 2021, PMLR 146:240-251, 2021.

Abstract

When predicting the survival time of a patient, different covariates may be important at different times. We introduce a survival prediction model, the “discrete hazard Bayesian network”, that can provide individual survival curves and also identify which features are relevant for each time interval. This model encodes the discrete hazard function as a sequence of (possibly different) Bayesian networks, one for each time interval. Each such network includes a “Death” node, which is True iff the person dies in that interval. The set of features relevant for each time interval consists of the nodes in the Markov blanket around the “Death” node for that interval. We also apply a “discrete hazard computation correction” based on the effective sample size, a correction that avoids biased survival curves. We first show that our model is effective by demonstrating, on a synthetic dataset, that it can identify the time-varying relevance of the features. We then provide two real-world examples by analyzing the relevant features for different times on the North Alberta cancer dataset and the Norway/Stanford breast cancer dataset.
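The two ideas the abstract relies on can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the toy network, and the hazard values are hypothetical, and the effective-sample-size correction is omitted because the abstract does not specify it. The sketch assumes the standard discrete-time relation S(t_k) = (1 - h_1)(1 - h_2)...(1 - h_k) between per-interval hazards and the survival curve, and the standard definition of a Markov blanket in a directed acyclic graph (parents, children, and the children's other parents).

```python
from typing import Dict, List, Set


def survival_curve(hazards: List[float]) -> List[float]:
    """Turn per-interval hazards h_1..h_K into survival probabilities.

    S(t_k) is the probability of surviving every interval up to and
    including k, i.e. the running product of (1 - h_j).
    """
    curve = []
    alive = 1.0
    for h in hazards:
        if not 0.0 <= h <= 1.0:
            raise ValueError("each hazard must lie in [0, 1]")
        alive *= 1.0 - h
        curve.append(alive)
    return curve


def markov_blanket(parents: Dict[str, Set[str]], node: str) -> Set[str]:
    """Return the Markov blanket of `node` in a DAG.

    `parents` maps each node to the set of its parents. The blanket is
    the node's parents, its children, and the other parents of those
    children.
    """
    blanket = set(parents.get(node, set()))
    for child, child_parents in parents.items():
        if node in child_parents:
            blanket.add(child)
            blanket |= child_parents
    blanket.discard(node)
    return blanket


# Hypothetical network for a single time interval (illustration only).
interval_network = {
    "Death": {"Age", "Stage"},
    "Stage": {"TumourSize"},
    "Age": set(),
    "TumourSize": set(),
    "Weight": set(),
}

# Features treated as relevant for this interval under the toy network.
relevant = markov_blanket(interval_network, "Death")

# Hypothetical hazards for three consecutive intervals.
curve = survival_curve([0.1, 0.2, 0.5])
```

In this toy network only "Age" and "Stage" fall in the blanket of "Death"; "TumourSize" influences "Death" only through "Stage", and "Weight" is disconnected. Repeating the blanket computation on each interval's network is what would yield a time-varying set of relevant features.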

Cite this Paper


BibTeX
@InProceedings{pmlr-v146-kuan21a,
  title = {Finding Relevant Features for Different Times in Survival Prediction by Discrete Hazard Bayesian Network},
  author = {Kuan, Li-Hao and Greiner, Russell},
  booktitle = {Proceedings of AAAI Spring Symposium on Survival Prediction - Algorithms, Challenges, and Applications 2021},
  pages = {240--251},
  year = {2021},
  editor = {Greiner, Russell and Kumar, Neeraj and Gerds, Thomas Alexander and van der Schaar, Mihaela},
  volume = {146},
  series = {Proceedings of Machine Learning Research},
  month = {22--24 Mar},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v146/kuan21a/kuan21a.pdf},
  url = {https://proceedings.mlr.press/v146/kuan21a.html},
  abstract = {When predicting the survival time of a patient, different covariates may be important at different times. We introduce a survival prediction model, “discrete hazard Bayesian network", that can provide individual survival curves and also identify which features are relevant for each time interval. This model encodes the discrete hazard function as a sequence of (possibly different) Bayesian networks, one for each time interval. Note each such network includes a “Death” node, which is True iff the person dies in that interval. A set of features relevant for each time interval are the nodes in the Markov blanket around that “Death" node for that interval. We also apply a “discrete hazard computation correction" based on the effective sample size – a correction that avoids biased survival curves. We first show that our model is effective by demonstrating that it can identify the time-varying relevance of the features, using the synthetic dataset. We then provide two real-world examples by analyzing the relevant features for different times on the North Alberta cancer dataset and the Norway/Stanford breast cancer dataset.}
}
Endnote
%0 Conference Paper
%T Finding Relevant Features for Different Times in Survival Prediction by Discrete Hazard Bayesian Network
%A Li-Hao Kuan
%A Russell Greiner
%B Proceedings of AAAI Spring Symposium on Survival Prediction - Algorithms, Challenges, and Applications 2021
%C Proceedings of Machine Learning Research
%D 2021
%E Russell Greiner
%E Neeraj Kumar
%E Thomas Alexander Gerds
%E Mihaela van der Schaar
%F pmlr-v146-kuan21a
%I PMLR
%P 240--251
%U https://proceedings.mlr.press/v146/kuan21a.html
%V 146
%X When predicting the survival time of a patient, different covariates may be important at different times. We introduce a survival prediction model, “discrete hazard Bayesian network", that can provide individual survival curves and also identify which features are relevant for each time interval. This model encodes the discrete hazard function as a sequence of (possibly different) Bayesian networks, one for each time interval. Note each such network includes a “Death” node, which is True iff the person dies in that interval. A set of features relevant for each time interval are the nodes in the Markov blanket around that “Death" node for that interval. We also apply a “discrete hazard computation correction" based on the effective sample size – a correction that avoids biased survival curves. We first show that our model is effective by demonstrating that it can identify the time-varying relevance of the features, using the synthetic dataset. We then provide two real-world examples by analyzing the relevant features for different times on the North Alberta cancer dataset and the Norway/Stanford breast cancer dataset.
APA
Kuan, L. & Greiner, R. (2021). Finding Relevant Features for Different Times in Survival Prediction by Discrete Hazard Bayesian Network. Proceedings of AAAI Spring Symposium on Survival Prediction - Algorithms, Challenges, and Applications 2021, in Proceedings of Machine Learning Research 146:240-251. Available from https://proceedings.mlr.press/v146/kuan21a.html.
