SurvivalGAN: Generating Time-to-Event Data for Survival Analysis

Alexander Norcliffe, Bogdan Cebere, Fergus Imrie, Pietro Lió, Mihaela van der Schaar
Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, PMLR 206:10279-10304, 2023.

Abstract

Synthetic data is becoming an increasingly promising technology, and successful applications can improve privacy, fairness, and data democratization. While there are many methods for generating synthetic tabular data, the task remains non-trivial and unexplored for specific scenarios. One such scenario is survival data. Here, the key difficulty is censoring: for some instances, we are not aware of the time of event, or if one even occurred. Imbalances in censoring and time horizons cause generative models to experience three new failure modes specific to survival analysis: (1) generating too few at-risk members; (2) generating too many at-risk members; and (3) censoring too early. We formalize these failure modes and provide three new generative metrics to quantify them. Following this, we propose SurvivalGAN, a generative model that handles survival data firstly by addressing the imbalance in the censoring and event horizons, and secondly by using a dedicated mechanism for approximating time-to-event/censoring. We evaluate this method via extensive experiments on medical datasets. SurvivalGAN outperforms multiple baselines at generating survival data, and in particular addresses the failure modes as measured by the new metrics, in addition to improving downstream performance of survival models trained on the synthetic data.

Cite this Paper


BibTeX
@InProceedings{pmlr-v206-norcliffe23a,
  title = {SurvivalGAN: Generating Time-to-Event Data for Survival Analysis},
  author = {Norcliffe, Alexander and Cebere, Bogdan and Imrie, Fergus and Li\'o, Pietro and van der Schaar, Mihaela},
  booktitle = {Proceedings of The 26th International Conference on Artificial Intelligence and Statistics},
  pages = {10279--10304},
  year = {2023},
  editor = {Ruiz, Francisco and Dy, Jennifer and van de Meent, Jan-Willem},
  volume = {206},
  series = {Proceedings of Machine Learning Research},
  month = {25--27 Apr},
  publisher = {PMLR},
  pdf = {https://proceedings.mlr.press/v206/norcliffe23a/norcliffe23a.pdf},
  url = {https://proceedings.mlr.press/v206/norcliffe23a.html},
  abstract = {Synthetic data is becoming an increasingly promising technology, and successful applications can improve privacy, fairness, and data democratization. While there are many methods for generating synthetic tabular data, the task remains non-trivial and unexplored for specific scenarios. One such scenario is survival data. Here, the key difficulty is censoring: for some instances, we are not aware of the time of event, or if one even occurred. Imbalances in censoring and time horizons cause generative models to experience three new failure modes specific to survival analysis: (1) generating too few at-risk members; (2) generating too many at-risk members; and (3) censoring too early. We formalize these failure modes and provide three new generative metrics to quantify them. Following this, we propose SurvivalGAN, a generative model that handles survival data firstly by addressing the imbalance in the censoring and event horizons, and secondly by using a dedicated mechanism for approximating time-to-event/censoring. We evaluate this method via extensive experiments on medical datasets. SurvivalGAN outperforms multiple baselines at generating survival data, and in particular addresses the failure modes as measured by the new metrics, in addition to improving downstream performance of survival models trained on the synthetic data.}
}
Endnote
%0 Conference Paper
%T SurvivalGAN: Generating Time-to-Event Data for Survival Analysis
%A Alexander Norcliffe
%A Bogdan Cebere
%A Fergus Imrie
%A Pietro Lió
%A Mihaela van der Schaar
%B Proceedings of The 26th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2023
%E Francisco Ruiz
%E Jennifer Dy
%E Jan-Willem van de Meent
%F pmlr-v206-norcliffe23a
%I PMLR
%P 10279--10304
%U https://proceedings.mlr.press/v206/norcliffe23a.html
%V 206
%X Synthetic data is becoming an increasingly promising technology, and successful applications can improve privacy, fairness, and data democratization. While there are many methods for generating synthetic tabular data, the task remains non-trivial and unexplored for specific scenarios. One such scenario is survival data. Here, the key difficulty is censoring: for some instances, we are not aware of the time of event, or if one even occurred. Imbalances in censoring and time horizons cause generative models to experience three new failure modes specific to survival analysis: (1) generating too few at-risk members; (2) generating too many at-risk members; and (3) censoring too early. We formalize these failure modes and provide three new generative metrics to quantify them. Following this, we propose SurvivalGAN, a generative model that handles survival data firstly by addressing the imbalance in the censoring and event horizons, and secondly by using a dedicated mechanism for approximating time-to-event/censoring. We evaluate this method via extensive experiments on medical datasets. SurvivalGAN outperforms multiple baselines at generating survival data, and in particular addresses the failure modes as measured by the new metrics, in addition to improving downstream performance of survival models trained on the synthetic data.
APA
Norcliffe, A., Cebere, B., Imrie, F., Lió, P., & van der Schaar, M. (2023). SurvivalGAN: Generating Time-to-Event Data for Survival Analysis. Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 206:10279-10304. Available from https://proceedings.mlr.press/v206/norcliffe23a.html.
