Simulation of Health Time Series with Nonstationarity

Adedolapo Aishat Toye, Louis Gomez, Samantha Kleinberg
Proceedings of the fifth Conference on Health, Inference, and Learning, PMLR 248:217-232, 2024.

Abstract

Limited access to health data remains a challenge for developing machine learning (ML) models. Health data is difficult to share due to privacy concerns and often does not have ground truth. Simulated data is often used for evaluating algorithms, as it can be shared freely and generated with ground truth. However, for simulated data to be used as an alternative to real data, algorithmic performance must be similar to that of real data. Existing simulation approaches are either black boxes or rely solely on expert knowledge, which may be incomplete. These methods generate data that often overstates performance, as they do not simulate many of the properties that make real data challenging. Nonstationarity, where a system’s properties or parameters change over time, is pervasive in health data with changing health status of patients, standards of care, and populations. This makes ML challenging and can lead to reduced model generalizability, yet there have not been ways to systematically simulate realistic nonstationary data. This paper introduces a modular approach for learning dataset-specific models of nonstationarity in real data and augmenting simulated data with these properties to generate realistic synthetic datasets. We show that our simulation approach brings performance closer to that of real data in stress classification and glucose forecasting in people with diabetes.

Cite this Paper


BibTeX
@InProceedings{pmlr-v248-toye24a, title = {Simulation of Health Time Series with Nonstationarity}, author = {Toye, Adedolapo Aishat and Gomez, Louis and Kleinberg, Samantha}, booktitle = {Proceedings of the fifth Conference on Health, Inference, and Learning}, pages = {217--232}, year = {2024}, editor = {Pollard, Tom and Choi, Edward and Singhal, Pankhuri and Hughes, Michael and Sizikova, Elena and Mortazavi, Bobak and Chen, Irene and Wang, Fei and Sarker, Tasmie and McDermott, Matthew and Ghassemi, Marzyeh}, volume = {248}, series = {Proceedings of Machine Learning Research}, month = {27--28 Jun}, publisher = {PMLR}, pdf = {https://raw.githubusercontent.com/mlresearch/v248/main/assets/toye24a/toye24a.pdf}, url = {https://proceedings.mlr.press/v248/toye24a.html}, abstract = {Limited access to health data remains a challenge for developing machine learning (ML) models. Health data is difficult to share due to privacy concerns and often does not have ground truth. Simulated data is often used for evaluating algorithms, as it can be shared freely and generated with ground truth. However, for simulated data to be used as an alternative to real data, algorithmic performance must be similar to that of real data. Existing simulation approaches are either black boxes or rely solely on expert knowledge, which may be incomplete. These methods generate data that often overstates performance, as they do not simulate many of the properties that make real data challenging. Nonstationarity, where a system’s properties or parameters change over time, is pervasive in health data with changing health status of patients, standards of care, and populations. This makes ML challenging and can lead to reduced model generalizability, yet there have not been ways to systematically simulate realistic nonstationary data. This paper introduces a modular approach for learning dataset-specific models of nonstationarity in real data and augmenting simulated data with these properties to generate realistic synthetic datasets. We show that our simulation approach brings performance closer to that of real data in stress classification and glucose forecasting in people with diabetes.} }
Endnote
%0 Conference Paper %T Simulation of Health Time Series with Nonstationarity %A Adedolapo Aishat Toye %A Louis Gomez %A Samantha Kleinberg %B Proceedings of the fifth Conference on Health, Inference, and Learning %C Proceedings of Machine Learning Research %D 2024 %E Tom Pollard %E Edward Choi %E Pankhuri Singhal %E Michael Hughes %E Elena Sizikova %E Bobak Mortazavi %E Irene Chen %E Fei Wang %E Tasmie Sarker %E Matthew McDermott %E Marzyeh Ghassemi %F pmlr-v248-toye24a %I PMLR %P 217--232 %U https://proceedings.mlr.press/v248/toye24a.html %V 248 %X Limited access to health data remains a challenge for developing machine learning (ML) models. Health data is difficult to share due to privacy concerns and often does not have ground truth. Simulated data is often used for evaluating algorithms, as it can be shared freely and generated with ground truth. However, for simulated data to be used as an alternative to real data, algorithmic performance must be similar to that of real data. Existing simulation approaches are either black boxes or rely solely on expert knowledge, which may be incomplete. These methods generate data that often overstates performance, as they do not simulate many of the properties that make real data challenging. Nonstationarity, where a system’s properties or parameters change over time, is pervasive in health data with changing health status of patients, standards of care, and populations. This makes ML challenging and can lead to reduced model generalizability, yet there have not been ways to systematically simulate realistic nonstationary data. This paper introduces a modular approach for learning dataset-specific models of nonstationarity in real data and augmenting simulated data with these properties to generate realistic synthetic datasets. We show that our simulation approach brings performance closer to that of real data in stress classification and glucose forecasting in people with diabetes.
APA
Toye, A.A., Gomez, L. & Kleinberg, S.. (2024). Simulation of Health Time Series with Nonstationarity. Proceedings of the fifth Conference on Health, Inference, and Learning, in Proceedings of Machine Learning Research 248:217-232 Available from https://proceedings.mlr.press/v248/toye24a.html.

Related Material