Early Prediction of Causes (not Effects) in Healthcare by Long-Term Clinical Time Series Forecasting

Michael Staniek; Marius Fracarolli; Michael Hagmann; Stefan Riezler

Proceedings of the 9th Machine Learning for Healthcare Conference, PMLR 252, 2024.

Abstract

Machine learning for early syndrome diagnosis aims to solve the intricate task of predicting a ground truth label that is most often the outcome (effect) of a medical consensus definition applied to observed clinical measurements (causes), given clinical measurements observed several hours before. Instead of focusing on the prediction of the future effect, we propose to directly predict the causes via time series forecasting (TSF) of clinical variables and determine the effect by applying the gold standard consensus definition to the forecasted values. This method has the invaluable advantage of being straightforwardly interpretable to clinical practitioners, and because model training no longer relies on a particular label, the forecasted data can be used to predict any consensus-based label. We exemplify our method by means of long-term TSF with Transformer models, with a focus on accurate prediction of sparse clinical variables involved in the SOFA-based Sepsis-3 definition and the new Simplified Acute Physiology Score (SAPS-II) definition. Our experiments are conducted on two datasets and show that, contrary to recent proposals which advocate set function encoders for time series and direct multi-step decoders, the best results are achieved by combining standard dense encoders with iterative multi-step decoders. The success of iterative multi-step decoding can be attributed to its ability to capture cross-variate dependencies and to a student forcing training strategy that teaches the model to rely on its own previous time step predictions for the next time step prediction.
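The "predict causes, then apply the definition" idea can be illustrated with a toy sketch. This is not the authors' implementation and is not clinically validated. Assumptions, all hypothetical: a linear-trend extrapolator (`linear_trend_model`) stands in for the trained Transformer; only three SOFA organ subscores are computed (coagulation from platelets, liver from bilirubin, renal from creatinine with urine output ignored), so `partial_sofa` is not the full six-organ score; the baseline is taken as the minimum partial score over the observed history; and the suspected-infection component of Sepsis-3 is assumed to be present. The forecaster is rolled out iteratively, so each predicted step is fed back as input for the next one, in the spirit of iterative multi-step decoding. Student forcing, which concerns training, is not shown.

```python
from typing import Callable, Dict, List, Optional

# One time step of (hypothetical) lab values:
# platelets in 10^3/uL, bilirubin and creatinine in mg/dL.
Step = Dict[str, float]


def platelet_score(plt: float) -> int:
    """SOFA coagulation subscore: one point per cutoff the count falls below."""
    return sum(plt < c for c in (150, 100, 50, 20))


def bilirubin_score(bili: float) -> int:
    """SOFA liver subscore: one point per cutoff reached."""
    return sum(bili >= c for c in (1.2, 2.0, 6.0, 12.0))


def creatinine_score(crea: float) -> int:
    """SOFA renal subscore from creatinine only (urine output ignored here)."""
    return sum(crea >= c for c in (1.2, 2.0, 3.5, 5.0))


def partial_sofa(step: Step) -> int:
    """Sum of three organ subscores; NOT the full six-organ SOFA score."""
    return (platelet_score(step["platelets"])
            + bilirubin_score(step["bilirubin"])
            + creatinine_score(step["creatinine"]))


def iterative_forecast(history: List[Step],
                       model: Callable[[List[Step]], Step],
                       horizon: int) -> List[Step]:
    """Iterative multi-step decoding: each predicted step is appended to the
    context, so the next prediction is conditioned on the model's own output."""
    context = list(history)
    forecast: List[Step] = []
    for _ in range(horizon):
        nxt = model(context)
        forecast.append(nxt)
        context.append(nxt)
    return forecast


def linear_trend_model(context: List[Step]) -> Step:
    """Hypothetical stand-in for a trained forecaster: extrapolates the most
    recent one-step change of every variable."""
    last, prev = context[-1], context[-2]
    return {k: last[k] + (last[k] - prev[k]) for k in last}


def first_alarm_step(history: List[Step], forecast: List[Step],
                     threshold: int = 2) -> Optional[int]:
    """Apply a Sepsis-3 style rule (score increase >= threshold over baseline)
    to the forecasted values; return the 1-based step of the first alarm."""
    baseline = min(partial_sofa(s) for s in history)  # assumed baseline rule
    for i, step in enumerate(forecast, start=1):
        if partial_sofa(step) - baseline >= threshold:
            return i
    return None


# Two observed steps (e.g. hourly), then a 4-step forecast.
history = [
    {"platelets": 200.0, "bilirubin": 1.0, "creatinine": 0.9},
    {"platelets": 180.0, "bilirubin": 1.4, "creatinine": 0.9},
]
forecast = iterative_forecast(history, linear_trend_model, horizon=4)
print([partial_sofa(s) for s in forecast])  # [1, 3, 3, 3]
print(first_alarm_step(history, forecast))  # 2
```

Because the label is computed from forecasted values rather than learned directly, the same forecast could be scored against a different consensus definition (for example a SAPS-II style score) without retraining. A direct multi-step decoder would instead emit all horizon steps in one pass without feeding its predictions back.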

Cite this Paper

BibTeX

@InProceedings{pmlr-v252-staniek24a,
  title = 	 {Early Prediction of Causes (not Effects) in Healthcare by Long-Term Clinical Time Series Forecasting},
  author =       {Staniek, Michael and Fracarolli, Marius and Hagmann, Michael and Riezler, Stefan},
  booktitle = 	 {Proceedings of the 9th Machine Learning for Healthcare Conference},
  year = 	 {2024},
  editor = 	 {Deshpande, Kaivalya and Fiterau, Madalina and Joshi, Shalmali and Lipton, Zachary and Ranganath, Rajesh and Urteaga, Iñigo},
  volume = 	 {252},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {16--17 Aug},
  publisher =    {PMLR},
  pdf = 	 {https://raw.githubusercontent.com/mlresearch/v252/main/assets/staniek24a/staniek24a.pdf},
  url = 	 {https://proceedings.mlr.press/v252/staniek24a.html},
  abstract = 	 {Machine learning for early syndrome diagnosis aims to solve the intricate task of predicting a ground truth label that most often is the outcome (effect) of a medical consensus definition applied to observed clinical measurements (causes), given clinical measurements observed several hours before. Instead of focusing on the prediction of the future effect, we propose to directly predict the causes via time series forecasting (TSF) of clinical variables and determine the effect by applying the gold standard consensus definition to the forecasted values.  This method has the invaluable advantage of being straightforwardly interpretable to clinical practitioners, and because model training does not rely on a particular label anymore, the forecasted data can be used to predict any consensus-based label.  We exemplify our method by means of long-term TSF with Transformer models, with a focus on accurate prediction of sparse clinical variables involved in the SOFA-based Sepsis-3 definition and the new Simplified Acute Physiology Score (SAPS-II) definition.  Our experiments are conducted on two datasets and show that contrary to recent proposals which advocate set function encoders for time series and direct multi-step decoders, best results are achieved by a combination of standard dense encoders with iterative multi-step decoders.  The key for success of iterative multi-step decoding can be attributed to its ability to capture cross-variate dependencies and to a student forcing training strategy that teaches the model to rely on its own previous time step predictions for the next time step prediction.}
}

Endnote

%0 Conference Paper
%T Early Prediction of Causes (not Effects) in Healthcare by Long-Term Clinical Time Series Forecasting
%A Michael Staniek
%A Marius Fracarolli
%A Michael Hagmann
%A Stefan Riezler
%B Proceedings of the 9th Machine Learning for Healthcare Conference
%C Proceedings of Machine Learning Research
%D 2024
%E Kaivalya Deshpande
%E Madalina Fiterau
%E Shalmali Joshi
%E Zachary Lipton
%E Rajesh Ranganath
%E Iñigo Urteaga
%F pmlr-v252-staniek24a
%I PMLR
%U https://proceedings.mlr.press/v252/staniek24a.html
%V 252
%X Machine learning for early syndrome diagnosis aims to solve the intricate task of predicting a ground truth label that most often is the outcome (effect) of a medical consensus definition applied to observed clinical measurements (causes), given clinical measurements observed several hours before. Instead of focusing on the prediction of the future effect, we propose to directly predict the causes via time series forecasting (TSF) of clinical variables and determine the effect by applying the gold standard consensus definition to the forecasted values.  This method has the invaluable advantage of being straightforwardly interpretable to clinical practitioners, and because model training does not rely on a particular label anymore, the forecasted data can be used to predict any consensus-based label.  We exemplify our method by means of long-term TSF with Transformer models, with a focus on accurate prediction of sparse clinical variables involved in the SOFA-based Sepsis-3 definition and the new Simplified Acute Physiology Score (SAPS-II) definition.  Our experiments are conducted on two datasets and show that contrary to recent proposals which advocate set function encoders for time series and direct multi-step decoders, best results are achieved by a combination of standard dense encoders with iterative multi-step decoders.  The key for success of iterative multi-step decoding can be attributed to its ability to capture cross-variate dependencies and to a student forcing training strategy that teaches the model to rely on its own previous time step predictions for the next time step prediction.

APA

Staniek, M., Fracarolli, M., Hagmann, M. & Riezler, S. (2024). Early Prediction of Causes (not Effects) in Healthcare by Long-Term Clinical Time Series Forecasting. Proceedings of the 9th Machine Learning for Healthcare Conference, in Proceedings of Machine Learning Research 252. Available from https://proceedings.mlr.press/v252/staniek24a.html.