Improving Sepsis Prediction Model Generalization With Optimal Transport

Jie Wang, Ronald Moore, Yao Xie, Rishikesan Kamaleswaran
Proceedings of the 2nd Machine Learning for Health symposium, PMLR 193:474-488, 2022.

Abstract

Sepsis is a deadly condition affecting many hospitalized patients. There have been many efforts to build models that predict the onset of sepsis, but these models tend to perform poorly when validated on external data from other hospitals, owing to distributional shifts in the data and insufficient samples from sepsis patients. To overcome the difficulties posed by noisy and imbalanced samples, we develop a novel two-step approach for sepsis prediction: given feature-label points from the source domain and feature points from the target domain, the goal is to obtain a sepsis predictor that performs satisfactorily on the target domain. The proposed algorithm first learns how to transform sample points from the source domain to the target domain, and then applies the distributionally robust optimization (DRO) technique with the Sinkhorn distance and an asymmetric cost function to reliably obtain a classifier with satisfactory out-of-sample performance. We also uncover connections between the proposed formulation and widely used classification models, namely the DRO formulation with the Wasserstein distance and the regularized logistic regression formulation. Numerical experiments with synthetic and real datasets demonstrate the competitive performance of the proposed method.
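The abstract describes the two-step pipeline only at a high level. The following is a minimal sketch under stated assumptions, not the authors' implementation. Step 1 uses entropic optimal transport (Sinkhorn iterations with a squared Euclidean cost) plus a barycentric mapping to move labeled source points toward the unlabeled target features; this is a standard OT domain-adaptation device and may differ from the transform learned in the paper. Step 2 substitutes a ridge-regularized logistic regression, one of the model families the abstract relates to its formulation, for the Sinkhorn DRO classifier with asymmetric cost, which is not reproduced here. The helper names (`sinkhorn_plan`, `barycentric_map`, `fit_logreg`, `predict`), the cost rescaling, the toy data, and all hyperparameters are hypothetical.

```python
import math

# Illustrative two-step sketch with hypothetical helper names; NOT the paper's
# Sinkhorn-DRO method. Step 2 uses plain ridge logistic regression as a stand-in.


def sinkhorn_plan(xs, xt, reg=0.05, n_iter=500):
    """Entropic OT plan between uniform empirical measures on xs and xt.

    The squared Euclidean cost is divided by its maximum (an assumption made
    here for numerical stability), so `reg` is relative to a unit-scale cost.
    """
    n, m = len(xs), len(xt)
    cost = [[sum((a - b) ** 2 for a, b in zip(p, q)) for q in xt] for p in xs]
    c_max = max(max(row) for row in cost) or 1.0
    kernel = [[math.exp(-c / (c_max * reg)) for c in row] for row in cost]
    mu, nu = [1.0 / n] * n, [1.0 / m] * m  # uniform source/target weights
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iter):
        # Alternately rescale so row sums match mu, then column sums match nu.
        u = [mu[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [nu[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]


def barycentric_map(plan, xt):
    """Send each source point to the plan-weighted average of target points."""
    dim = len(xt[0])
    mapped = []
    for row in plan:
        mass = sum(row)
        mapped.append([sum(w * q[k] for w, q in zip(row, xt)) / mass for k in range(dim)])
    return mapped


def fit_logreg(X, y, lam=0.01, lr=0.1, epochs=2000):
    """Full-batch gradient descent on mean log(1 + exp(-y f(x))) + (lam/2)||w||^2.

    Labels are in {-1, +1}; the bias term is not penalized.
    """
    n, dim = len(X), len(X[0])
    w, bias = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [lam * wk for wk in w], 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wk * xk for wk, xk in zip(w, xi)) + bias)
            # d/df log(1 + exp(-y f)) = -y / (1 + exp(y f))
            coef = -yi / (1.0 + math.exp(margin))
            grad_w = [g + coef * xk / n for g, xk in zip(grad_w, xi)]
            grad_b += coef / n
        w = [wk - lr * g for wk, g in zip(w, grad_w)]
        bias -= lr * grad_b
    return w, bias


def predict(w, bias, X):
    """Return the sign of the linear score for each row of X."""
    return [1 if sum(wk * xk for wk, xk in zip(w, x)) + bias > 0 else -1 for x in X]


# Toy data (hypothetical): the target domain is the source shifted by +3 along
# the first feature, and the label depends only on the second feature.
xs = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
ys = [-1, -1, 1, 1]
xt = [[x + 3.0, z] for x, z in xs]  # unlabeled target features

plan = sinkhorn_plan(xs, xt)
xs_mapped = barycentric_map(plan, xt)  # step 1: move source points toward the target
w, bias = fit_logreg(xs_mapped, ys)    # step 2: stand-in classifier on mapped points
print(predict(w, bias, xt))
```

The barycentric map keeps each source label attached to its transported point, which is what lets a classifier be trained without target labels. A smaller `reg` sharpens the plan toward a one-to-one matching but can make `math.exp` underflow to zero; a log-domain Sinkhorn implementation avoids that.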

Cite this Paper


BibTeX
@InProceedings{pmlr-v193-wang22a,
  title = {Improving Sepsis Prediction Model Generalization With Optimal Transport},
  author = {Wang, Jie and Moore, Ronald and Xie, Yao and Kamaleswaran, Rishikesan},
  booktitle = {Proceedings of the 2nd Machine Learning for Health symposium},
  pages = {474--488},
  year = {2022},
  editor = {Parziale, Antonio and Agrawal, Monica and Joshi, Shalmali and Chen, Irene Y. and Tang, Shengpu and Oala, Luis and Subbaswamy, Adarsh},
  volume = {193},
  series = {Proceedings of Machine Learning Research},
  month = {28 Nov},
  publisher = {PMLR},
  pdf = {https://proceedings.mlr.press/v193/wang22a/wang22a.pdf},
  url = {https://proceedings.mlr.press/v193/wang22a.html},
  abstract = {Sepsis is a deadly condition affecting many patients in the hospital. There have been many efforts to build models that predict the onset of sepsis, but these models tend to perform terribly when validated on external data from different hospitals due to distributional shifts in the data and insufficient samples from sepsis patients. To circumvent the curse from noisy and unbalanced samples, we develop a novel two-step approach for sepsis prediction: given feature-label points from the source domain and feature points from the target domain, to obtain a sepsis predictor that has satisfactory performance at the target domain. The proposed algorithm first learns how to transform sample points from the source domain to the target domain, and then applies the distributionally robust optimization (DRO) technique with the Sinkhorn distance and asymmetric cost function to reliably obtain a classifier with satisfactory out-of-sample performance. Connections between our proposed formulation and widely used classification models, i.e., DRO formulation with the Wasserstein distance and regularized logistic regression formulation, are also uncovered. Numerical experiments with synthetic and real datasets demonstrate the competitive performance of the proposed method.}
}
Endnote
%0 Conference Paper
%T Improving Sepsis Prediction Model Generalization With Optimal Transport
%A Jie Wang
%A Ronald Moore
%A Yao Xie
%A Rishikesan Kamaleswaran
%B Proceedings of the 2nd Machine Learning for Health symposium
%C Proceedings of Machine Learning Research
%D 2022
%E Antonio Parziale
%E Monica Agrawal
%E Shalmali Joshi
%E Irene Y. Chen
%E Shengpu Tang
%E Luis Oala
%E Adarsh Subbaswamy
%F pmlr-v193-wang22a
%I PMLR
%P 474--488
%U https://proceedings.mlr.press/v193/wang22a.html
%V 193
%X Sepsis is a deadly condition affecting many patients in the hospital. There have been many efforts to build models that predict the onset of sepsis, but these models tend to perform terribly when validated on external data from different hospitals due to distributional shifts in the data and insufficient samples from sepsis patients. To circumvent the curse from noisy and unbalanced samples, we develop a novel two-step approach for sepsis prediction: given feature-label points from the source domain and feature points from the target domain, to obtain a sepsis predictor that has satisfactory performance at the target domain. The proposed algorithm first learns how to transform sample points from the source domain to the target domain, and then applies the distributionally robust optimization (DRO) technique with the Sinkhorn distance and asymmetric cost function to reliably obtain a classifier with satisfactory out-of-sample performance. Connections between our proposed formulation and widely used classification models, i.e., DRO formulation with the Wasserstein distance and regularized logistic regression formulation, are also uncovered. Numerical experiments with synthetic and real datasets demonstrate the competitive performance of the proposed method.
APA
Wang, J., Moore, R., Xie, Y. & Kamaleswaran, R. (2022). Improving Sepsis Prediction Model Generalization With Optimal Transport. Proceedings of the 2nd Machine Learning for Health symposium, in Proceedings of Machine Learning Research 193:474-488. Available from https://proceedings.mlr.press/v193/wang22a.html.
