Improving Sepsis Prediction Model Generalization With Optimal Transport
Proceedings of the 2nd Machine Learning for Health symposium, PMLR 193:474-488, 2022.
Sepsis is a deadly condition affecting many hospitalized patients. There have been many efforts to build models that predict the onset of sepsis, but these models tend to perform poorly when validated on external data from other hospitals, owing to distributional shifts in the data and insufficient samples from sepsis patients. To cope with noisy and imbalanced samples, we develop a novel two-step approach for sepsis prediction: given feature-label points from the source domain and feature points from the target domain, the goal is to obtain a sepsis predictor that performs satisfactorily in the target domain. The proposed algorithm first learns how to transform sample points from the source domain to the target domain, and then applies distributionally robust optimization (DRO) with the Sinkhorn distance and an asymmetric cost function to reliably obtain a classifier with satisfactory out-of-sample performance. We also uncover connections between our proposed formulation and two widely used classification models, namely the DRO formulation with the Wasserstein distance and the regularized logistic regression formulation. Numerical experiments on synthetic and real datasets demonstrate the competitive performance of the proposed method.
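As a rough illustration of the first step (moving source samples toward the target domain), the sketch below computes an entropy-regularized optimal transport plan with plain Sinkhorn iterations and applies a barycentric projection. It is not the authors' algorithm: the function names `sinkhorn_plan` and `barycentric_map`, the squared Euclidean cost, the regularization value, and the synthetic Gaussian data are all assumptions made for this example, and the DRO step with an asymmetric cost is not reproduced here.

```python
import numpy as np

def sinkhorn_plan(X_src, X_tgt, reg=1.0, n_iter=500):
    """Entropy-regularized OT plan between two uniform empirical distributions.

    Hypothetical helper: squared Euclidean cost and uniform weights are assumed.
    """
    n, m = len(X_src), len(X_tgt)
    a = np.full(n, 1.0 / n)  # uniform weights on source samples
    b = np.full(m, 1.0 / m)  # uniform weights on target samples
    # Pairwise squared Euclidean cost matrix of shape (n, m).
    C = ((X_src[:, None, :] - X_tgt[None, :, :]) ** 2).sum(axis=2)
    K = np.exp(-C / reg)     # Gibbs kernel
    u = np.ones(n)
    v = np.ones(m)
    for _ in range(n_iter):
        u = a / (K @ v)      # rescale rows toward marginal a
        v = b / (K.T @ u)    # rescale columns toward marginal b
    return u[:, None] * K * v[None, :]

def barycentric_map(plan, X_tgt):
    """Send each source point to the plan-weighted average of target points."""
    return (plan @ X_tgt) / plan.sum(axis=1, keepdims=True)

# Synthetic stand-in for a source hospital and a shifted target hospital.
rng = np.random.default_rng(0)
X_src = rng.normal(0.0, 1.0, size=(50, 2))
X_tgt = rng.normal(0.0, 1.0, size=(60, 2)) + np.array([2.0, -1.0])

plan = sinkhorn_plan(X_src, X_tgt)
X_mapped = barycentric_map(plan, X_tgt)

# Distance between feature means before and after the transport step.
gap_before = np.linalg.norm(X_src.mean(axis=0) - X_tgt.mean(axis=0))
gap_after = np.linalg.norm(X_mapped.mean(axis=0) - X_tgt.mean(axis=0))
print(gap_before, gap_after)
```

In a setting like the one described above, source labels would stay attached to the mapped points, so a classifier could then be trained on `X_mapped`. For large cost values or small `reg`, a log-domain implementation of the iterations would be numerically safer than this direct form.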