Self-Supervised Probability Imputation to Estimate the External-Natural Cause of Injury Matrix
Proceedings of the 4th Machine Learning for Health Symposium, PMLR 259:747-774, 2025.
Abstract
Estimates of the burden of injuries are essential to public health planning and policy making. Public health scientists rely on estimates of the probability of a nature of injury (NI) for an external cause of injury (ECI) to calculate the metrics that describe the global burden of injuries. Using more than 30 million records from 15 countries that include both ECI and NI, we develop a novel method to estimate the probability of NI for ECI using self-supervised matrix imputation. We formulate learning these probabilities from our data as a problem of matrix imputation from noisy labels. We then benchmark 16 existing matrix imputation methods on the collected data to identify the best-performing method for our data. Using self-supervision and data augmentation to curb the model’s tendency to overfit to noisy labels, our matrix imputation approach improves test set RMSE by 7.36% compared to the best-performing benchmarked imputation model. In addition, the proposed self-supervised approach reduces the Euclidean distance between the NI probabilities of age groups with similar probabilities by up to 20% without impacting model performance, and it uses counterfactual data augmentation (CDA) to mitigate potential biases from age, sex, platform, and country income status.
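To make the matrix-imputation framing concrete, the following minimal Python sketch builds a toy ECI-by-NI probability matrix from coded records and scores a prediction with RMSE on selected entries. It is an illustration under stated assumptions, not the paper's method: it assumes each matrix entry is the conditional probability of an NI given an ECI, estimated by row-normalized counts, and it substitutes a simple column-mean fill for the self-supervised model. The function names, the toy records, and the column-mean baseline are all hypothetical.

```python
import numpy as np


def empirical_ni_given_eci(eci_codes, ni_codes, n_eci, n_ni):
    """Row-normalized co-occurrence counts.

    Entry (i, j) is the empirical estimate of P(NI = j | ECI = i).
    Assumption: the matrix is read as a conditional probability table.
    """
    counts = np.zeros((n_eci, n_ni), dtype=float)
    # Accumulate one count per (ECI, NI) record, handling repeated pairs.
    np.add.at(counts, (eci_codes, ni_codes), 1.0)
    row_sums = counts.sum(axis=1, keepdims=True)
    # ECI rows with no records become NaN: these are the entries
    # that an imputation model would have to fill in.
    with np.errstate(invalid="ignore", divide="ignore"):
        probs = np.where(row_sums > 0, counts / row_sums, np.nan)
    return probs


def masked_rmse(pred, target, mask):
    """RMSE computed only over the entries selected by a boolean mask."""
    diff = (pred - target)[mask]
    return float(np.sqrt(np.mean(diff ** 2)))


# Hypothetical toy records: integer-coded ECI and NI per record.
eci = np.array([0, 0, 0, 1, 1, 2])
ni = np.array([0, 0, 1, 1, 1, 0])

# ECI code 3 never appears, so its row is entirely missing.
P = empirical_ni_given_eci(eci, ni, n_eci=4, n_ni=2)

# Stand-in baseline (not the paper's model): fill missing entries
# with the column mean over the observed rows.
filled = np.where(np.isnan(P), np.nanmean(P, axis=0), P)
```

In a real evaluation, `masked_rmse` would be applied to held-out test entries, and the reported 7.36% figure would correspond to a comparison of such test-set RMSE values between the proposed model and the best benchmarked method.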