Self-Supervised Probability Imputation to Estimate the External-Natural Cause of Injury Matrix

Pirouz Naghavi, Erica Naghavi, Gang Wang, Kanyin Liane Ong
Proceedings of the 4th Machine Learning for Health Symposium, PMLR 259:747-774, 2025.

Abstract

The burden of injuries is essential to public health planning and policy making. Public health scientists rely on estimating the probability of nature-of-injuries (NI) for external causes of injuries (ECI) to calculate metrics used to describe burden of injuries globally. With more than 30 million records collected from 15 countries that include ECI with NI, in this study we develop a novel method to estimate probability of NI for ECI using self-supervised matrix imputation. We formulate learning the probability of NI for ECI for our data as a matrix imputation from noisy labels problem. Subsequently, we benchmark the collected data on 16 existing matrix imputation methods to uncover the best performing method for our data. Using self-supervision and data augmentation to curb the model’s tendency to overfit to noisy labels, our matrix imputation approach improves test set RMSE by 7.36% compared to the best performing imputation model used for benchmarking. In addition, the proposed self-supervised approach reduces the Euclidean distance of NI probabilities among age groups with similar probabilities by up to 20% without impacting model performance and uses counterfactual data augmentation (CDA) to mitigate potential biases from age, sex, platform, and country income status.

Cite this Paper


BibTeX
@InProceedings{pmlr-v259-naghavi25a,
  title = {Self-Supervised Probability Imputation to Estimate the External-Natural Cause of Injury Matrix},
  author = {Naghavi, Pirouz and Naghavi, Erica and Wang, Gang and Ong, Kanyin Liane},
  booktitle = {Proceedings of the 4th Machine Learning for Health Symposium},
  pages = {747--774},
  year = {2025},
  editor = {Hegselmann, Stefan and Zhou, Helen and Healey, Elizabeth and Chang, Trenton and Ellington, Caleb and Mhasawade, Vishwali and Tonekaboni, Sana and Argaw, Peniel and Zhang, Haoran},
  volume = {259},
  series = {Proceedings of Machine Learning Research},
  month = {15--16 Dec},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v259/main/assets/naghavi25a/naghavi25a.pdf},
  url = {https://proceedings.mlr.press/v259/naghavi25a.html},
  abstract = {The burden of injuries is essential to public health planning and policy making. Public health scientists rely on estimating the probability of nature-of-injuries (NI) for external causes of injuries (ECI) to calculate metrics used to describe burden of injuries globally. With more than 30 million records collected from 15 countries that include ECI with NI, in this study we develop a novel method to estimate probability of NI for ECI using self-supervised matrix imputation. We formulate learning the probability of NI for ECI for our data as a matrix imputation from noisy labels problem. Subsequently, we benchmark the collected data on 16 existing matrix imputation methods to uncover the best performing method for our data. Using self-supervision and data augmentation to curb the model’s tendency to overfit to noisy labels, our matrix imputation approach improves test set RMSE by 7.36% compared to the best performing imputation model used for benchmarking. In addition, the proposed self-supervised approach reduces the Euclidean distance of NI probabilities among age groups with similar probabilities by up to 20% without impacting model performance and uses counterfactual data augmentation (CDA) to mitigate potential biases from age, sex, platform, and country income status.}
}
Endnote
%0 Conference Paper
%T Self-Supervised Probability Imputation to Estimate the External-Natural Cause of Injury Matrix
%A Pirouz Naghavi
%A Erica Naghavi
%A Gang Wang
%A Kanyin Liane Ong
%B Proceedings of the 4th Machine Learning for Health Symposium
%C Proceedings of Machine Learning Research
%D 2025
%E Stefan Hegselmann
%E Helen Zhou
%E Elizabeth Healey
%E Trenton Chang
%E Caleb Ellington
%E Vishwali Mhasawade
%E Sana Tonekaboni
%E Peniel Argaw
%E Haoran Zhang
%F pmlr-v259-naghavi25a
%I PMLR
%P 747--774
%U https://proceedings.mlr.press/v259/naghavi25a.html
%V 259
%X The burden of injuries is essential to public health planning and policy making. Public health scientists rely on estimating the probability of nature-of-injuries (NI) for external causes of injuries (ECI) to calculate metrics used to describe burden of injuries globally. With more than 30 million records collected from 15 countries that include ECI with NI, in this study we develop a novel method to estimate probability of NI for ECI using self-supervised matrix imputation. We formulate learning the probability of NI for ECI for our data as a matrix imputation from noisy labels problem. Subsequently, we benchmark the collected data on 16 existing matrix imputation methods to uncover the best performing method for our data. Using self-supervision and data augmentation to curb the model’s tendency to overfit to noisy labels, our matrix imputation approach improves test set RMSE by 7.36% compared to the best performing imputation model used for benchmarking. In addition, the proposed self-supervised approach reduces the Euclidean distance of NI probabilities among age groups with similar probabilities by up to 20% without impacting model performance and uses counterfactual data augmentation (CDA) to mitigate potential biases from age, sex, platform, and country income status.
APA
Naghavi, P., Naghavi, E., Wang, G. & Ong, K.L. (2025). Self-Supervised Probability Imputation to Estimate the External-Natural Cause of Injury Matrix. Proceedings of the 4th Machine Learning for Health Symposium, in Proceedings of Machine Learning Research 259:747-774. Available from https://proceedings.mlr.press/v259/naghavi25a.html.
