Exploiting structured data for learning contagious diseases under incomplete testing

Maggie Makar, Lauren West, David Hooper, Eric Horvitz, Erica Shenoy, John Guttag
Proceedings of the 38th International Conference on Machine Learning, PMLR 139:7348-7357, 2021.

Abstract

One way that machine learning algorithms can help control the spread of an infectious disease is by building models that predict who is likely to become infected, making them good candidates for preemptive interventions. In this work we ask: can we build reliable infection prediction models when the observed data is collected under limited and biased testing that prioritizes symptomatic individuals? Our analysis suggests that when the infection is highly transmissible, incomplete testing might be sufficient to achieve good out-of-sample prediction error. Guided by this insight, we develop an algorithm that predicts infections, and show that it outperforms baselines on simulated data. We apply our model to data from a large hospital to predict Clostridioides difficile infection, a communicable disease characterized by both symptomatically infected and asymptomatic (i.e., untested) carriers. Using a proxy instead of the unobserved untested-infected state, we show that our model outperforms benchmarks in predicting infections.

Cite this Paper


BibTeX
@InProceedings{pmlr-v139-makar21a,
  title = {Exploiting structured data for learning contagious diseases under incomplete testing},
  author = {Makar, Maggie and West, Lauren and Hooper, David and Horvitz, Eric and Shenoy, Erica and Guttag, John},
  booktitle = {Proceedings of the 38th International Conference on Machine Learning},
  pages = {7348--7357},
  year = {2021},
  editor = {Meila, Marina and Zhang, Tong},
  volume = {139},
  series = {Proceedings of Machine Learning Research},
  month = {18--24 Jul},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v139/makar21a/makar21a.pdf},
  url = {https://proceedings.mlr.press/v139/makar21a.html},
  abstract = {One of the ways that machine learning algorithms can help control the spread of an infectious disease is by building models that predict who is likely to become infected making them good candidates for preemptive interventions. In this work we ask: can we build reliable infection prediction models when the observed data is collected under limited, and biased testing that prioritizes testing symptomatic individuals? Our analysis suggests that when the infection is highly transmissible, incomplete testing might be sufficient to achieve good out-of-sample prediction error. Guided by this insight, we develop an algorithm that predicts infections, and show that it outperforms baselines on simulated data. We apply our model to data from a large hospital to predict Clostridioides difficile infections; a communicable disease that is characterized by both symptomatically infected and asymptomatic (i.e., untested) carriers. Using a proxy instead of the unobserved untested-infected state, we show that our model outperforms benchmarks in predicting infections.}
}
Endnote
%0 Conference Paper
%T Exploiting structured data for learning contagious diseases under incomplete testing
%A Maggie Makar
%A Lauren West
%A David Hooper
%A Eric Horvitz
%A Erica Shenoy
%A John Guttag
%B Proceedings of the 38th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2021
%E Marina Meila
%E Tong Zhang
%F pmlr-v139-makar21a
%I PMLR
%P 7348--7357
%U https://proceedings.mlr.press/v139/makar21a.html
%V 139
%X One of the ways that machine learning algorithms can help control the spread of an infectious disease is by building models that predict who is likely to become infected making them good candidates for preemptive interventions. In this work we ask: can we build reliable infection prediction models when the observed data is collected under limited, and biased testing that prioritizes testing symptomatic individuals? Our analysis suggests that when the infection is highly transmissible, incomplete testing might be sufficient to achieve good out-of-sample prediction error. Guided by this insight, we develop an algorithm that predicts infections, and show that it outperforms baselines on simulated data. We apply our model to data from a large hospital to predict Clostridioides difficile infections; a communicable disease that is characterized by both symptomatically infected and asymptomatic (i.e., untested) carriers. Using a proxy instead of the unobserved untested-infected state, we show that our model outperforms benchmarks in predicting infections.
APA
Makar, M., West, L., Hooper, D., Horvitz, E., Shenoy, E. & Guttag, J. (2021). Exploiting structured data for learning contagious diseases under incomplete testing. Proceedings of the 38th International Conference on Machine Learning, in Proceedings of Machine Learning Research 139:7348-7357. Available from https://proceedings.mlr.press/v139/makar21a.html.
