Disease Propagation in Social Networks: A Novel Study of Infection Genesis and Spread on Twitter

Manan Shah
Proceedings of the 5th International Workshop on Big Data, Streams and Heterogeneous Source Mining: Algorithms, Systems, Programming Models and Applications at KDD 2016, PMLR 53:85-102, 2016.

Abstract

The CDC (Centers for Disease Control and Prevention) currently diagnoses millions of cases of infectious diseases annually, generating population disease distributions that, while accurate, are far too delayed for real-time monitoring. The ability to instantly compile and monitor such distributions is critical in identifying outbreaks and facilitating real-time communication between health authorities and health-care providers. This task, however, is made challenging by the lack of instantly available public health information, creating a need for the analysis of disease spread on frequently updated social media websites. We introduce a novel pipeline-based model to generate a real-time, accurate depiction of infectious disease propagation using Twitter data. Our approach, an amalgam of natural language processing and supervised machine learning, is invariant to mass media hype and significantly reduces the noise introduced by the use of tweets. The correlation coefficient between the Twitter disease distribution obtained via our approach and CDC data from mid-2013 to mid-2014 was 0.983, improving upon the best model published for the 2012-13 flu season. Our model further correlates well with theoretical models of infection spread across airport networks, verifying its robustness and applicability in the public sphere.
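
For readers unfamiliar with the evaluation metric, the sketch below shows one way a correlation coefficient between two aligned time series could be computed. It assumes the Pearson coefficient, which the abstract does not specify. The function name `pearson_r` and both input series are hypothetical placeholders for illustration, not the paper's code or data.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have equal length of at least 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the means (unnormalised covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (unnormalised variances).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Hypothetical placeholder values, NOT data from the paper:
# weekly counts of flu-related tweets and weekly reported cases.
tweet_counts = [120, 180, 260, 410, 390, 300, 210]
cdc_counts = [1000, 1500, 2300, 3600, 3500, 2600, 1900]

r = pearson_r(tweet_counts, cdc_counts)
print(f"correlation coefficient: {r:.3f}")
```

A value near 1 indicates that the two series rise and fall together. It does not by itself show that the magnitudes agree, since the coefficient is invariant to scaling either series.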

Cite this Paper


BibTeX
@InProceedings{pmlr-v53-shah16,
  title = {Disease Propagation in Social Networks: A Novel Study of Infection Genesis and Spread on Twitter},
  author = {Shah, Manan},
  booktitle = {Proceedings of the 5th International Workshop on Big Data, Streams and Heterogeneous Source Mining: Algorithms, Systems, Programming Models and Applications at KDD 2016},
  pages = {85--102},
  year = {2016},
  editor = {Fan, Wei and Bifet, Albert and Read, Jesse and Yang, Qiang and Yu, Philip S.},
  volume = {53},
  series = {Proceedings of Machine Learning Research},
  address = {San Francisco, California, USA},
  month = {14 Aug},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v53/shah16.pdf},
  url = {https://proceedings.mlr.press/v53/shah16.html},
  abstract = {The CDC (Centers for Disease Control and Prevention) currently diagnoses millions of cases of infectious diseases annually, generating population disease distributions that, while accurate, are far too delayed for real-time monitoring. The ability to instantly compile and monitor such distributions is critical in identifying outbreaks and facilitating real-time communication between health authorities and health-care providers. This task, however, is made challenging due to the lack of instantly available public health information, creating a need for the analysis of disease spread on frequently updated social media websites. We introduce a novel pipeline based model to generate a real-time, accurate depiction of infectious disease propagation using Twitter data. Our approach, an amalgam of natural language processing and supervised machine learning, is invariant to mass media hype and significantly reduces the noise introduced by the use of tweets. The correlation coefficient between the Twitter disease distribution obtained via our approach and CDC data from mid-2013 to mid-2014 was 0.983, improving upon the best model published for the 2012-13 flu season. Our model further correlates well with theoretical models of infection spread across airport networks, verifying its robustness and applicability in the public sphere.}
}
Endnote
%0 Conference Paper
%T Disease Propagation in Social Networks: A Novel Study of Infection Genesis and Spread on Twitter
%A Manan Shah
%B Proceedings of the 5th International Workshop on Big Data, Streams and Heterogeneous Source Mining: Algorithms, Systems, Programming Models and Applications at KDD 2016
%C Proceedings of Machine Learning Research
%D 2016
%E Wei Fan
%E Albert Bifet
%E Jesse Read
%E Qiang Yang
%E Philip S. Yu
%F pmlr-v53-shah16
%I PMLR
%P 85--102
%U https://proceedings.mlr.press/v53/shah16.html
%V 53
%X The CDC (Centers for Disease Control and Prevention) currently diagnoses millions of cases of infectious diseases annually, generating population disease distributions that, while accurate, are far too delayed for real-time monitoring. The ability to instantly compile and monitor such distributions is critical in identifying outbreaks and facilitating real-time communication between health authorities and health-care providers. This task, however, is made challenging due to the lack of instantly available public health information, creating a need for the analysis of disease spread on frequently updated social media websites. We introduce a novel pipeline based model to generate a real-time, accurate depiction of infectious disease propagation using Twitter data. Our approach, an amalgam of natural language processing and supervised machine learning, is invariant to mass media hype and significantly reduces the noise introduced by the use of tweets. The correlation coefficient between the Twitter disease distribution obtained via our approach and CDC data from mid-2013 to mid-2014 was 0.983, improving upon the best model published for the 2012-13 flu season. Our model further correlates well with theoretical models of infection spread across airport networks, verifying its robustness and applicability in the public sphere.
RIS
TY  - CPAPER
TI  - Disease Propagation in Social Networks: A Novel Study of Infection Genesis and Spread on Twitter
AU  - Manan Shah
BT  - Proceedings of the 5th International Workshop on Big Data, Streams and Heterogeneous Source Mining: Algorithms, Systems, Programming Models and Applications at KDD 2016
DA  - 2016/12/06
ED  - Wei Fan
ED  - Albert Bifet
ED  - Jesse Read
ED  - Qiang Yang
ED  - Philip S. Yu
ID  - pmlr-v53-shah16
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 53
SP  - 85
EP  - 102
L1  - http://proceedings.mlr.press/v53/shah16.pdf
UR  - https://proceedings.mlr.press/v53/shah16.html
AB  - The CDC (Centers for Disease Control and Prevention) currently diagnoses millions of cases of infectious diseases annually, generating population disease distributions that, while accurate, are far too delayed for real-time monitoring. The ability to instantly compile and monitor such distributions is critical in identifying outbreaks and facilitating real-time communication between health authorities and health-care providers. This task, however, is made challenging due to the lack of instantly available public health information, creating a need for the analysis of disease spread on frequently updated social media websites. We introduce a novel pipeline based model to generate a real-time, accurate depiction of infectious disease propagation using Twitter data. Our approach, an amalgam of natural language processing and supervised machine learning, is invariant to mass media hype and significantly reduces the noise introduced by the use of tweets. The correlation coefficient between the Twitter disease distribution obtained via our approach and CDC data from mid-2013 to mid-2014 was 0.983, improving upon the best model published for the 2012-13 flu season. Our model further correlates well with theoretical models of infection spread across airport networks, verifying its robustness and applicability in the public sphere.
ER  -
APA
Shah, M. (2016). Disease Propagation in Social Networks: A Novel Study of Infection Genesis and Spread on Twitter. Proceedings of the 5th International Workshop on Big Data, Streams and Heterogeneous Source Mining: Algorithms, Systems, Programming Models and Applications at KDD 2016, in Proceedings of Machine Learning Research 53:85-102. Available from https://proceedings.mlr.press/v53/shah16.html.
