Transferring Knowledge from Text to Predict Disease Onset

Yun Liu; Collin Stultz; John Guttag; Kun-Ta Chuang; Fu-Wen Liang; Huey-Jen Su

Proceedings of the 1st Machine Learning for Healthcare Conference, PMLR 56:150-163, 2016.

Abstract

In many domains such as medicine, training data is in short supply. In such cases, external knowledge is often helpful in building predictive models. We propose a novel method to incorporate publicly available domain expertise to build accurate models. Specifically, we use word2vec models trained on a domain-specific corpus to estimate the relevance of each feature’s text description to the prediction problem. We use these relevance estimates to rescale the features, causing more important features to experience weaker regularization. We apply our method to predict the onset of five chronic diseases in the next five years in two genders and two age groups. Our rescaling approach improves the accuracy of the model, particularly when there are few positive examples. Furthermore, our method selects 60% fewer features, easing interpretation by physicians. Our method is applicable to other domains where feature and outcome descriptions are available.
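The rescaling idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy vectors below are made-up stand-ins for word2vec embeddings trained on a domain-specific corpus, the use of cosine similarity as the relevance estimate is an assumption, and the names `TOY_VECTORS`, `embed`, `relevance`, and `rescale` are hypothetical. The sketch relies on one general fact: under an L1 or L2 penalty on the weights, multiplying feature j by a factor r_j is equivalent to dividing the penalty on its original-scale weight by r_j, so features with larger relevance are regularized less.

```python
import math

# Hypothetical toy embeddings standing in for word2vec vectors trained on a
# medical corpus. The numbers are invented for illustration only.
TOY_VECTORS = {
    "diabetes": [0.90, 0.10, 0.00],
    "insulin": [0.80, 0.20, 0.10],
    "glucose": [0.85, 0.15, 0.05],
    "fracture": [0.00, 0.20, 0.90],
    "ankle": [0.10, 0.10, 0.95],
}


def embed(text):
    """Average the vectors of the known words in a text description."""
    words = [w for w in text.lower().split() if w in TOY_VECTORS]
    if not words:
        raise ValueError(f"no known words in description: {text!r}")
    dim = len(TOY_VECTORS[words[0]])
    return [sum(TOY_VECTORS[w][i] for w in words) / len(words) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def relevance(feature_description, outcome_description, floor=1e-3):
    """Assumed relevance estimate: cosine similarity, floored to stay positive."""
    sim = cosine(embed(feature_description), embed(outcome_description))
    return max(sim, floor)


def rescale(rows, relevances):
    """Multiply each feature column by its relevance score."""
    return [[x * r for x, r in zip(row, relevances)] for row in rows]


feature_descriptions = ["insulin glucose", "ankle fracture"]
outcome_description = "diabetes"
scores = [relevance(d, outcome_description) for d in feature_descriptions]

# Two patients, two binary features (rows are patients, columns are features).
X = [[1.0, 1.0], [0.0, 1.0]]
X_scaled = rescale(X, scores)
```

A regularized linear model fit on `X_scaled` instead of `X` would then penalize the "insulin glucose" feature far less than the "ankle fracture" feature, which is the effect the abstract describes.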

Cite this Paper

BibTeX

@InProceedings{pmlr-v56-Liu16,
  title = 	 {Transferring Knowledge from Text to Predict Disease Onset},
  author = 	 {Liu, Yun and Stultz, Collin and Guttag, John and Chuang, Kun-Ta and Liang, Fu-Wen and Su, Huey-Jen},
  booktitle = 	 {Proceedings of the 1st Machine Learning for Healthcare Conference},
  pages = 	 {150--163},
  year = 	 {2016},
  editor = 	 {Doshi-Velez, Finale and Fackler, Jim and Kale, David and Wallace, Byron and Wiens, Jenna},
  volume = 	 {56},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {Northeastern University, Boston, MA, USA},
  month = 	 {18--19 Aug},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v56/Liu16.pdf},
  url = 	 {https://proceedings.mlr.press/v56/Liu16.html},
  abstract = 	 {In many domains such as medicine, training data is in short supply. In such cases, external knowledge is often helpful in building predictive models. We propose a novel method to incorporate publicly available domain expertise to build accurate models. Specifically, we use word2vec models trained on a domain-specific corpus to estimate the relevance of each feature’s text description to the prediction problem. We use these relevance estimates to rescale the features, causing more important features to experience weaker regularization. We apply our method to predict the onset of five chronic diseases in the next five years in two genders and two age groups. Our rescaling approach improves the accuracy of the model, particularly when there are few positive examples. Furthermore, our method selects 60% fewer features, easing interpretation by physicians. Our method is applicable to other domains where feature and outcome descriptions are available.}
}

Endnote

%0 Conference Paper
%T Transferring Knowledge from Text to Predict Disease Onset
%A Yun Liu
%A Collin Stultz
%A John Guttag
%A Kun-Ta Chuang
%A Fu-Wen Liang
%A Huey-Jen Su
%B Proceedings of the 1st Machine Learning for Healthcare Conference
%C Proceedings of Machine Learning Research
%D 2016
%E Finale Doshi-Velez
%E Jim Fackler
%E David Kale
%E Byron Wallace
%E Jenna Wiens
%F pmlr-v56-Liu16
%I PMLR
%P 150--163
%U https://proceedings.mlr.press/v56/Liu16.html
%V 56
%X In many domains such as medicine, training data is in short supply. In such cases, external knowledge is often helpful in building predictive models. We propose a novel method to incorporate publicly available domain expertise to build accurate models. Specifically, we use word2vec models trained on a domain-specific corpus to estimate the relevance of each feature’s text description to the prediction problem. We use these relevance estimates to rescale the features, causing more important features to experience weaker regularization. We apply our method to predict the onset of five chronic diseases in the next five years in two genders and two age groups. Our rescaling approach improves the accuracy of the model, particularly when there are few positive examples. Furthermore, our method selects 60% fewer features, easing interpretation by physicians. Our method is applicable to other domains where feature and outcome descriptions are available.

RIS

TY  - CPAPER
TI  - Transferring Knowledge from Text to Predict Disease Onset
AU  - Yun Liu
AU  - Collin Stultz
AU  - John Guttag
AU  - Kun-Ta Chuang
AU  - Fu-Wen Liang
AU  - Huey-Jen Su
BT  - Proceedings of the 1st Machine Learning for Healthcare Conference
DA  - 2016/12/10
ED  - Finale Doshi-Velez
ED  - Jim Fackler
ED  - David Kale
ED  - Byron Wallace
ED  - Jenna Wiens
ID  - pmlr-v56-Liu16
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 56
SP  - 150
EP  - 163
L1  - http://proceedings.mlr.press/v56/Liu16.pdf
UR  - https://proceedings.mlr.press/v56/Liu16.html
AB  - In many domains such as medicine, training data is in short supply. In such cases, external knowledge is often helpful in building predictive models. We propose a novel method to incorporate publicly available domain expertise to build accurate models. Specifically, we use word2vec models trained on a domain-specific corpus to estimate the relevance of each feature’s text description to the prediction problem. We use these relevance estimates to rescale the features, causing more important features to experience weaker regularization. We apply our method to predict the onset of five chronic diseases in the next five years in two genders and two age groups. Our rescaling approach improves the accuracy of the model, particularly when there are few positive examples. Furthermore, our method selects 60% fewer features, easing interpretation by physicians. Our method is applicable to other domains where feature and outcome descriptions are available.
ER  -

APA

Liu, Y., Stultz, C., Guttag, J., Chuang, K., Liang, F. & Su, H. (2016). Transferring Knowledge from Text to Predict Disease Onset. Proceedings of the 1st Machine Learning for Healthcare Conference, in Proceedings of Machine Learning Research 56:150-163. Available from https://proceedings.mlr.press/v56/Liu16.html.
