Transferring Knowledge from Text to Predict Disease Onset

Yun Liu, Collin Stultz, John Guttag, Kun-Ta Chuang, Fu-Wen Liang, Huey-Jen Su
Proceedings of the 1st Machine Learning for Healthcare Conference, PMLR 56:150-163, 2016.

Abstract

In many domains such as medicine, training data is in short supply. In such cases, external knowledge is often helpful in building predictive models. We propose a novel method to incorporate publicly available domain expertise to build accurate models. Specifically, we use word2vec models trained on a domain-specific corpus to estimate the relevance of each feature’s text description to the prediction problem. We use these relevance estimates to rescale the features, causing more important features to experience weaker regularization. We apply our method to predict the onset of five chronic diseases in the next five years in two genders and two age groups. Our rescaling approach improves the accuracy of the model, particularly when there are few positive examples. Furthermore, our method selects 60% fewer features, easing interpretation by physicians. Our method is applicable to other domains where feature and outcome descriptions are available.
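The rescaling idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the two-dimensional vectors stand in for word2vec embeddings, the feature names are hypothetical, and using cosine similarity with a small floor as the relevance estimate is an assumption made for illustration. The sketch shows why multiplying a feature by its relevance r weakens its regularization: a weight w on the scaled feature r*x predicts the same as a weight w*r on the original x, so an L1 penalty charges each unit of original-scale weight 1/r.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def relevance(feature_vec, outcome_vec, floor=0.05):
    """Relevance estimate; the floor (an assumption) keeps it strictly positive."""
    return max(cosine(feature_vec, outcome_vec), floor)


def rescale(rows, weights):
    """Multiply each feature column by its relevance weight."""
    return [[x * r for x, r in zip(row, weights)] for row in rows]


# Hypothetical toy embeddings standing in for word2vec vectors.
outcome_vec = [1.0, 0.0]
feature_vecs = {
    "blood_glucose_test": [0.8, 0.6],
    "ankle_sprain_visit": [0.28, 0.96],
}

names = list(feature_vecs)
rel = [relevance(feature_vecs[n], outcome_vec) for n in names]

# Two made-up patients, one column per feature.
rows = [[5.0, 1.0], [7.0, 0.0]]
scaled = rescale(rows, rel)

# Effective L1 cost per unit of original-scale weight is 1 / relevance,
# so the more relevant feature is penalized less.
penalty = [1.0 / r for r in rel]

for name, r, p in zip(names, rel, penalty):
    print(f"{name}: relevance={r:.2f}, effective penalty multiplier={p:.2f}")
```

The rescaled matrix `scaled` would then be passed to any L1-regularized learner in place of `rows`; the choice of learner is left open here.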

Cite this Paper


BibTeX
@InProceedings{pmlr-v56-Liu16,
  title = {Transferring Knowledge from Text to Predict Disease Onset},
  author = {Liu, Yun and Stultz, Collin and Guttag, John and Chuang, Kun-Ta and Liang, Fu-Wen and Su, Huey-Jen},
  booktitle = {Proceedings of the 1st Machine Learning for Healthcare Conference},
  pages = {150--163},
  year = {2016},
  editor = {Doshi-Velez, Finale and Fackler, Jim and Kale, David and Wallace, Byron and Wiens, Jenna},
  volume = {56},
  series = {Proceedings of Machine Learning Research},
  address = {Northeastern University, Boston, MA, USA},
  month = {18--19 Aug},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v56/Liu16.pdf},
  url = {https://proceedings.mlr.press/v56/Liu16.html},
  abstract = {In many domains such as medicine, training data is in short supply. In such cases, external knowledge is often helpful in building predictive models. We propose a novel method to incorporate publicly available domain expertise to build accurate models. Specifically, we use word2vec models trained on a domain-specific corpus to estimate the relevance of each feature’s text description to the prediction problem. We use these relevance estimates to rescale the features, causing more important features to experience weaker regularization. We apply our method to predict the onset of five chronic diseases in the next five years in two genders and two age groups. Our rescaling approach improves the accuracy of the model, particularly when there are few positive examples. Furthermore, our method selects 60% fewer features, easing interpretation by physicians. Our method is applicable to other domains where feature and outcome descriptions are available.}
}
Endnote
%0 Conference Paper
%T Transferring Knowledge from Text to Predict Disease Onset
%A Yun Liu
%A Collin Stultz
%A John Guttag
%A Kun-Ta Chuang
%A Fu-Wen Liang
%A Huey-Jen Su
%B Proceedings of the 1st Machine Learning for Healthcare Conference
%C Proceedings of Machine Learning Research
%D 2016
%E Finale Doshi-Velez
%E Jim Fackler
%E David Kale
%E Byron Wallace
%E Jenna Wiens
%F pmlr-v56-Liu16
%I PMLR
%P 150--163
%U https://proceedings.mlr.press/v56/Liu16.html
%V 56
%X In many domains such as medicine, training data is in short supply. In such cases, external knowledge is often helpful in building predictive models. We propose a novel method to incorporate publicly available domain expertise to build accurate models. Specifically, we use word2vec models trained on a domain-specific corpus to estimate the relevance of each feature’s text description to the prediction problem. We use these relevance estimates to rescale the features, causing more important features to experience weaker regularization. We apply our method to predict the onset of five chronic diseases in the next five years in two genders and two age groups. Our rescaling approach improves the accuracy of the model, particularly when there are few positive examples. Furthermore, our method selects 60% fewer features, easing interpretation by physicians. Our method is applicable to other domains where feature and outcome descriptions are available.
RIS
TY - CPAPER
TI - Transferring Knowledge from Text to Predict Disease Onset
AU - Yun Liu
AU - Collin Stultz
AU - John Guttag
AU - Kun-Ta Chuang
AU - Fu-Wen Liang
AU - Huey-Jen Su
BT - Proceedings of the 1st Machine Learning for Healthcare Conference
DA - 2016/12/10
ED - Finale Doshi-Velez
ED - Jim Fackler
ED - David Kale
ED - Byron Wallace
ED - Jenna Wiens
ID - pmlr-v56-Liu16
PB - PMLR
DP - Proceedings of Machine Learning Research
VL - 56
SP - 150
EP - 163
L1 - http://proceedings.mlr.press/v56/Liu16.pdf
UR - https://proceedings.mlr.press/v56/Liu16.html
AB - In many domains such as medicine, training data is in short supply. In such cases, external knowledge is often helpful in building predictive models. We propose a novel method to incorporate publicly available domain expertise to build accurate models. Specifically, we use word2vec models trained on a domain-specific corpus to estimate the relevance of each feature’s text description to the prediction problem. We use these relevance estimates to rescale the features, causing more important features to experience weaker regularization. We apply our method to predict the onset of five chronic diseases in the next five years in two genders and two age groups. Our rescaling approach improves the accuracy of the model, particularly when there are few positive examples. Furthermore, our method selects 60% fewer features, easing interpretation by physicians. Our method is applicable to other domains where feature and outcome descriptions are available.
ER -
APA
Liu, Y., Stultz, C., Guttag, J., Chuang, K., Liang, F. & Su, H. (2016). Transferring Knowledge from Text to Predict Disease Onset. Proceedings of the 1st Machine Learning for Healthcare Conference, in Proceedings of Machine Learning Research 56:150-163. Available from https://proceedings.mlr.press/v56/Liu16.html.

Related Material