Students Need More Attention: BERT-based Attention Model for Small Data with Application to Automatic Patient Message Triage

Shijing Si, Rui Wang, Jedrek Wosik, Hao Zhang, David Dov, Guoyin Wang, Lawrence Carin
Proceedings of the 5th Machine Learning for Healthcare Conference, PMLR 126:436-456, 2020.

Abstract

Small and imbalanced datasets commonly seen in healthcare represent a challenge when training classifiers based on deep learning models. So motivated, we propose a novel framework based on BioBERT (Bidirectional Encoder Representations from Transformers for Biomedical Text Mining). Specifically, (i) we introduce Label Embeddings for Self-Attention in each layer of BERT, which we call LESA-BERT, and (ii) by distilling LESA-BERT to smaller variants, we aim to reduce overfitting and model size when working on small datasets. As an application, our framework is utilized to build a model for patient portal message triage that classifies the urgency of a message into three categories: non-urgent, medium and urgent. Experiments demonstrate that our approach outperforms several strong baseline classifiers by a significant margin of 4.3% in macro F1 score. The code for this project is publicly available at https://github.com/shijing001/text_classifiers
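The two ingredients named in the abstract can be sketched compactly. Below is a minimal, illustrative PyTorch sketch of (i) self-attention augmented with learned label embeddings and (ii) a standard knowledge-distillation objective (Hinton et al., 2015). It is not the authors' released implementation (see the linked repository for that): all class, function, and argument names are our own, and details such as multi-head attention, padding masks, and the exact teacher-student loss weighting are assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

class LabelAwareSelfAttention(nn.Module):
    # Toy single-head self-attention in which one learned embedding per
    # class is prepended to the token sequence, so labels and tokens
    # attend to each other in every layer (the idea behind LESA-BERT).
    def __init__(self, hidden_dim: int, num_labels: int):
        super().__init__()
        self.label_emb = nn.Parameter(torch.randn(num_labels, hidden_dim))
        self.q = nn.Linear(hidden_dim, hidden_dim)
        self.k = nn.Linear(hidden_dim, hidden_dim)
        self.v = nn.Linear(hidden_dim, hidden_dim)
        self.scale = hidden_dim ** 0.5

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, seq_len, hidden_dim)
        labels = self.label_emb.unsqueeze(0).expand(tokens.size(0), -1, -1)
        x = torch.cat([labels, tokens], dim=1)  # label "tokens" + word tokens
        attn = torch.softmax(
            self.q(x) @ self.k(x).transpose(-2, -1) / self.scale, dim=-1)
        return attn @ self.v(x)  # (batch, num_labels + seq_len, hidden_dim)

def distillation_loss(student_logits, teacher_logits, targets,
                      T: float = 2.0, alpha: float = 0.5):
    # Soft-target distillation: the student matches the teacher's tempered
    # class distribution while also fitting the hard labels. T and alpha
    # are illustrative hyperparameter choices, not values from the paper.
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                    F.softmax(teacher_logits / T, dim=-1),
                    reduction="batchmean") * (T * T)
    hard = F.cross_entropy(student_logits, targets)
    return alpha * soft + (1.0 - alpha) * hard

For triage with three urgency classes, num_labels would be 3, and a classifier head could read the three label positions of the output to score non-urgent, medium, and urgent.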

Cite this Paper

BibTeX
@InProceedings{pmlr-v126-si20a,
  title     = {Students Need More Attention: BERT-based Attention Model for Small Data with Application to Automatic Patient Message Triage},
  author    = {Si, Shijing and Wang, Rui and Wosik, Jedrek and Zhang, Hao and Dov, David and Wang, Guoyin and Carin, Lawrence},
  booktitle = {Proceedings of the 5th Machine Learning for Healthcare Conference},
  pages     = {436--456},
  year      = {2020},
  editor    = {Doshi-Velez, Finale and Fackler, Jim and Jung, Ken and Kale, David and Ranganath, Rajesh and Wallace, Byron and Wiens, Jenna},
  volume    = {126},
  series    = {Proceedings of Machine Learning Research},
  month     = {07--08 Aug},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v126/si20a/si20a.pdf},
  url       = {https://proceedings.mlr.press/v126/si20a.html},
  abstract  = {Small and imbalanced datasets commonly seen in healthcare represent a challenge when training classifiers based on deep learning models. So motivated, we propose a novel framework based on BioBERT (Bidirectional Encoder Representations from Transformers for Biomedical Text Mining). Specifically, (i) we introduce Label Embeddings for Self-Attention in each layer of BERT, which we call LESA-BERT, and (ii) by distilling LESA-BERT to smaller variants, we aim to reduce overfitting and model size when working on small datasets. As an application, our framework is utilized to build a model for patient portal message triage that classifies the urgency of a message into three categories: non-urgent, medium and urgent. Experiments demonstrate that our approach outperforms several strong baseline classifiers by a significant margin of 4.3% in macro F1 score. The code for this project is publicly available at https://github.com/shijing001/text_classifiers}
}
Endnote
%0 Conference Paper
%T Students Need More Attention: BERT-based Attention Model for Small Data with Application to Automatic Patient Message Triage
%A Shijing Si
%A Rui Wang
%A Jedrek Wosik
%A Hao Zhang
%A David Dov
%A Guoyin Wang
%A Lawrence Carin
%B Proceedings of the 5th Machine Learning for Healthcare Conference
%C Proceedings of Machine Learning Research
%D 2020
%E Finale Doshi-Velez
%E Jim Fackler
%E Ken Jung
%E David Kale
%E Rajesh Ranganath
%E Byron Wallace
%E Jenna Wiens
%F pmlr-v126-si20a
%I PMLR
%P 436--456
%U https://proceedings.mlr.press/v126/si20a.html
%V 126
%X Small and imbalanced datasets commonly seen in healthcare represent a challenge when training classifiers based on deep learning models. So motivated, we propose a novel framework based on BioBERT (Bidirectional Encoder Representations from Transformers for Biomedical Text Mining). Specifically, (i) we introduce Label Embeddings for Self-Attention in each layer of BERT, which we call LESA-BERT, and (ii) by distilling LESA-BERT to smaller variants, we aim to reduce overfitting and model size when working on small datasets. As an application, our framework is utilized to build a model for patient portal message triage that classifies the urgency of a message into three categories: non-urgent, medium and urgent. Experiments demonstrate that our approach outperforms several strong baseline classifiers by a significant margin of 4.3% in macro F1 score. The code for this project is publicly available at https://github.com/shijing001/text_classifiers
APA
Si, S., Wang, R., Wosik, J., Zhang, H., Dov, D., Wang, G. & Carin, L. (2020). Students Need More Attention: BERT-based Attention Model for Small Data with Application to Automatic Patient Message Triage. Proceedings of the 5th Machine Learning for Healthcare Conference, in Proceedings of Machine Learning Research 126:436-456. Available from https://proceedings.mlr.press/v126/si20a.html.
