Training without training data: Improving the generalizability of automated medical abbreviation disambiguation

Marta Skreta, Aryan Arbabi, Jixuan Wang, Michael Brudno
Proceedings of the Machine Learning for Health NeurIPS Workshop, PMLR 116:233-245, 2020.

Abstract

Abbreviation disambiguation is important for automated clinical note processing due to the frequent use of abbreviations in clinical settings. Current models for automated abbreviation disambiguation are restricted by the scarcity and imbalance of labeled training data, decreasing their generalizability to orthogonal sources. In this work we propose a novel data augmentation technique that utilizes information from related medical concepts, which improves our model’s ability to generalize. Furthermore, we show that incorporating global context information from the whole medical note (in addition to the traditional local context window) can significantly improve the model’s representation of abbreviations. We train our model on a public dataset (MIMIC III) and test its performance on datasets from different sources (CASI, i2b2). Together, these two techniques boost the accuracy of abbreviation disambiguation by almost 14% on the CASI dataset and 4% on i2b2.
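To make the two ideas in the abstract concrete, below is a minimal, runnable Python sketch (not the authors' implementation): an abbreviation is represented by mixing its local context window with the global context of the whole note, and each candidate expansion is represented by embedding related medical concept words, so a sense can be scored even when no labeled examples of the abbreviation itself exist. The toy random vectors, the mixing weight alpha, and the cosine scoring are all illustrative assumptions.

import numpy as np

DIM = 50
rng = np.random.default_rng(0)

# Toy vectors standing in for word embeddings trained on clinical notes.
VOCAB = ["the", "patient", "reports", "chest", "pain", "elevated", "troponin",
         "myocardial", "infarction", "mitral", "valve", "regurgitation", "admitted"]
VECS = {w: rng.standard_normal(DIM) for w in VOCAB}

def embed(tokens):
    """Average the vectors of known tokens (a crude stand-in for an encoder)."""
    vecs = [VECS[t] for t in tokens if t in VECS]
    return np.mean(vecs, axis=0) if vecs else np.zeros(DIM)

def represent(note_tokens, abbrev_index, window=5, alpha=0.7):
    """Mix the local context window around the abbreviation with the global
    context of the entire note; alpha is a hypothetical mixing weight."""
    lo = max(0, abbrev_index - window)
    local_vec = embed(note_tokens[lo:abbrev_index + window + 1])
    global_vec = embed(note_tokens)
    return alpha * local_vec + (1 - alpha) * global_vec

def disambiguate(rep, sense_embeddings):
    """Return the expansion whose embedding is most cosine-similar to rep."""
    def cos(a, b):
        return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9)
    return max(sense_embeddings, key=lambda s: cos(rep, sense_embeddings[s]))

# Each sense is embedded from related concept words rather than from labeled
# examples of the abbreviation itself -- the spirit of the augmentation idea.
senses = {
    "myocardial infarction": embed(["myocardial", "infarction", "troponin", "chest", "pain"]),
    "mitral insufficiency": embed(["mitral", "valve", "regurgitation"]),
}
note = "the patient reports chest pain and elevated troponin MI admitted".split()
print(disambiguate(represent(note, note.index("MI")), senses))

With the fixed seed, the concept words shared with the local window (chest, pain, troponin) pull the representation toward the cardiac sense; in the paper's setting the embeddings would come from a model trained on MIMIC III rather than random vectors.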

Cite this Paper


BibTeX
@InProceedings{pmlr-v116-skreta20a,
  title = {{Training without training data: Improving the generalizability of automated medical abbreviation disambiguation}},
  author = {Skreta, Marta and Arbabi, Aryan and Wang, Jixuan and Brudno, Michael},
  booktitle = {Proceedings of the Machine Learning for Health NeurIPS Workshop},
  pages = {233--245},
  year = {2020},
  editor = {Dalca, Adrian V. and McDermott, Matthew B.A. and Alsentzer, Emily and Finlayson, Samuel G. and Oberst, Michael and Falck, Fabian and Beaulieu-Jones, Brett},
  volume = {116},
  series = {Proceedings of Machine Learning Research},
  month = {13 Dec},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v116/skreta20a/skreta20a.pdf},
  url = {https://proceedings.mlr.press/v116/skreta20a.html},
  abstract = {Abbreviation disambiguation is important for automated clinical note processing due to the frequent use of abbreviations in clinical settings. Current models for automated abbreviation disambiguation are restricted by the scarcity and imbalance of labeled training data, decreasing their generalizability to orthogonal sources. In this work we propose a novel data augmentation technique that utilizes information from related medical concepts, which improves our model’s ability to generalize. Furthermore, we show that incorporating global context information from the whole medical note (in addition to the traditional local context window) can significantly improve the model’s representation of abbreviations. We train our model on a public dataset (MIMIC III) and test its performance on datasets from different sources (CASI, i2b2). Together, these two techniques boost the accuracy of abbreviation disambiguation by almost 14\% on the CASI dataset and 4\% on i2b2.}
}
Endnote
%0 Conference Paper
%T Training without training data: Improving the generalizability of automated medical abbreviation disambiguation
%A Marta Skreta
%A Aryan Arbabi
%A Jixuan Wang
%A Michael Brudno
%B Proceedings of the Machine Learning for Health NeurIPS Workshop
%C Proceedings of Machine Learning Research
%D 2020
%E Adrian V. Dalca
%E Matthew B.A. McDermott
%E Emily Alsentzer
%E Samuel G. Finlayson
%E Michael Oberst
%E Fabian Falck
%E Brett Beaulieu-Jones
%F pmlr-v116-skreta20a
%I PMLR
%P 233--245
%U https://proceedings.mlr.press/v116/skreta20a.html
%V 116
%X Abbreviation disambiguation is important for automated clinical note processing due to the frequent use of abbreviations in clinical settings. Current models for automated abbreviation disambiguation are restricted by the scarcity and imbalance of labeled training data, decreasing their generalizability to orthogonal sources. In this work we propose a novel data augmentation technique that utilizes information from related medical concepts, which improves our model’s ability to generalize. Furthermore, we show that incorporating global context information from the whole medical note (in addition to the traditional local context window) can significantly improve the model’s representation of abbreviations. We train our model on a public dataset (MIMIC III) and test its performance on datasets from different sources (CASI, i2b2). Together, these two techniques boost the accuracy of abbreviation disambiguation by almost 14% on the CASI dataset and 4% on i2b2.
APA
Skreta, M., Arbabi, A., Wang, J. & Brudno, M. (2020). Training without training data: Improving the generalizability of automated medical abbreviation disambiguation. Proceedings of the Machine Learning for Health NeurIPS Workshop, in Proceedings of Machine Learning Research 116:233-245. Available from https://proceedings.mlr.press/v116/skreta20a.html.