Improved Clinical Abbreviation Expansion via Non-Sense-Based Approaches

Juyong Kim, Linyuan Gong, Justin Khim, Jeremy C. Weiss, Pradeep Ravikumar
Proceedings of the Machine Learning for Health NeurIPS Workshop, PMLR 136:161-178, 2020.

Abstract

Abbreviation expansion is an important problem in clinical natural language processing because abbreviations often occur in text notes in medical records, and expansions of these abbreviations are critical for downstream applications such as assistive diagnosis and insurance code review. Previous studies have treated abbreviation expansion as a special case of word sense disambiguation; however, abbreviation expansion is easier because we only need the character level expansion and not necessarily the full sense of the abbreviation. In particular, such character level expansions may naturally occur elsewhere in medical contexts. Accordingly, we consider two categories of methods for abbreviation expansion: (a) non-sense-based methods that use information solely at lexical levels using state-of-the-art language models, and (b) sense-based methods that also incorporate sense information, such as glosses, from knowledge bases, to simultaneously perform the two tasks of expansion and disambiguation of the abbreviation. We propose two language model based approaches, including a novel length-agnostic permutation language model, find non-sense methods to be more effective than sense-based methods, and achieve the state-of-the-art on three clinical datasets.

Cite this Paper


BibTeX
@InProceedings{pmlr-v136-kim20a,
  title = {Improved Clinical Abbreviation Expansion via Non-Sense-Based Approaches},
  author = {Kim, Juyong and Gong, Linyuan and Khim, Justin and Weiss, Jeremy C. and Ravikumar, Pradeep},
  booktitle = {Proceedings of the Machine Learning for Health NeurIPS Workshop},
  pages = {161--178},
  year = {2020},
  editor = {Alsentzer, Emily and McDermott, Matthew B. A. and Falck, Fabian and Sarkar, Suproteem K. and Roy, Subhrajit and Hyland, Stephanie L.},
  volume = {136},
  series = {Proceedings of Machine Learning Research},
  month = {11 Dec},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v136/kim20a/kim20a.pdf},
  url = {https://proceedings.mlr.press/v136/kim20a.html},
  abstract = {Abbreviation expansion is an important problem in clinical natural language processing because abbreviations often occur in text notes in medical records, and expansions of these abbreviations are critical for downstream applications such as assistive diagnosis and insurance code review. Previous studies have treated abbreviation expansion as a special case of word sense disambiguation; however, abbreviation expansion is easier because we only need the character level expansion and not necessarily the full sense of the abbreviation. In particular, such character level expansions may naturally occur elsewhere in medical contexts. Accordingly, we consider two categories of methods for abbreviation expansion: (a) non-sense-based methods that use information solely at lexical levels using state-of-the-art language models, and (b) sense-based methods that also incorporate sense information, such as glosses, from knowledge bases, to simultaneously perform the two tasks of expansion and disambiguation of the abbreviation. We propose two language model based approaches, including a novel length-agnostic permutation language model, find non-sense methods to be more effective than sense-based methods, and achieve the state-of-the-art on three clinical datasets.}
}
Endnote
%0 Conference Paper
%T Improved Clinical Abbreviation Expansion via Non-Sense-Based Approaches
%A Juyong Kim
%A Linyuan Gong
%A Justin Khim
%A Jeremy C. Weiss
%A Pradeep Ravikumar
%B Proceedings of the Machine Learning for Health NeurIPS Workshop
%C Proceedings of Machine Learning Research
%D 2020
%E Emily Alsentzer
%E Matthew B. A. McDermott
%E Fabian Falck
%E Suproteem K. Sarkar
%E Subhrajit Roy
%E Stephanie L. Hyland
%F pmlr-v136-kim20a
%I PMLR
%P 161--178
%U https://proceedings.mlr.press/v136/kim20a.html
%V 136
%X Abbreviation expansion is an important problem in clinical natural language processing because abbreviations often occur in text notes in medical records, and expansions of these abbreviations are critical for downstream applications such as assistive diagnosis and insurance code review. Previous studies have treated abbreviation expansion as a special case of word sense disambiguation; however, abbreviation expansion is easier because we only need the character level expansion and not necessarily the full sense of the abbreviation. In particular, such character level expansions may naturally occur elsewhere in medical contexts. Accordingly, we consider two categories of methods for abbreviation expansion: (a) non-sense-based methods that use information solely at lexical levels using state-of-the-art language models, and (b) sense-based methods that also incorporate sense information, such as glosses, from knowledge bases, to simultaneously perform the two tasks of expansion and disambiguation of the abbreviation. We propose two language model based approaches, including a novel length-agnostic permutation language model, find non-sense methods to be more effective than sense-based methods, and achieve the state-of-the-art on three clinical datasets.
APA
Kim, J., Gong, L., Khim, J., Weiss, J. C., & Ravikumar, P. (2020). Improved Clinical Abbreviation Expansion via Non-Sense-Based Approaches. Proceedings of the Machine Learning for Health NeurIPS Workshop, in Proceedings of Machine Learning Research 136:161-178. Available from https://proceedings.mlr.press/v136/kim20a.html.
