Medical Concept Normalization by Encoding Target Knowledge

Nikhil Pattisapu; Sangameshwar Patil; Girish Palshikar; Vasudeva Varma

Proceedings of the Machine Learning for Health NeurIPS Workshop, PMLR 116:246-259, 2020.

Abstract

Medical concept normalization aims to map a variable length message such as ‘unable to sleep’ to an entry in a target medical lexicon, such as ‘Insomnia’. Current approaches formulate medical concept normalization as a supervised text classification problem. This formulation has several drawbacks. First, creating training data requires manually mapping medical concept mentions to their corresponding entries in a target lexicon. Second, these models fail to map a mention to the target concepts which were not encountered during the training phase. Lastly, these models have to be retrained from scratch whenever new concepts are added to the target lexicon. In this work we propose a method which overcomes these limitations. We first use various text and graph embedding methods to encode medical concepts into an embedding space. We then train a model which transforms concept mentions into vectors in this target embedding space. Finally, we use cosine similarity to find the nearest medical concept to a given input medical concept mention. Our model scales to millions of target concepts and trivially accommodates growing target lexicon size without incurring significant computational cost. Experimental results show that our model outperforms the previous state-of-the-art by 4.2% and 6.3% classification accuracy across two benchmark datasets. We also present a variety of studies to evaluate the robustness of our model under different training conditions.
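
The final lookup step described in the abstract (cosine similarity between a mention vector and the concept vectors) can be illustrated with a minimal sketch. This is not the authors' implementation: the three-dimensional vectors, the concept names other than 'Insomnia', and the function names `cosine_similarity` and `nearest_concept` are assumptions made for illustration, and the trained model that would turn a mention such as 'unable to sleep' into a vector is not shown.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_u * norm_v)


def nearest_concept(mention_vec, concept_vecs):
    """Return the concept name whose embedding is most similar to the mention."""
    return max(
        concept_vecs,
        key=lambda name: cosine_similarity(mention_vec, concept_vecs[name]),
    )


# Hypothetical toy embeddings; real ones would come from text/graph encoders.
concept_vecs = {
    "Insomnia": [0.9, 0.1, 0.0],
    "Headache": [0.1, 0.9, 0.1],
    "Nausea": [0.0, 0.2, 0.9],
}

# Hypothetical output of a mention encoder for 'unable to sleep'.
mention_vec = [0.8, 0.2, 0.1]
print(nearest_concept(mention_vec, concept_vecs))

# Adding a concept only requires adding its vector; nothing is retrained here.
concept_vecs["Fever"] = [0.5, 0.5, 0.7]
print(nearest_concept([0.5, 0.4, 0.7], concept_vecs))
```

Because the lookup only compares vectors, a new lexicon entry becomes a candidate as soon as its embedding is added, which is the property the abstract contrasts with retraining a classifier. A linear scan like this one is only practical for small lexicons; scaling to millions of concepts would typically rely on a vectorized or approximate nearest-neighbour search.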

Cite this Paper

BibTeX


@InProceedings{pmlr-v116-pattisapu20a,
  title = 	 {{Medical Concept Normalization by Encoding Target Knowledge}},
  author =       {Pattisapu, Nikhil and Patil, Sangameshwar and Palshikar, Girish and Varma, Vasudeva},
  booktitle = 	 {Proceedings of the Machine Learning for Health NeurIPS Workshop},
  pages = 	 {246--259},
  year = 	 {2020},
  editor = 	 {Dalca, Adrian V. and McDermott, Matthew B.A. and Alsentzer, Emily and Finlayson, Samuel G. and Oberst, Michael and Falck, Fabian and Beaulieu-Jones, Brett},
  volume = 	 {116},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {13 Dec},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v116/pattisapu20a/pattisapu20a.pdf},
  url = 	 {https://proceedings.mlr.press/v116/pattisapu20a.html},
  abstract = 	 {Medical concept normalization aims to map a variable length message such as, ’unable to sleep’ to an entry in a target medical lexicon, such as ’Insomnia’. Current approaches formulate medical concept normalization as a supervised text classification problem. This formulation has several drawbacks. First, creating training data requires manually mapping medical concept mentions to their corresponding entries in a target lexicon. Second, these models fail to map a mention to the target concepts which were not encountered during the training phase. Lastly, these models have to be retrained from scratch whenever new concepts are added to the target lexicon. In this work we propose a method which overcomes these limitations. We first use various text and graph embedding methods to encode medical concepts into an embedding space. We then train a model which transforms concept mentions into vectors in this target embedding space. Finally, we use cosine similarity to find the nearest medical concept to a given input medical concept mention. Our model scales to millions of target concepts and trivially accommodates growing target lexicon size without incurring significant computational cost. Experimental results show that our model outperforms the previous state-of-the-art by 4.2\% and 6.3\% classification accuracy across two benchmark datasets. We also present a variety of studies to evaluate the robustness of our model under different training conditions.}
}

Endnote

%0 Conference Paper
%T Medical Concept Normalization by Encoding Target Knowledge
%A Nikhil Pattisapu
%A Sangameshwar Patil
%A Girish Palshikar
%A Vasudeva Varma
%B Proceedings of the Machine Learning for Health NeurIPS Workshop
%C Proceedings of Machine Learning Research
%D 2020
%E Adrian V. Dalca
%E Matthew B.A. McDermott
%E Emily Alsentzer
%E Samuel G. Finlayson
%E Michael Oberst
%E Fabian Falck
%E Brett Beaulieu-Jones
%F pmlr-v116-pattisapu20a
%I PMLR
%P 246--259
%U https://proceedings.mlr.press/v116/pattisapu20a.html
%V 116
%X Medical concept normalization aims to map a variable length message such as ‘unable to sleep’ to an entry in a target medical lexicon, such as ‘Insomnia’. Current approaches formulate medical concept normalization as a supervised text classification problem. This formulation has several drawbacks. First, creating training data requires manually mapping medical concept mentions to their corresponding entries in a target lexicon. Second, these models fail to map a mention to the target concepts which were not encountered during the training phase. Lastly, these models have to be retrained from scratch whenever new concepts are added to the target lexicon. In this work we propose a method which overcomes these limitations. We first use various text and graph embedding methods to encode medical concepts into an embedding space. We then train a model which transforms concept mentions into vectors in this target embedding space. Finally, we use cosine similarity to find the nearest medical concept to a given input medical concept mention. Our model scales to millions of target concepts and trivially accommodates growing target lexicon size without incurring significant computational cost. Experimental results show that our model outperforms the previous state-of-the-art by 4.2% and 6.3% classification accuracy across two benchmark datasets. We also present a variety of studies to evaluate the robustness of our model under different training conditions.

APA


Pattisapu, N., Patil, S., Palshikar, G., & Varma, V. (2020). Medical Concept Normalization by Encoding Target Knowledge. Proceedings of the Machine Learning for Health NeurIPS Workshop, in Proceedings of Machine Learning Research 116:246-259. Available from https://proceedings.mlr.press/v116/pattisapu20a.html.
