Detecting Biomedical Named Entities in COVID-19 Texts
Proceedings of the 1st Workshop on Healthcare AI and COVID-19, ICML 2022, PMLR 184:117-126, 2022.
Abstract
Applying state-of-the-art biomedical named entity recognition methods faces several challenges: first, these methods are trained on a small number of clinical entity types (e.g., diseases, symptoms, proteins, genes); second, they require a large amount of data for pre-training and prediction, making them difficult to deploy in real-time scenarios; third, they do not consider non-clinical entities such as social determinants of health (age, gender, employment, race), which are also related to patients' health. We propose a Machine Learning (ML) pipeline that improves on previous efforts in three ways: first, it recognizes many clinical entity types (diseases, symptoms, drugs, diagnoses, etc.); second, it is easily configurable and reusable, and can scale up for training and inference; third, it considers non-clinical factors related to patients' health. At a high level, the pipeline consists of the following stages: pre-processing, tokenization, mapping embedding lookup, and named entity recognition. We also present a new dataset that we prepared by curating COVID-19 case reports. The proposed approach outperforms baseline methods on four benchmark datasets, with macro- and micro-average F1 scores of around 90, and on our dataset, with macro- and micro-average F1 scores of 95.25 and 93.18, respectively.
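The abstract reports both macro- and micro-average F1, which can differ noticeably when entity types are imbalanced. The following is a minimal sketch of how the two averages are commonly computed from per-type counts; it is not the authors' evaluation code. The entity type names and the counts in `example_counts` are hypothetical values chosen for illustration and do not come from the paper.

```python
from typing import Dict, Tuple

# (true positives, false positives, false negatives) for one entity type
Counts = Tuple[int, int, int]


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 from raw counts; defined as 0.0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(per_type: Dict[str, Counts]) -> float:
    """Average the per-type F1 scores, so every entity type weighs the same."""
    return sum(f1(*counts) for counts in per_type.values()) / len(per_type)


def micro_f1(per_type: Dict[str, Counts]) -> float:
    """Pool the counts over all types first, so frequent types dominate."""
    tp = sum(counts[0] for counts in per_type.values())
    fp = sum(counts[1] for counts in per_type.values())
    fn = sum(counts[2] for counts in per_type.values())
    return f1(tp, fp, fn)


# Hypothetical counts (assumption, not data from the paper): one frequent
# clinical type, one mid-frequency type, and one rare non-clinical type.
example_counts: Dict[str, Counts] = {
    "DISEASE": (90, 10, 10),
    "DRUG": (40, 10, 10),
    "EMPLOYMENT": (5, 5, 5),
}

if __name__ == "__main__":
    print(f"macro-average F1: {macro_f1(example_counts):.4f}")
    print(f"micro-average F1: {micro_f1(example_counts):.4f}")
```

In this illustration the rare type pulls the macro average below the micro average, which is one reason to report both when a dataset mixes frequent clinical entities with rarer non-clinical ones.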