Chinese Named Entity Recognition Method Based on Lexicon and Convolution-Integrated Self-Attention

Xu Wenjie, Gao Maoting
Proceedings of 2024 International Conference on Machine Learning and Intelligent Computing, PMLR 245:147-154, 2024.

Abstract

Chinese Named Entity Recognition (NER) methods primarily fall into span-based approaches and sequence-to-sequence methods. However, the former focuses solely on the recognition of entity boundaries, while the latter is susceptible to exposure bias. To address these issues, a Chinese NER method based on lexicon enhancement and convolution-integrated self-attention is proposed. First, the text is encoded using the pre-trained model ERNIE 3.0 together with lexical representations. A Bi-LSTM then captures the contextual information of the sequence, yielding the final character representations. Next, a two-dimensional (2D) grid is constructed to model character pairs, and a feature integration layer combining self-attention and convolution refines and captures the interactions between characters. Finally, a joint predictor composed of a biaffine classifier and multilayer perceptrons predicts entity categories. Experimental results demonstrate that this method can effectively recognize both flat and nested named entities. Compared with the current best-performing baseline models, the proposed method improves F1 scores by 0.14% and 2.53% on the flat datasets Resume and Weibo, respectively, and by 0.52% on the nested dataset ACE2005.
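The biaffine component of the joint predictor scores every (head, tail) character pair over the 2D grid, which is what lets the model detect nested entities whose spans overlap. The paper does not publish code on this page, so the following is only a minimal numpy sketch of generic biaffine pair scoring; all function and parameter names (`biaffine_scores`, `U`, `W`, `b`) are illustrative, and the toy sizes do not reflect the paper's settings.

```python
import numpy as np

def biaffine_scores(heads, tails, U, W, b):
    """Score every (head, tail) character pair for each entity label.

    heads: (n, d) start-role character representations
    tails: (n, d) end-role character representations
    U:     (d, L, d) bilinear weight tensor, one slice per label
    W:     (2d, L) linear weight on the concatenated pair
    b:     (L,) per-label bias
    Returns an (n, n, L) grid of label scores.
    """
    n, d = heads.shape
    # Bilinear term: heads_i^T U_l tails_j for every label l
    bilinear = np.einsum("id,dle,je->ijl", heads, U, tails)
    # Linear term on the concatenated (head, tail) representation
    pairs = np.concatenate(
        [np.repeat(heads[:, None, :], n, axis=1),
         np.repeat(tails[None, :, :], n, axis=0)], axis=-1)  # (n, n, 2d)
    linear = pairs @ W  # (n, n, L)
    return bilinear + linear + b

# Toy usage: 5 characters, 8-dim representations, 3 entity labels.
rng = np.random.default_rng(0)
n, d, L = 5, 8, 3
scores = biaffine_scores(rng.normal(size=(n, d)), rng.normal(size=(n, d)),
                         rng.normal(size=(d, L, d)), rng.normal(size=(2 * d, L)),
                         rng.normal(size=(L,)))
print(scores.shape)  # (5, 5, 3): one label score vector per character pair
```

Because each cell of the grid is scored independently, a span nested inside another span simply occupies a different (head, tail) cell, so both can be predicted at once.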

Cite this Paper


BibTeX
@InProceedings{pmlr-v245-wenjie24a,
  title     = {Chinese Named Entity Recognition Method Based on Lexicon and Convolution-Integrated Self-Attention},
  author    = {Wenjie, Xu and Maoting, Gao},
  booktitle = {Proceedings of 2024 International Conference on Machine Learning and Intelligent Computing},
  pages     = {147--154},
  year      = {2024},
  editor    = {Nianyin, Zeng and Pachori, Ram Bilas},
  volume    = {245},
  series    = {Proceedings of Machine Learning Research},
  month     = {26--28 Apr},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v245/main/assets/wenjie24a/wenjie24a.pdf},
  url       = {https://proceedings.mlr.press/v245/wenjie24a.html},
  abstract  = {Chinese Named Entity Recognition methods primarily consist of span-based approaches and sequence-to-sequence methods. However, the former focuses solely on the recognition of entity boundaries, while the latter is susceptible to exposure bias. To address these issues, a Chinese NER method based on dictionary enhancement and self-attention fusion convolution is proposed. Initially, the text is encoded using the pre-trained model ERNIE 3.0 and lexical representations. Then, Bi-LSTM is utilized to further capture the contextual information of the sequence, resulting in the final character representations. Subsequently, a two-dimensional (2D) grid is constructed for modeling character pairs, and a feature integration layer is developed by merging self-attention mechanisms and convolution to refine and capture the interactions between characters. Finally, a joint predictor composed of a dual-affine classifier and multilayer perceptrons is used to predict entity categories. Experimental results demonstrate that this method can effectively recognize both flat and nested named entities. Compared to the current best-performing baseline models, the proposed method achieves an increase of 0.14% and 2.53% in F1 scores on the flat datasets Resume and Weibo, respectively, and an improvement of 0.52% in F1 score on the nested dataset ACE2005.}
}
Endnote
%0 Conference Paper
%T Chinese Named Entity Recognition Method Based on Lexicon and Convolution-Integrated Self-Attention
%A Xu Wenjie
%A Gao Maoting
%B Proceedings of 2024 International Conference on Machine Learning and Intelligent Computing
%C Proceedings of Machine Learning Research
%D 2024
%E Zeng Nianyin
%E Ram Bilas Pachori
%F pmlr-v245-wenjie24a
%I PMLR
%P 147--154
%U https://proceedings.mlr.press/v245/wenjie24a.html
%V 245
%X Chinese Named Entity Recognition methods primarily consist of span-based approaches and sequence-to-sequence methods. However, the former focuses solely on the recognition of entity boundaries, while the latter is susceptible to exposure bias. To address these issues, a Chinese NER method based on dictionary enhancement and self-attention fusion convolution is proposed. Initially, the text is encoded using the pre-trained model ERNIE 3.0 and lexical representations. Then, Bi-LSTM is utilized to further capture the contextual information of the sequence, resulting in the final character representations. Subsequently, a two-dimensional (2D) grid is constructed for modeling character pairs, and a feature integration layer is developed by merging self-attention mechanisms and convolution to refine and capture the interactions between characters. Finally, a joint predictor composed of a dual-affine classifier and multilayer perceptrons is used to predict entity categories. Experimental results demonstrate that this method can effectively recognize both flat and nested named entities. Compared to the current best-performing baseline models, the proposed method achieves an increase of 0.14% and 2.53% in F1 scores on the flat datasets Resume and Weibo, respectively, and an improvement of 0.52% in F1 score on the nested dataset ACE2005.
APA
Wenjie, X. & Maoting, G. (2024). Chinese Named Entity Recognition Method Based on Lexicon and Convolution-Integrated Self-Attention. Proceedings of 2024 International Conference on Machine Learning and Intelligent Computing, in Proceedings of Machine Learning Research 245:147-154. Available from https://proceedings.mlr.press/v245/wenjie24a.html.
