Semi-Open Attribute Extraction from Chinese Functional Description Text

Li Zhang, Yanzeng Li, Rouyu Zhang, Wenjie Li
Proceedings of The 13th Asian Conference on Machine Learning, PMLR 157:1505-1520, 2021.

Abstract

Attribute extraction is the task of identifying attributes and their corresponding attribute values in unstructured text, and it is important for a wide range of applications such as web information retrieval and recommender systems. Traditional relation extraction-based methods and joint extraction-based systems typically classify attributes based on subject and attribute-value pairs and extract attribute triples within the scope of the ontology schema categories. This relies on the closed-world assumption and cannot accommodate the diversity of attributes. In this work, we propose a semi-open information extraction system for attribute extraction in a multi-component framework. With the proposed semi-open attribute extraction system (SOAE), more attribute-value pairs can be discovered by extracting literal triples without the limitation of a pre-defined ontology. An additional co-trained ontology-based attribute extraction model is appended as a component following the partial-closed world assumption (PCWA). It mitigates the performance degradation of SOAE caused by literal predicates missing from the raw text, and it helps to extract richer attribute triples and construct a denser knowledge graph. To evaluate the performance of the attribute extraction system, we construct CNShipNet, a dataset of Chinese functional description text, and conduct experiments on it. The experimental results demonstrate that our proposed approach outperforms several state-of-the-art baselines by a large margin.

Cite this Paper


BibTeX
@InProceedings{pmlr-v157-zhang21f,
  title = {Semi-Open Attribute Extraction from Chinese Functional Description Text},
  author = {Zhang, Li and Li, Yanzeng and Zhang, Rouyu and Li, Wenjie},
  booktitle = {Proceedings of The 13th Asian Conference on Machine Learning},
  pages = {1505--1520},
  year = {2021},
  editor = {Balasubramanian, Vineeth N. and Tsang, Ivor},
  volume = {157},
  series = {Proceedings of Machine Learning Research},
  month = {17--19 Nov},
  publisher = {PMLR},
  pdf = {https://proceedings.mlr.press/v157/zhang21f/zhang21f.pdf},
  url = {https://proceedings.mlr.press/v157/zhang21f.html}
}
Endnote
%0 Conference Paper
%T Semi-Open Attribute Extraction from Chinese Functional Description Text
%A Li Zhang
%A Yanzeng Li
%A Rouyu Zhang
%A Wenjie Li
%B Proceedings of The 13th Asian Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2021
%E Vineeth N. Balasubramanian
%E Ivor Tsang
%F pmlr-v157-zhang21f
%I PMLR
%P 1505--1520
%U https://proceedings.mlr.press/v157/zhang21f.html
%V 157
APA
Zhang, L., Li, Y., Zhang, R. & Li, W. (2021). Semi-Open Attribute Extraction from Chinese Functional Description Text. Proceedings of The 13th Asian Conference on Machine Learning, in Proceedings of Machine Learning Research 157:1505-1520. Available from https://proceedings.mlr.press/v157/zhang21f.html.
