A Knowledge Augmented Framework for Multimodal News Object-Entity Relation Extraction

PeiLing Li, Lin Li
Proceedings of 2025 2nd International Conference on Machine Learning and Intelligent Computing, PMLR 278:779-788, 2025.

Abstract

Multimodal relation extraction, as an important research direction in the field of information extraction, aims to identify entities and objects from both text and images and establish cross-modal semantic associations. Current mainstream methods still face challenges in handling complex multimodal data, such as semantic alignment confusion and redundant associations, which lead to erroneous associations between irrelevant entities and objects, severely affecting system performance. To address this issue, this paper proposes a multimodal relation extraction framework that integrates knowledge graphs. This approach uses the knowledge graph as external semantic support to filter candidate entity-object pairs through structured semantic information, and leverages a multimodal alignment module to achieve precise semantic matching. Experimental results show that this method significantly outperforms existing methods on multiple benchmark datasets, especially in fine-grained relation recognition, where the F1 score increases by 4 percentage points, effectively demonstrating the framework’s ability to mitigate cross-modal noise interference.

Cite this Paper


BibTeX
@InProceedings{pmlr-v278-li25m,
  title = {A Knowledge Augmented Framework for Multimodal News {Object}-Entity Relation Extraction},
  author = {Li, PeiLing and Li, Lin},
  booktitle = {Proceedings of 2025 2nd International Conference on Machine Learning and Intelligent Computing},
  pages = {779--788},
  year = {2025},
  editor = {Zeng, Nianyin and Pachori, Ram Bilas and Wang, Dongshu},
  volume = {278},
  series = {Proceedings of Machine Learning Research},
  month = {25--27 Apr},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v278/main/assets/li25m/li25m.pdf},
  url = {https://proceedings.mlr.press/v278/li25m.html},
  abstract = {Multimodal relation extraction, as an important research direction in the field of information extraction, aims to identify entities and objects from both text and images and establish cross-modal semantic associations. Current mainstream methods still face challenges in handling complex multimodal data, such as semantic alignment confusion and redundant associations, which lead to erroneous associations between irrelevant entities and objects, severely affecting system performance. To address this issue, this paper proposes a multimodal relation extraction framework that integrates knowledge graphs. This approach uses the knowledge graph as external semantic support to filter candidate entity-object pairs through structured semantic information, and leverages a multimodal alignment module to achieve precise semantic matching. Experimental results show that this method significantly outperforms existing methods on multiple benchmark datasets, especially in fine-grained relation recognition, where the F1 score increases by 4 percentage points, effectively demonstrating the framework's ability to mitigate cross-modal noise interference.}
}
Endnote
%0 Conference Paper
%T A Knowledge Augmented Framework for Multimodal News Object-Entity Relation Extraction
%A PeiLing Li
%A Lin Li
%B Proceedings of 2025 2nd International Conference on Machine Learning and Intelligent Computing
%C Proceedings of Machine Learning Research
%D 2025
%E Nianyin Zeng
%E Ram Bilas Pachori
%E Dongshu Wang
%F pmlr-v278-li25m
%I PMLR
%P 779--788
%U https://proceedings.mlr.press/v278/li25m.html
%V 278
%X Multimodal relation extraction, as an important research direction in the field of information extraction, aims to identify entities and objects from both text and images and establish cross-modal semantic associations. Current mainstream methods still face challenges in handling complex multimodal data, such as semantic alignment confusion and redundant associations, which lead to erroneous associations between irrelevant entities and objects, severely affecting system performance. To address this issue, this paper proposes a multimodal relation extraction framework that integrates knowledge graphs. This approach uses the knowledge graph as external semantic support to filter candidate entity-object pairs through structured semantic information, and leverages a multimodal alignment module to achieve precise semantic matching. Experimental results show that this method significantly outperforms existing methods on multiple benchmark datasets, especially in fine-grained relation recognition, where the F1 score increases by 4 percentage points, effectively demonstrating the framework's ability to mitigate cross-modal noise interference.
APA
Li, P. & Li, L. (2025). A Knowledge Augmented Framework for Multimodal News Object-Entity Relation Extraction. Proceedings of 2025 2nd International Conference on Machine Learning and Intelligent Computing, in Proceedings of Machine Learning Research 278:779-788. Available from https://proceedings.mlr.press/v278/li25m.html.