A Knowledge Augmented Framework for Multimodal News Object-Entity Relation Extraction
Proceedings of 2025 2nd International Conference on Machine Learning and Intelligent Computing, PMLR 278:779-788, 2025.
Abstract
Multimodal relation extraction, an important research direction in information extraction, aims to identify entities and objects from both text and images and to establish cross-modal semantic associations between them. Current mainstream methods still struggle with complex multimodal data: semantic alignment confusion and redundant associations cause irrelevant entities and objects to be linked, severely degrading system performance. To address this issue, this paper proposes a multimodal relation extraction framework that integrates knowledge graphs. The approach uses a knowledge graph as external semantic support, filtering candidate entity-object pairs through structured semantic information, and then applies a multimodal alignment module to achieve precise semantic matching. Experimental results show that this method significantly outperforms existing methods on multiple benchmark datasets; in particular, on fine-grained relation recognition the F1 score increases by 4 percentage points, demonstrating the framework's ability to mitigate cross-modal noise interference.
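The two-stage idea in the abstract — first prune candidate entity-object pairs with structured knowledge, then align the survivors across modalities — can be sketched as follows. This is a minimal illustration, not the paper's actual model: the toy knowledge graph, the type names, and the use of cosine similarity as a stand-in for the multimodal alignment module are all assumptions for the sake of the example.

```python
# Hypothetical sketch of the framework's pipeline: (1) a knowledge-graph
# filter drops semantically incompatible entity-object pairs, (2) the
# remaining pairs are scored by an alignment step (stubbed here as
# cosine similarity between placeholder embeddings).
from math import sqrt

# Toy knowledge graph: (entity_type, object_type) pairs that can
# plausibly be related -- the "structured semantic support".
KG_COMPATIBLE = {
    ("person", "car"),      # e.g. a person drives a car
    ("person", "trophy"),   # e.g. a person holds a trophy
    ("team", "stadium"),    # e.g. a team plays at a stadium
}

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sqrt(sum(a * a for a in u))
    nv = sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def extract_pairs(entities, objects, threshold=0.5):
    """Return (entity, object, score) triples passing both the KG
    filter and the alignment threshold, highest score first.

    entities/objects: lists of dicts with 'name', 'type', 'emb' keys.
    """
    results = []
    for ent in entities:
        for obj in objects:
            # Stage 1: the KG filter removes incompatible pairs,
            # cutting the redundant associations the abstract mentions.
            if (ent["type"], obj["type"]) not in KG_COMPATIBLE:
                continue
            # Stage 2: multimodal alignment, stubbed as similarity
            # between (placeholder) text and image embeddings.
            score = cosine(ent["emb"], obj["emb"])
            if score >= threshold:
                results.append((ent["name"], obj["name"], score))
    return sorted(results, key=lambda t: -t[2])
```

Note how the stages complement each other: a visually similar but semantically incompatible pair (e.g. a person and a tree, absent from the toy KG) is rejected before alignment ever runs, which is precisely the cross-modal noise the framework aims to suppress.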