GIIM: A Graph Information Integration Method for Chinese-Kazakh CLIR

Ping Hu, He Yang, Changle Yin, Tao Wang, Yuchao Chen
Proceedings of the 17th Asian Conference on Machine Learning, PMLR 304:383-398, 2025.

Abstract

Chinese-Kazakh cross-lingual information retrieval (CLIR) aims to retrieve relevant content from a collection of Kazakh documents using Chinese queries. The intrinsic differences in grammar, vocabulary, and semantic expression between the two languages pose significant challenges for semantic alignment in CLIR. Existing CLIR methods that incorporate a multilingual knowledge graph (MLKG) typically integrate entity information through simple vector stacking, failing to leverage deeper entity relationships and semantic connections. To address these challenges, we propose GIIM, a graph information integration method for Chinese-Kazakh CLIR that uses the rich multilingual entity information embedded in the MLKG as a semantic bridge to narrow the linguistic gap during query-document matching. Unlike previous methods, GIIM unifies query-document pairs and entity information into a single graph structure and employs a Graph Convolutional Network to aggregate both direct and multi-hop relations among entities, modeling complex semantic paths and hierarchical knowledge propagation. To evaluate GIIM, we construct CKIRD, a Chinese-Kazakh information retrieval dataset containing approximately 11,820 annotated query-paragraph pairs, and conduct experiments on both CKIRD and the public CLIRMatrix datasets. Experimental results show that GIIM outperforms existing baseline models across multiple ranking metrics, demonstrating its effectiveness on the Chinese-Kazakh CLIR task.
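
The multi-hop aggregation mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the common symmetric-normalized GCN layer, ReLU(D^{-1/2}(A + I)D^{-1/2} H W), uses identity weights in place of learned ones, and runs on a hypothetical four-node chain (Chinese query, Chinese entity, aligned Kazakh entity, Kazakh document). All node names and helper functions are invented for illustration.

```python
import math


def normalize_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2} for a dense adjacency matrix."""
    n = len(adj)
    # Add self-loops so each node keeps part of its own representation.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def matmul(a, b):
    """Plain dense matrix product for lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def gcn_layer(adj_norm, features, weight):
    """One GCN layer: ReLU(adj_norm @ features @ weight)."""
    out = matmul(matmul(adj_norm, features), weight)
    return [[max(0.0, x) for x in row] for row in out]


# Hypothetical toy graph (an assumption, not taken from the paper):
# query_zh - entity_zh - entity_kk - doc_kk, connected as a chain.
NODES = ["query_zh", "entity_zh", "entity_kk", "doc_kk"]
ADJ = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]

identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
adj_norm = normalize_adjacency(ADJ)

# One-hot input features: column j of the output tracks how much of
# node j's signal has reached each row's node after k layers.
h1 = gcn_layer(adj_norm, identity, identity)
h2 = gcn_layer(adj_norm, h1, identity)
h3 = gcn_layer(adj_norm, h2, identity)

for name, row in zip(NODES, h3):
    print(name, [round(x, 3) for x in row])
```

In this toy setting, one layer lets the query node see only its direct neighbor, two layers reach the aligned Kazakh entity, and three layers reach the document node, which is the sense in which stacked graph convolutions cover multi-hop relations.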

Cite this Paper


BibTeX
@InProceedings{pmlr-v304-hu25b,
  title = {GIIM: A Graph Information Integration Method for Chinese-Kazakh CLIR},
  author = {Hu, Ping and Yang, He and Yin, Changle and Wang, Tao and Chen, Yuchao},
  booktitle = {Proceedings of the 17th Asian Conference on Machine Learning},
  pages = {383--398},
  year = {2025},
  editor = {Lee, Hung-yi and Liu, Tongliang},
  volume = {304},
  series = {Proceedings of Machine Learning Research},
  month = {09--12 Dec},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v304/main/assets/hu25b/hu25b.pdf},
  url = {https://proceedings.mlr.press/v304/hu25b.html},
  abstract = {Chinese-Kazakh cross-lingual information retrieval (CLIR) aims to search relevant content from a collection of Kazakh documents using Chinese query statements. The intrinsic differences in grammar, vocabulary, and semantic expression between languages pose significant challenges for semantic alignment in CLIR. Existing CLIR methods that incorporate multilingual knowledge graph (MLKG) typically use simple vector stacking approaches to integrate entity information, failing to leverage deeper entity relationships and semantic connections. To address these challenges, we propose GIIM, a graph information integration method for Chinese-Kazakh CLIR that leverages the rich multilingual entity information embedded in MLKG as semantic bridges to narrow the linguistic gap during query-document matching process. Unlike previous methods, GIIM unifies query-document pairs and entity information into a graph structure and employs Graph Convolutional Network to aggregate both direct and multi-hop relations among entities, effectively modeling complex semantic paths and hierarchical knowledge propagation. To comprehensively evaluate GIIM, we construct CKIRD, a Chinese-Kazakh information retrieval dataset containing approximately 11,820 annotated query-paragraph pairs, and conduct experiments on both CKIRD and the public CLIRMatrix datasets. Experimental results show that GIIM outperforms existing baseline models across multiple ranking metrics, demonstrating its effectiveness on the Chinese-Kazakh CLIR task.}
}
Endnote
%0 Conference Paper
%T GIIM: A Graph Information Integration Method for Chinese-Kazakh CLIR
%A Ping Hu
%A He Yang
%A Changle Yin
%A Tao Wang
%A Yuchao Chen
%B Proceedings of the 17th Asian Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Hung-yi Lee
%E Tongliang Liu
%F pmlr-v304-hu25b
%I PMLR
%P 383--398
%U https://proceedings.mlr.press/v304/hu25b.html
%V 304
%X Chinese-Kazakh cross-lingual information retrieval (CLIR) aims to search relevant content from a collection of Kazakh documents using Chinese query statements. The intrinsic differences in grammar, vocabulary, and semantic expression between languages pose significant challenges for semantic alignment in CLIR. Existing CLIR methods that incorporate multilingual knowledge graph (MLKG) typically use simple vector stacking approaches to integrate entity information, failing to leverage deeper entity relationships and semantic connections. To address these challenges, we propose GIIM, a graph information integration method for Chinese-Kazakh CLIR that leverages the rich multilingual entity information embedded in MLKG as semantic bridges to narrow the linguistic gap during query-document matching process. Unlike previous methods, GIIM unifies query-document pairs and entity information into a graph structure and employs Graph Convolutional Network to aggregate both direct and multi-hop relations among entities, effectively modeling complex semantic paths and hierarchical knowledge propagation. To comprehensively evaluate GIIM, we construct CKIRD, a Chinese-Kazakh information retrieval dataset containing approximately 11,820 annotated query-paragraph pairs, and conduct experiments on both CKIRD and the public CLIRMatrix datasets. Experimental results show that GIIM outperforms existing baseline models across multiple ranking metrics, demonstrating its effectiveness on the Chinese-Kazakh CLIR task.
APA
Hu, P., Yang, H., Yin, C., Wang, T., & Chen, Y. (2025). GIIM: A Graph Information Integration Method for Chinese-Kazakh CLIR. Proceedings of the 17th Asian Conference on Machine Learning, in Proceedings of Machine Learning Research 304:383-398. Available from https://proceedings.mlr.press/v304/hu25b.html.
