MLCL: A Framework for Reducing Language Imbalance in Sino-Tibetan Languages through Adapter Structures

JiaJun Fang, Wentao Huang, Aimin Yang, Dong Zhou, Nankai Lin
Proceedings of the 16th Asian Conference on Machine Learning, PMLR 260:431-446, 2025.

Abstract

Multilingual pre-trained models have been widely applied in natural language processing (NLP) tasks, including text classification. However, due to the varying amounts of language resources, these models exhibit performance imbalance across different languages, a phenomenon known as language imbalance. Existing research on mitigating language imbalance primarily harnesses text and image data, neglecting the auditory aspects of languages. This neglect results in an incomplete solution to language imbalance, as it fails to exploit the rich linguistic nuances conveyed through speech. To address these issues, this paper introduces a novel framework called MultiLingual Contrastive Learning (MLCL) to reduce language imbalance. By incorporating concepts from comparative linguistics into neural networks, we utilize the phonetic similarities among languages within the Sino-Tibetan family to tackle the problem of language imbalance in multilingual pre-trained models. To evaluate our method’s effectiveness, we conducted tests using two synthetic datasets derived from the Flores200 and mms datasets across various models. The experimental results show that, in terms of language imbalance metrics, our model surpasses all baseline models.
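To make the two ingredients named in the title and abstract concrete, the sketch below shows the general pattern of a bottleneck adapter placed on top of frozen multilingual encoder features, combined with an InfoNCE-style contrastive loss that pulls parallel sentences in a high-resource and a low-resource language toward shared representations. This is an illustrative sketch only, not the paper's implementation: the module names, dimensions, temperature, and the choice of InfoNCE are assumptions, and the phonetic-similarity component of MLCL is not modeled here.

# Minimal illustrative sketch (assumptions noted above); not the authors' code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class BottleneckAdapter(nn.Module):
    """Down-project, apply a nonlinearity, up-project, add a residual connection."""

    def __init__(self, hidden_dim: int = 768, bottleneck_dim: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_dim, bottleneck_dim)
        self.up = nn.Linear(bottleneck_dim, hidden_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.up(F.relu(self.down(x)))


def cross_lingual_info_nce(z_high: torch.Tensor,
                           z_low: torch.Tensor,
                           temperature: float = 0.07) -> torch.Tensor:
    """InfoNCE over a batch of parallel sentence embeddings.

    z_high[i] and z_low[i] embed the same sentence in a high-resource and a
    low-resource language; the other pairs in the batch serve as negatives.
    """
    z_high = F.normalize(z_high, dim=-1)
    z_low = F.normalize(z_low, dim=-1)
    logits = z_high @ z_low.t() / temperature      # (batch, batch) similarity matrix
    targets = torch.arange(z_high.size(0))         # positives lie on the diagonal
    # Symmetrize so both languages are pulled toward each other.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))


if __name__ == "__main__":
    adapter = BottleneckAdapter()
    # Stand-ins for sentence embeddings of 8 parallel sentences from a frozen encoder.
    h_high, h_low = torch.randn(8, 768), torch.randn(8, 768)
    loss = cross_lingual_info_nce(adapter(h_high), adapter(h_low))
    print(loss.item())

In such a setup, only the adapter parameters would be trained, which is what makes adapter structures attractive for low-resource languages: the frozen multilingual encoder is left untouched while a small number of parameters absorbs the cross-lingual alignment signal.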

Cite this Paper
BibTeX
@InProceedings{pmlr-v260-fang25a,
  title     = {{MLCL}: {A} Framework for Reducing Language Imbalance in Sino-Tibetan Languages through Adapter Structures},
  author    = {Fang, JiaJun and Huang, Wentao and Yang, Aimin and Zhou, Dong and Lin, Nankai},
  booktitle = {Proceedings of the 16th Asian Conference on Machine Learning},
  pages     = {431--446},
  year      = {2025},
  editor    = {Nguyen, Vu and Lin, Hsuan-Tien},
  volume    = {260},
  series    = {Proceedings of Machine Learning Research},
  month     = {05--08 Dec},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v260/main/assets/fang25a/fang25a.pdf},
  url       = {https://proceedings.mlr.press/v260/fang25a.html},
  abstract  = {Multilingual pre-trained models have been widely applied in natural language processing (NLP) tasks, including text classification. However, due to the varying amounts of language resources, these models exhibit performance imbalance across different languages, a phenomenon known as language imbalance. Existing research on mitigating language imbalance primarily harnesses text and image data, neglecting the auditory aspects of languages. This neglect results in an incomplete solution to language imbalance, as it fails to exploit the rich linguistic nuances conveyed through speech. To address these issues, this paper introduces a novel framework called MultiLingual Contrastive Learning (MLCL) to reduce language imbalance. By incorporating concepts from comparative linguistics into neural networks, we utilize the phonetic similarities among languages within the Sino-Tibetan family to tackle the problem of language imbalance in multilingual pre-trained models. To evaluate our method’s effectiveness, we conducted tests using two synthetic datasets derived from the Flores200 and mms datasets across various models. The experimental results show that, in terms of language imbalance metrics, our model surpasses all baseline models.}
}
APA
Fang, J., Huang, W., Yang, A., Zhou, D. & Lin, N. (2025). MLCL: A Framework for Reducing Language Imbalance in Sino-Tibetan Languages through Adapter Structures. Proceedings of the 16th Asian Conference on Machine Learning, in Proceedings of Machine Learning Research 260:431-446. Available from https://proceedings.mlr.press/v260/fang25a.html.