MLCL: A Framework for Reducing Language Imbalance in Sino-Tibetan Languages through Adapter Structures
Proceedings of the 16th Asian Conference on Machine Learning, PMLR 260:431-446, 2025.
Abstract
Multilingual pre-trained models have been widely applied to natural language processing (NLP) tasks, including text classification. However, because the amount of available resources varies across languages, these models exhibit uneven performance from language to language, a phenomenon known as language imbalance. Existing research on mitigating language imbalance relies primarily on text and image data and neglects the auditory dimension of language, leaving the problem only partially addressed because the rich linguistic information conveyed through speech goes unexploited. To address this gap, this paper introduces a novel framework, MultiLingual Contrastive Learning (MLCL), for reducing language imbalance. By incorporating concepts from comparative linguistics into neural networks, we exploit the phonetic similarities among languages within the Sino-Tibetan family to mitigate language imbalance in multilingual pre-trained models. To evaluate the method's effectiveness, we conduct experiments on two synthetic datasets derived from Flores200 and mms across various models. The experimental results show that our model surpasses all baseline models on language imbalance metrics.
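To make the contrastive idea concrete, the following is a minimal sketch of a symmetric InfoNCE-style loss over paired sentence (or utterance) embeddings from a high-resource and a low-resource language. The function name, the pairing scheme, and the use of PyTorch are assumptions for illustration only; the abstract does not specify the exact loss used by MLCL.

```python
# Illustrative sketch only: a symmetric contrastive loss that pulls together
# embeddings of parallel content across two languages. This is NOT the
# paper's exact MLCL objective, just a common formulation of the idea.
import torch
import torch.nn.functional as F


def multilingual_contrastive_loss(z_high: torch.Tensor,
                                  z_low: torch.Tensor,
                                  temperature: float = 0.07) -> torch.Tensor:
    """z_high, z_low: (batch, dim) embeddings of parallel sentences.

    Row i of z_high and row i of z_low are assumed to carry the same content
    (e.g. translations or phonetically related utterances); all other rows in
    the batch serve as negatives.
    """
    z_high = F.normalize(z_high, dim=-1)
    z_low = F.normalize(z_low, dim=-1)
    logits = z_high @ z_low.T / temperature          # (batch, batch) similarities
    targets = torch.arange(z_high.size(0), device=z_high.device)
    # Symmetric cross-entropy: matched pairs are attracted in both directions.
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.T, targets)) / 2


if __name__ == "__main__":
    # In practice the embeddings would come from a frozen multilingual encoder
    # with a small trainable adapter on top (as the paper's title suggests);
    # random tensors are used here only to show the call signature.
    batch, dim = 8, 256
    loss = multilingual_contrastive_loss(torch.randn(batch, dim),
                                         torch.randn(batch, dim))
    print(loss.item())
```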