Ẹhugbo Ka! Advancing Machine Translation for the Low-Resource Ẹhugbo Language through Parallel Corpus Development
DLI 2025 Research Track, PMLR 302:1-9, 2026.
Abstract
Despite advances in language technologies, low-resource African languages and their dialects remain consistently excluded. One such dialect is Ẹhugbo, a critically endangered variant of Igbo spoken by fewer than 150,000 people in Afikpo, Nigeria. This exclusion perpetuates social and linguistic inequities, leaving speakers of such dialects without access to digital tools that could preserve their language and culture. This paper presents Ẹhugbo Ka! ("Greetings Ẹhugbo!"), which addresses this gap. We built the only publicly available parallel corpus for Ẹhugbo, 1,021 Ẹhugbo-English sentences drawn from the New Testament of the Bible, and used it to evaluate and fine-tune two state-of-the-art models, M2M100 (facebook/m2m100_418M) and NLLB (facebook/nllb-200-distilled-600M). Initial results were stark: before fine-tuning, M2M100 achieved a BLEU score of 1.2188, while NLLB scored only 0.0262. After fine-tuning, M2M100 improved to 16.1719 and NLLB to 20.4016, demonstrating the potential of adapting LLMs for low-resource languages. Our findings reveal both promise and challenges. While fine-tuning significantly improves performance, the lack of diverse datasets limits translation quality and reinforces the need for inclusive data collection practices. This work highlights the importance of community-driven approaches, as linguistic preservation cannot be achieved without the active involvement of native speakers. This project not only advances the field of low-resource MT but also serves as a call to action for researchers and developers to prioritize linguistic diversity, ensuring that no language is left behind in the digital age.

Keywords: multilingual low resource, resources for less-resourced languages, minoritized languages, less resourced languages, endangered languages, indigenous languages, corpus creation, multilingual corpora, evaluation, datasets for low resource languages, Igbo, Igbo language.
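For readers unfamiliar with the metric reported above, the following is a minimal sketch of corpus-level BLEU on a 0-100 scale. It is an illustration under stated assumptions, not the evaluation code used in this work: the abstract does not say which BLEU implementation or tokenization was used, so this version assumes whitespace tokenization, a single reference per hypothesis, and no smoothing. The function name `corpus_bleu` and the English example sentences are hypothetical placeholders and are not taken from the Ẹhugbo corpus.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU in [0, 100].

    Assumptions (not from the paper): whitespace tokenization,
    one reference per hypothesis, no smoothing.
    """
    matches = [0] * max_n  # clipped n-gram matches per order
    totals = [0] * max_n   # hypothesis n-gram counts per order
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_counts, r_counts = ngrams(h, n), ngrams(r, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, r_counts[g]) for g, c in h_counts.items())
            totals[n - 1] += sum(h_counts.values())
    # Without smoothing, any order with zero matches gives BLEU = 0.
    if hyp_len == 0 or min(matches) == 0:
        return 0.0
    # Geometric mean of the modified n-gram precisions.
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty for hypotheses shorter than the references.
    brevity = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity * math.exp(log_precision)


# Placeholder sentences for illustration only.
reference = ["the cat sat on the mat"]
print(corpus_bleu(["the cat sat on the mat"], reference))  # identical output
print(corpus_bleu(["the cat sat on a mat"], reference))    # one word differs
```

Scores from a sketch like this are not directly comparable with the figures reported above, since BLEU depends on tokenization and smoothing choices; a standardized tool is preferable for reporting.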