Promoting Uganda’s Major Local Languages: Introducing Luganda Text Generation Models and Diverse Accent-Aware TTS Models
Proceedings of the AI for African Languages Conference 2025, PMLR 314:27-32, 2026.
Abstract
Despite recent advances in large language models, most of Uganda’s over 40 indigenous languages remain underrepresented in natural language processing systems. This work introduces ugGPT, an open instruction-tuned model designed specifically for Luganda, spoken by over 20 million people. We also release a 200 million token monolingual Luganda corpus and a culturally contextualized instruction dataset with over 70,000 examples. In addition, we present accent-aware text-to-speech models for English, Luganda, Runyankole, Acholi, and Iteso, fine-tuned from the Orpheus 3B architecture. Experimental results show that ugGPT outperforms multilingual baselines and that the speech models generate intelligible and natural audio for low-resource languages.