Promoting Uganda’s Major Local Languages: Introducing Luganda Text Generation Models and Diverse Accent-Aware TTS Models

Rashid Kisejjere, Reuben Magala, Abubakhari Sserwadda
Proceedings of the AI for African Languages Conference 2025, PMLR 314:27-32, 2026.

Abstract

Despite recent advances in large language models, most of Uganda’s over 40 indigenous languages remain underrepresented in natural language processing systems. This work introduces ugGPT, an open instruction-tuned model designed specifically for Luganda, spoken by over 20 million people. We also release a 200 million token monolingual Luganda corpus and a culturally contextualized instruction dataset with over 70,000 examples. In addition, we present accent-aware text-to-speech models for English, Luganda, Runyankole, Acholi, and Iteso, fine-tuned from the Orpheus 3B architecture. Experimental results show that ugGPT outperforms multilingual baselines and that the speech models generate intelligible and natural audio for low-resource languages.

Cite this Paper


BibTeX
@InProceedings{pmlr-v314-kisejjere26a,
  title = {Promoting Uganda’s Major Local Languages: Introducing Luganda Text Generation Models and Diverse Accent-Aware TTS Models},
  author = {Kisejjere, Rashid and Magala, Reuben and Sserwadda, Abubakhari},
  booktitle = {Proceedings of the AI for African Languages Conference 2025},
  pages = {27--32},
  year = {2026},
  editor = {Bainomugisha, Engineer and Mwebaze, Ernest and Kimera, Richard and Nabende, Joyce Nakatumba and Katumba, Andrew and Quinn, John},
  volume = {314},
  series = {Proceedings of Machine Learning Research},
  month = {10 Oct},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v314/main/assets/kisejjere26a/kisejjere26a.pdf},
  url = {https://proceedings.mlr.press/v314/kisejjere26a.html},
  abstract = {Despite recent advances in large language models, most of Uganda’s over 40 indigenous languages remain underrepresented in natural language processing systems. This work introduces ugGPT, an open instruction-tuned model designed specifically for Luganda, spoken by over 20 million people. We also release a 200 million token monolingual Luganda corpus and a culturally contextualized instruction dataset with over 70,000 examples. In addition, we present accent-aware text-to-speech models for English, Luganda, Runyankole, Acholi, and Iteso, fine-tuned from the Orpheus 3B architecture. Experimental results show that ugGPT outperforms multilingual baselines and that the speech models generate intelligible and natural audio for low-resource languages.}
}
Endnote
%0 Conference Paper
%T Promoting Uganda’s Major Local Languages: Introducing Luganda Text Generation Models and Diverse Accent-Aware TTS Models
%A Rashid Kisejjere
%A Reuben Magala
%A Abubakhari Sserwadda
%B Proceedings of the AI for African Languages Conference 2025
%C Proceedings of Machine Learning Research
%D 2026
%E Engineer Bainomugisha
%E Ernest Mwebaze
%E Richard Kimera
%E Joyce Nakatumba Nabende
%E Andrew Katumba
%E John Quinn
%F pmlr-v314-kisejjere26a
%I PMLR
%P 27--32
%U https://proceedings.mlr.press/v314/kisejjere26a.html
%V 314
%X Despite recent advances in large language models, most of Uganda’s over 40 indigenous languages remain underrepresented in natural language processing systems. This work introduces ugGPT, an open instruction-tuned model designed specifically for Luganda, spoken by over 20 million people. We also release a 200 million token monolingual Luganda corpus and a culturally contextualized instruction dataset with over 70,000 examples. In addition, we present accent-aware text-to-speech models for English, Luganda, Runyankole, Acholi, and Iteso, fine-tuned from the Orpheus 3B architecture. Experimental results show that ugGPT outperforms multilingual baselines and that the speech models generate intelligible and natural audio for low-resource languages.
APA
Kisejjere, R., Magala, R., & Sserwadda, A. (2026). Promoting Uganda’s Major Local Languages: Introducing Luganda Text Generation Models and Diverse Accent-Aware TTS Models. Proceedings of the AI for African Languages Conference 2025, in Proceedings of Machine Learning Research 314:27-32. Available from https://proceedings.mlr.press/v314/kisejjere26a.html.