Promoting Uganda’s Major Local Languages: Introducing Luganda Text Generation Models and Diverse Accent-Aware TTS Models

Rashid Kisejjere; Reuben Magala; Abubakhari Sserwadda

Proceedings of the AI for African Languages Conference 2025, PMLR 314:27-32, 2026.

Abstract

Despite recent advances in large language models, most of Uganda’s over 40 indigenous languages remain underrepresented in natural language processing systems. This work introduces ugGPT, an open instruction-tuned model designed specifically for Luganda, spoken by over 20 million people. We also release a 200 million token monolingual Luganda corpus and a culturally contextualized instruction dataset with over 70,000 examples. In addition, we present accent-aware text-to-speech models for English, Luganda, Runyankole, Acholi, and Iteso, fine-tuned from the Orpheus 3B architecture. Experimental results show that ugGPT outperforms multilingual baselines and that the speech models generate intelligible and natural audio for low-resource languages.

Cite this Paper

BibTeX

@InProceedings{pmlr-v314-kisejjere26a,
  title = 	 {Promoting Uganda’s Major Local Languages: Introducing Luganda Text Generation Models and Diverse Accent-Aware TTS Models},
  author =       {Kisejjere, Rashid and Magala, Reuben and Sserwadda, Abubakhari},
  booktitle = 	 {Proceedings of the AI for African Languages Conference 2025},
  pages = 	 {27--32},
  year = 	 {2026},
  editor = 	 {Bainomugisha, Engineer and Mwebaze, Ernest and Kimera, Richard and Nabende, Joyce Nakatumba and Katumba, Andrew and Quinn, John},
  volume = 	 {314},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {10 Oct},
  publisher =    {PMLR},
  pdf = 	 {https://raw.githubusercontent.com/mlresearch/v314/main/assets/kisejjere26a/kisejjere26a.pdf},
  url = 	 {https://proceedings.mlr.press/v314/kisejjere26a.html},
  abstract = 	 {Despite recent advances in large language models, most of Uganda’s over 40 indigenous languages remain underrepresented in natural language processing systems. This work introduces ugGPT, an open instruction-tuned model designed specifically for Luganda, spoken by over 20 million people. We also release a 200 million token monolingual Luganda corpus and a culturally contextualized instruction dataset with over 70,000 examples. In addition, we present accent-aware text-to-speech models for English, Luganda, Runyankole, Acholi, and Iteso, fine-tuned from the Orpheus 3B architecture. Experimental results show that ugGPT outperforms multilingual baselines and that the speech models generate intelligible and natural audio for low-resource languages.}
}

Endnote

%0 Conference Paper
%T Promoting Uganda’s Major Local Languages: Introducing Luganda Text Generation Models and Diverse Accent-Aware TTS Models
%A Rashid Kisejjere
%A Reuben Magala
%A Abubakhari Sserwadda
%B Proceedings of the AI for African Languages Conference 2025
%C Proceedings of Machine Learning Research
%D 2026
%E Engineer Bainomugisha
%E Ernest Mwebaze
%E Richard Kimera
%E Joyce Nakatumba Nabende
%E Andrew Katumba
%E John Quinn
%F pmlr-v314-kisejjere26a
%I PMLR
%P 27--32
%U https://proceedings.mlr.press/v314/kisejjere26a.html
%V 314
%X Despite recent advances in large language models, most of Uganda’s over 40 indigenous languages remain underrepresented in natural language processing systems. This work introduces ugGPT, an open instruction-tuned model designed specifically for Luganda, spoken by over 20 million people. We also release a 200 million token monolingual Luganda corpus and a culturally contextualized instruction dataset with over 70,000 examples. In addition, we present accent-aware text-to-speech models for English, Luganda, Runyankole, Acholi, and Iteso, fine-tuned from the Orpheus 3B architecture. Experimental results show that ugGPT outperforms multilingual baselines and that the speech models generate intelligible and natural audio for low-resource languages.

APA

Kisejjere, R., Magala, R., & Sserwadda, A. (2026). Promoting Uganda’s Major Local Languages: Introducing Luganda Text Generation Models and Diverse Accent-Aware TTS Models. Proceedings of the AI for African Languages Conference 2025, in Proceedings of Machine Learning Research 314:27-32. Available from https://proceedings.mlr.press/v314/kisejjere26a.html.
