Sunflower: A New Approach to Expanding Coverage of African Languages in Large Language Models

Benjamin Akera, Evelyn Nafula Ouma, Gilbert Yiga, Patrick Walukagga, Phionah Natukunda, Trevor Saaka, Solomon Nsumba, Lilian Teddy Nabukeera, Joel Tibabwetiza Muhanguzi, Imran Sekalala, Nimpamya Janat Namara, Engineer Bainomugisha, Ernest Mwebaze, John Quinn
Proceedings of the AI for African Languages Conference 2025, PMLR 314:1-20, 2026.

Abstract

There are more than 2000 living languages in Africa, most of which have been bypassed by advances in language technology. Current leading large language models exhibit strong performance on a number of widely spoken languages such as Swahili or Yoruba, but prioritize support for languages with the largest speaker populations, resulting in uneven coverage. We argue that a regionally focused approach is more efficient and present a case study for Uganda, a country with high linguistic diversity. We describe the development of Sunflower 14B and 32B, two models based on Qwen 3 that achieve state-of-the-art comprehension across the majority of Ugandan languages. These open-source models can help reduce language barriers in a range of practical applications.

Cite this Paper


BibTeX
@InProceedings{pmlr-v314-akera26a,
  title     = {Sunflower: A New Approach to Expanding Coverage of African Languages in Large Language Models},
  author    = {Akera, Benjamin and Ouma, Evelyn Nafula and Yiga, Gilbert and Walukagga, Patrick and Natukunda, Phionah and Saaka, Trevor and Nsumba, Solomon and Nabukeera, Lilian Teddy and Muhanguzi, Joel Tibabwetiza and Sekalala, Imran and Namara, Nimpamya Janat and Bainomugisha, Engineer and Mwebaze, Ernest and Quinn, John},
  booktitle = {Proceedings of the AI for African Languages Conference 2025},
  pages     = {1--20},
  year      = {2026},
  editor    = {Bainomugisha, Engineer and Mwebaze, Ernest and Kimera, Richard and Nabende, Joyce Nakatumba and Katumba, Andrew and Quinn, John},
  volume    = {314},
  series    = {Proceedings of Machine Learning Research},
  month     = {10 Oct},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v314/main/assets/akera26a/akera26a.pdf},
  url       = {https://proceedings.mlr.press/v314/akera26a.html},
  abstract  = {There are more than 2000 living languages in Africa, most of which have been bypassed by advances in language technology. Current leading large language models exhibit strong performance on a number of widely spoken languages such as Swahili or Yoruba, but prioritize support for languages with the largest speaker populations, resulting in uneven coverage. We argue that a regionally focused approach is more efficient and present a case study for Uganda, a country with high linguistic diversity. We describe the development of Sunflower 14B and 32B, two models based on Qwen 3 that achieve state-of-the-art comprehension across the majority of Ugandan languages. These open-source models can help reduce language barriers in a range of practical applications.}
}
Endnote
%0 Conference Paper
%T Sunflower: A New Approach to Expanding Coverage of African Languages in Large Language Models
%A Benjamin Akera
%A Evelyn Nafula Ouma
%A Gilbert Yiga
%A Patrick Walukagga
%A Phionah Natukunda
%A Trevor Saaka
%A Solomon Nsumba
%A Lilian Teddy Nabukeera
%A Joel Tibabwetiza Muhanguzi
%A Imran Sekalala
%A Nimpamya Janat Namara
%A Engineer Bainomugisha
%A Ernest Mwebaze
%A John Quinn
%B Proceedings of the AI for African Languages Conference 2025
%C Proceedings of Machine Learning Research
%D 2026
%E Engineer Bainomugisha
%E Ernest Mwebaze
%E Richard Kimera
%E Joyce Nakatumba Nabende
%E Andrew Katumba
%E John Quinn
%F pmlr-v314-akera26a
%I PMLR
%P 1--20
%U https://proceedings.mlr.press/v314/akera26a.html
%V 314
%X There are more than 2000 living languages in Africa, most of which have been bypassed by advances in language technology. Current leading large language models exhibit strong performance on a number of widely spoken languages such as Swahili or Yoruba, but prioritize support for languages with the largest speaker populations, resulting in uneven coverage. We argue that a regionally focused approach is more efficient and present a case study for Uganda, a country with high linguistic diversity. We describe the development of Sunflower 14B and 32B, two models based on Qwen 3 that achieve state-of-the-art comprehension across the majority of Ugandan languages. These open-source models can help reduce language barriers in a range of practical applications.
APA
Akera, B., Ouma, E.N., Yiga, G., Walukagga, P., Natukunda, P., Saaka, T., Nsumba, S., Nabukeera, L.T., Muhanguzi, J.T., Sekalala, I., Namara, N.J., Bainomugisha, E., Mwebaze, E. & Quinn, J. (2026). Sunflower: A New Approach to Expanding Coverage of African Languages in Large Language Models. Proceedings of the AI for African Languages Conference 2025, in Proceedings of Machine Learning Research 314:1-20. Available from https://proceedings.mlr.press/v314/akera26a.html.