Sunflower: A New Approach to Expanding Coverage of African Languages in Large Language Models
Proceedings of the AI for African Languages Conference 2025, PMLR 314:1-20, 2026.
Abstract
There are more than 2000 living languages in Africa, most of which have been bypassed by advances in language technology. Current leading large language models exhibit strong performance on a number of widely spoken languages such as Swahili or Yoruba, but prioritize support for languages with the largest speaker populations, resulting in uneven coverage. We argue that a regionally focused approach is more efficient and present a case study for Uganda, a country with high linguistic diversity. We describe the development of Sunflower 14B and 32B, two models based on Qwen 3 that achieve state-of-the-art comprehension across the majority of Ugandan languages. These open-source models can help reduce language barriers in a range of practical applications.