How Much Speech Data is Necessary for ASR in African Languages? An Evaluation of Data Scaling in Kinyarwanda and Kikuyu
Proceedings of the AI for African Languages Conference 2025, PMLR 314:41-49, 2026.
Abstract
Automatic speech recognition (ASR) for low-resource African languages is limited by the scarcity of transcribed data. This paper evaluates data requirements for the Whisper model through systematic scaling experiments on Kinyarwanda, using from 1 to 1,400 hours of data, and through detailed error analysis on Kikuyu with 270 hours of data. Results show that usable ASR performance, with a word error rate (WER) below 13 percent, is achievable with approximately 50 hours of data, with continued gains up to 200 hours. Error analysis reveals that transcription noise accounts for 38.6 percent of high-error cases, highlighting the importance of data quality alongside data volume.
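For readers unfamiliar with the metric behind the 13 percent threshold, word error rate is conventionally the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and the system output, divided by the number of reference words. The following is a minimal sketch under stated assumptions: whitespace tokenization, no text normalization, and an illustrative function name `word_error_rate` that does not come from the paper. The scoring setup actually used in the paper may differ.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: words are separated by whitespace and compared exactly,
    with no casing or punctuation normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Placeholder English tokens, not data from the paper:
# one substitution and one deletion against four reference words.
example = word_error_rate("a b c d", "a x c")
```

Because insertions count as errors, this ratio can exceed 1.0 when the hypothesis is much longer than the reference. A transcript with errors in its reference text (the transcription noise the abstract mentions) raises the measured rate even when the recognizer output is correct.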