How Much Speech Data is Necessary for ASR in African Languages? An Evaluation of Data Scaling in Kinyarwanda and Kikuyu

Benjamin Akera, Patrick Walukagga, Gilbert Yiga, John Quinn, Ernest Mwebaze
Proceedings of the AI for African Languages Conference 2025, PMLR 314:41-49, 2026.

Abstract

Automatic speech recognition for low-resource African languages is limited by scarce transcribed data. This paper evaluates data requirements using the Whisper model through systematic scaling experiments on Kinyarwanda from 1 to 1,400 hours and detailed error analysis on Kikuyu with 270 hours of data. Results show that usable ASR performance with word error rate below 13 percent is achievable with approximately 50 hours of data, with continued gains up to 200 hours. Error analysis reveals that transcription noise accounts for 38.6 percent of high-error cases, highlighting the importance of data quality alongside data volume.
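The abstract reports results as word error rate (WER). As an illustration, here is a minimal sketch of the conventional definition: word-level Levenshtein distance (substitutions + deletions + insertions) divided by the number of reference words. This is the textbook formula, not the authors' evaluation script. Text normalization (casing, punctuation, diacritics), which real Kinyarwanda or Kikuyu evaluations would need to specify, is deliberately omitted here as an assumption.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Sketch: WER = (S + D + I) / N via word-level edit distance.

    Assumes whitespace tokenization and no text normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the previous ref prefix and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution (or match when cost is 0)
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


# One deleted word out of six reference words gives WER = 1/6 (about 16.7%),
# which would sit above the 13 percent threshold mentioned in the abstract.
print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Because insertions are counted, WER can exceed 1.0 when the hypothesis is much longer than the reference.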

Cite this Paper


BibTeX
@InProceedings{pmlr-v314-akera26b,
  title = {How Much Speech Data is Necessary for ASR in African Languages? An Evaluation of Data Scaling in Kinyarwanda and Kikuyu},
  author = {Akera, Benjamin and Walukagga, Patrick and Yiga, Gilbert and Quinn, John and Mwebaze, Ernest},
  booktitle = {Proceedings of the AI for African Languages Conference 2025},
  pages = {41--49},
  year = {2026},
  editor = {Bainomugisha, Engineer and Mwebaze, Ernest and Kimera, Richard and Nabende, Joyce Nakatumba and Katumba, Andrew and Quinn, John},
  volume = {314},
  series = {Proceedings of Machine Learning Research},
  month = {10 Oct},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v314/main/assets/akera26b/akera26b.pdf},
  url = {https://proceedings.mlr.press/v314/akera26b.html},
  abstract = {Automatic speech recognition for low-resource African languages is limited by scarce transcribed data. This paper evaluates data requirements using the Whisper model through systematic scaling experiments on Kinyarwanda from 1 to 1,400 hours and detailed error analysis on Kikuyu with 270 hours of data. Results show that usable ASR performance with word error rate below 13 percent is achievable with approximately 50 hours of data, with continued gains up to 200 hours. Error analysis reveals that transcription noise accounts for 38.6 percent of high-error cases, highlighting the importance of data quality alongside data volume.}
}
EndNote
%0 Conference Paper
%T How Much Speech Data is Necessary for ASR in African Languages? An Evaluation of Data Scaling in Kinyarwanda and Kikuyu
%A Benjamin Akera
%A Patrick Walukagga
%A Gilbert Yiga
%A John Quinn
%A Ernest Mwebaze
%B Proceedings of the AI for African Languages Conference 2025
%C Proceedings of Machine Learning Research
%D 2026
%E Engineer Bainomugisha
%E Ernest Mwebaze
%E Richard Kimera
%E Joyce Nakatumba Nabende
%E Andrew Katumba
%E John Quinn
%F pmlr-v314-akera26b
%I PMLR
%P 41--49
%U https://proceedings.mlr.press/v314/akera26b.html
%V 314
%X Automatic speech recognition for low-resource African languages is limited by scarce transcribed data. This paper evaluates data requirements using the Whisper model through systematic scaling experiments on Kinyarwanda from 1 to 1,400 hours and detailed error analysis on Kikuyu with 270 hours of data. Results show that usable ASR performance with word error rate below 13 percent is achievable with approximately 50 hours of data, with continued gains up to 200 hours. Error analysis reveals that transcription noise accounts for 38.6 percent of high-error cases, highlighting the importance of data quality alongside data volume.
APA
Akera, B., Walukagga, P., Yiga, G., Quinn, J. & Mwebaze, E. (2026). How Much Speech Data is Necessary for ASR in African Languages? An Evaluation of Data Scaling in Kinyarwanda and Kikuyu. Proceedings of the AI for African Languages Conference 2025, in Proceedings of Machine Learning Research 314:41-49. Available from https://proceedings.mlr.press/v314/akera26b.html.
