Closing the Gap in Low-Resource ASR: Leveraging Multilingual Models for Code-Switched Yoruba-English Speech
DLI 2025 Research Track, PMLR 302:1-9, 2026.
Abstract
Recent advances in Automatic Speech Recognition (ASR) have transformed voice-based technologies, yet accurate recognition of multilingual and low-resource languages remains challenging. This research evaluates the performance of state-of-the-art multilingual ASR models (Whisper Large v3 and MMS-1B-All) on Yoruba-English code-switched (CS) speech. Despite notable progress in multilingual ASR, code-switching remains a complex challenge because of the phonetic, syntactic, and lexical shifts that occur within a single utterance. This study addresses a significant gap in the literature by evaluating both models on a 21-hour Yoruba-English dataset and fine-tuning them for domain-specific performance. Fine-tuning yielded substantial improvements in Word Error Rate (WER): MMS-1B-All achieved a 55.8% reduction and Whisper Large v3 a 50.1% reduction. Although MMS-1B-All slightly outperformed Whisper Large v3, both models demonstrated strong potential for Yoruba-English CS speech recognition. These findings establish the feasibility of fine-tuning multilingual ASR models for low-resource code-switched scenarios and suggest directions for future research, including dataset expansion, alternative fine-tuning strategies, and real-time performance evaluation.

Keywords: automatic speech recognition, code-switching, multilingual ASR, low-resource languages
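For readers unfamiliar with the metric, the WER figures above are computed from word-level edit distance, and the reported percentage reductions are relative changes between the pre- and post-fine-tuning error rates. A minimal sketch of both calculations is below; the function names and the example sentences are illustrative and are not taken from the paper or its dataset.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level Levenshtein edit distance
    (substitutions + insertions + deletions) divided by the
    number of words in the reference transcript."""
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming table for edit distance over word sequences.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / len(ref)


def relative_reduction(wer_before: float, wer_after: float) -> float:
    """Relative WER reduction, the form of improvement quoted in the
    abstract (e.g. a drop from 0.60 to 0.30 is a 50% reduction)."""
    return (wer_before - wer_after) / wer_before


# Illustrative usage (made-up transcripts, not from the study's data):
print(wer("mo fe lo si market today", "mo fe lo market today"))  # one deletion
print(relative_reduction(0.60, 0.30))
```

Published WER numbers also depend on text normalization (casing, punctuation, diacritics), which matters for tonal orthographies like Yoruba; the sketch above assumes transcripts are already normalized.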