Closing the Gap in Low-Resource ASR: Leveraging Multilingual Models for Code-Switched Yoruba-English Speech
DLI 2025 Research Track, PMLR 302:1-9, 2026.
Abstract
Recent advances in Automatic Speech Recognition (ASR) have transformed voice-based technologies, yet accurate recognition of multilingual and low-resource languages remains challenging. This research evaluates the performance of state-of-the-art multilingual ASR models (Whisper Large v3 and MMS-1B-All) on Yoruba-English code-switched (CS) speech. Despite notable progress in multilingual ASR, code-switching remains a complex challenge because of the phonetic, syntactic, and lexical shifts that occur within a single utterance. This study addresses a significant gap in the literature by evaluating both models on a 21-hour Yoruba-English dataset and fine-tuning them for domain-specific performance. Fine-tuning yielded substantial improvements in Word Error Rate (WER): MMS-1B-All achieved a 55.8% reduction and Whisper Large v3 a 50.1% reduction. Although MMS-1B-All slightly outperformed Whisper Large v3, both models demonstrated strong potential for Yoruba-English CS speech recognition. These findings establish the feasibility of fine-tuning multilingual ASR models for low-resource code-switched scenarios and suggest directions for future research, including dataset expansion, alternative fine-tuning strategies, and real-time performance evaluation.

Keywords: automatic speech recognition, code-switching, multilingual ASR, low-resource languages
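For readers unfamiliar with the metric, the WER figures above are computed from word-level edit distance, and the reported percentage reductions are relative changes between the pre- and post-fine-tuning error rates. A minimal sketch of both calculations is below; the function names and the example sentences are illustrative and are not taken from the paper or its dataset.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level Levenshtein edit distance
    (substitutions + insertions + deletions) divided by the
    number of words in the reference transcript."""
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming table for edit distance over word sequences.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / len(ref)


def relative_reduction(wer_before: float, wer_after: float) -> float:
    """Relative WER reduction, the form of improvement quoted in the
    abstract (e.g. a drop from 0.60 to 0.30 is a 50% reduction)."""
    return (wer_before - wer_after) / wer_before


# Illustrative usage (made-up transcripts, not from the study's data):
print(wer("mo fe lo si market today", "mo fe lo market today"))  # one deletion
print(relative_reduction(0.60, 0.30))
```

Published WER numbers also depend on text normalization (casing, punctuation, diacritics), which matters for tonal orthographies like Yoruba; the sketch above assumes transcripts are already normalized.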