A Fully Neural Tunisian Arabic TTS System
DLI 2025 Research Track, PMLR 302:1-10, 2026.
Abstract
Text-To-Speech (TTS) is the artificial generation of spoken language from text, a technology increasingly vital for voice-based applications. Motivated by the rising demand for realistic computer-generated speech, especially for under-represented accents and dialects, this research aims to construct a high-quality Tunisian Arabic TTS system. Because advanced natural language processing technologies such as TTS remain underdeveloped in Tunisia, we first record a dedicated Tunisian Arabic speech dataset with a female speaker. We then present an end-to-end TTS system built upon a deep neural network architecture. A subjective evaluation using Mean Opinion Score (MOS) compares our approach with end-to-end generative and concatenative baselines. The results indicate that the proposed system outperforms both baselines in naturalness and intelligibility.
Keywords: Text-To-Speech, Dialects, Deep Neural Networks, Tacotron 2, WaveRNN, Griffin-Lim Algorithm.