A Fully Neural Tunisian Arabic TTS System

Moez Ben HajHmida, Hatem Haddad, Aymen Ben El Haj Mabrouk
DLI 2025 Research Track, PMLR 302:1-10, 2026.

Abstract

Text-To-Speech (TTS) is the artificial generation of spoken language from text, a technology that is increasingly vital for voice-based applications. Demand for realistic computer-generated speech is rising, especially for under-represented accents and dialects, and this research aims to build a high-quality Tunisian Arabic TTS system. Given the underdeveloped state of advanced natural language processing technologies such as TTS in Tunisia, this paper presents our work on recording a dedicated Tunisian Arabic speech dataset with a female voice. We also present an end-to-end deep learning TTS system built on a deep neural network architecture. We conducted a subjective evaluation using the Mean Opinion Score (MOS), comparing our approach with end-to-end generative and concatenative models. The results indicate that the proposed system outperforms both baselines in naturalness and intelligibility.

Keywords: Text-To-Speech, Dialects, Deep Neural Networks, Tacotron 2, WaveRNN, Griffin-Lim Algorithm.
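The abstract reports a MOS-based subjective evaluation but does not describe the scoring protocol. As a hedged illustration only, the sketch below shows how a MOS and an approximate 95% confidence interval are commonly computed from listener ratings on a 1 to 5 absolute category rating scale. The function name `mos_with_ci`, the normal approximation (z = 1.96), and the ratings are hypothetical assumptions, not the paper's method or data.

```python
import math
from statistics import mean, stdev


def mos_with_ci(ratings, z=1.96):
    """Return (MOS, half-width of an approximate confidence interval).

    Hypothetical helper: assumes ratings on a 1-5 scale and uses a normal
    approximation (z = 1.96 for ~95%); small listener panels would
    usually call for a Student t quantile instead.
    """
    if len(ratings) < 2:
        raise ValueError("need at least two ratings")
    if any(not 1 <= r <= 5 for r in ratings):
        raise ValueError("ratings must lie in [1, 5]")
    m = mean(ratings)
    # Standard error of the mean, scaled by the z quantile.
    half_width = z * stdev(ratings) / math.sqrt(len(ratings))
    return m, half_width


# Hypothetical ratings for two systems (illustrative only, not results from the paper).
system_a = [4, 5, 4, 4, 3, 5, 4, 4]
system_b = [3, 3, 4, 2, 3, 3, 4, 3]

for name, scores in [("system_a", system_a), ("system_b", system_b)]:
    m, h = mos_with_ci(scores)
    print(f"{name}: MOS = {m:.2f} +/- {h:.2f}")
```

Reporting the interval alongside the mean makes it easier to judge whether a difference between two systems is larger than the rating noise.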

Cite this Paper


BibTeX
@InProceedings{pmlr-v302-ben-hajhmida26a,
  title = {A Fully Neural Tunisian Arabic TTS System},
  author = {Ben HajHmida, Moez and Haddad, Hatem and Ben El Haj Mabrouk, Aymen},
  booktitle = {DLI 2025 Research Track},
  pages = {1--10},
  year = {2026},
  editor = {Haddad, Hatem and Kahira, Albert Njoroge and Bourhim, Sofia and Olatunji, Iyiola Emmanuel and Makhafola, Lesego and Mwase, Christine},
  volume = {302},
  series = {Proceedings of Machine Learning Research},
  month = {17--22 Aug},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v302/main/assets/ben-hajhmida26a/ben-hajhmida26a.pdf},
  url = {https://proceedings.mlr.press/v302/ben-hajhmida26a.html},
  abstract = {The discipline of Text-To-Speech (TTS) focuses on the artificial generation of spoken language from text, a technology increasingly vital for voice-based applications. Recognizing the rising demand for realistic computer-generated speech, especially for under-represented accents and dialects, this research is driven by the goal of constructing a high-quality Tunisian Arabic TTS system. Highlighting the underdeveloped state of advanced natural language processing technologies like TTS in Tunisia, this paper introduces our work on recording a dedicated Tunisian female Arabic speech dataset. Furthermore, we present an end-to-end deep learning TTS system built upon a deep neural network architecture. A subjective evaluation using Mean Opinion Score (MOS) was conducted, comparing our approach to end-to-end generative and concatenative models. The results of this evaluation indicate that our proposed system outperforms both baselines in terms of both naturalness and intelligibility. Keywords: Text-To-Speech, Dialects, Deep Neural Networks, Tacotron 2, WaveRNN, Griffin-Lim Algorithm.}
}
Endnote
%0 Conference Paper
%T A Fully Neural Tunisian Arabic TTS System
%A Moez Ben HajHmida
%A Hatem Haddad
%A Aymen Ben El Haj Mabrouk
%B DLI 2025 Research Track
%C Proceedings of Machine Learning Research
%D 2026
%E Hatem Haddad
%E Albert Njoroge Kahira
%E Sofia Bourhim
%E Iyiola Emmanuel Olatunji
%E Lesego Makhafola
%E Christine Mwase
%F pmlr-v302-ben-hajhmida26a
%I PMLR
%P 1--10
%U https://proceedings.mlr.press/v302/ben-hajhmida26a.html
%V 302
%X The discipline of Text-To-Speech (TTS) focuses on the artificial generation of spoken language from text, a technology increasingly vital for voice-based applications. Recognizing the rising demand for realistic computer-generated speech, especially for under-represented accents and dialects, this research is driven by the goal of constructing a high-quality Tunisian Arabic TTS system. Highlighting the underdeveloped state of advanced natural language processing technologies like TTS in Tunisia, this paper introduces our work on recording a dedicated Tunisian female Arabic speech dataset. Furthermore, we present an end-to-end deep learning TTS system built upon a deep neural network architecture. A subjective evaluation using Mean Opinion Score (MOS) was conducted, comparing our approach to end-to-end generative and concatenative models. The results of this evaluation indicate that our proposed system outperforms both baselines in terms of both naturalness and intelligibility. Keywords: Text-To-Speech, Dialects, Deep Neural Networks, Tacotron 2, WaveRNN, Griffin-Lim Algorithm.
APA
Ben HajHmida, M., Haddad, H. & Ben El Haj Mabrouk, A. (2026). A Fully Neural Tunisian Arabic TTS System. DLI 2025 Research Track, in Proceedings of Machine Learning Research 302:1-10. Available from https://proceedings.mlr.press/v302/ben-hajhmida26a.html.
