A Fully Neural Tunisian Arabic TTS System

Moez Ben HajHmida; Hatem Haddad; Aymen Ben El Haj Mabrouk

DLI 2025 Research Track, PMLR 302:1-10, 2026.

Abstract

The discipline of Text-To-Speech (TTS) focuses on the artificial generation of spoken language from text, a technology increasingly vital for voice-based applications. Recognizing the rising demand for realistic computer-generated speech, especially for under-represented accents and dialects, this research is driven by the goal of constructing a high-quality Tunisian Arabic TTS system. Highlighting the underdeveloped state of advanced natural language processing technologies like TTS in Tunisia, this paper introduces our work on recording a dedicated Tunisian female Arabic speech dataset. Furthermore, we present an end-to-end deep learning TTS system built upon a deep neural network architecture. A subjective evaluation using Mean Opinion Score (MOS) was conducted, comparing our approach to end-to-end generative and concatenative models. The results of this evaluation indicate that our proposed system outperforms both baselines in terms of both naturalness and intelligibility.

Keywords: Text-To-Speech, Dialects, Deep Neural Networks, Tacotron 2, WaveRNN, Griffin-Lim Algorithm.

Cite this Paper

BibTeX

@InProceedings{pmlr-v302-ben-hajhmida26a,
  title = {A Fully Neural Tunisian Arabic TTS System},
  author = {Ben HajHmida, Moez and Haddad, Hatem and Ben El Haj Mabrouk, Aymen},
  booktitle = {DLI 2025 Research Track},
  pages = {1--10},
  year = {2026},
  editor = {Haddad, Hatem and Kahira, Albert Njoroge and Bourhim, Sofia and Olatunji, Iyiola Emmanuel and Makhafola, Lesego and Mwase, Christine},
  volume = {302},
  series = {Proceedings of Machine Learning Research},
  month = {17--22 Aug},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v302/main/assets/ben-hajhmida26a/ben-hajhmida26a.pdf},
  url = {https://proceedings.mlr.press/v302/ben-hajhmida26a.html},
  abstract = 	 {The discipline of Text-To-Speech (TTS) focuses on the artificial generation of spoken language from text, a technology increasingly vital for voice-based applications. Recognizing the rising demand for realistic computer-generated speech, especially for under-represented accents and dialects, this research is driven by the goal of constructing a high-quality Tunisian Arabic TTS system. Highlighting the underdeveloped state of advanced natural language processing technologies like TTS in Tunisia, this paper introduces our work on recording a dedicated Tunisian female Arabic speech dataset. Furthermore, we present an end-to-end deep learning TTS system built upon a deep neural network architecture. A subjective evaluation using Mean Opinion Score (MOS) was conducted, comparing our approach to end-to-end generative and concatenative models. The results of this evaluation indicate that our proposed system outperforms both baselines in terms of both naturalness and intelligibility. Keywords: Text-To-Speech, Dialects, Deep Neural Networks, Tacotron 2, WaveRNN, Griffin-Lim Algorithm.}
}

Endnote

%0 Conference Paper
%T A Fully Neural Tunisian Arabic TTS System
%A Moez Ben HajHmida
%A Hatem Haddad
%A Aymen Ben El Haj Mabrouk
%B DLI 2025 Research Track
%C Proceedings of Machine Learning Research
%D 2026
%E Hatem Haddad
%E Albert Njoroge Kahira
%E Sofia Bourhim
%E Iyiola Emmanuel Olatunji
%E Lesego Makhafola
%E Christine Mwase
%F pmlr-v302-ben-hajhmida26a
%I PMLR
%P 1--10
%U https://proceedings.mlr.press/v302/ben-hajhmida26a.html
%V 302
%X The discipline of Text-To-Speech (TTS) focuses on the artificial generation of spoken language from text, a technology increasingly vital for voice-based applications. Recognizing the rising demand for realistic computer-generated speech, especially for under-represented accents and dialects, this research is driven by the goal of constructing a high-quality Tunisian Arabic TTS system. Highlighting the underdeveloped state of advanced natural language processing technologies like TTS in Tunisia, this paper introduces our work on recording a dedicated Tunisian female Arabic speech dataset. Furthermore, we present an end-to-end deep learning TTS system built upon a deep neural network architecture. A subjective evaluation using Mean Opinion Score (MOS) was conducted, comparing our approach to end-to-end generative and concatenative models. The results of this evaluation indicate that our proposed system outperforms both baselines in terms of both naturalness and intelligibility. Keywords: Text-To-Speech, Dialects, Deep Neural Networks, Tacotron 2, WaveRNN, Griffin-Lim Algorithm.

APA

Ben HajHmida, M., Haddad, H. & Ben El Haj Mabrouk, A. (2026). A Fully Neural Tunisian Arabic TTS System. DLI 2025 Research Track, in Proceedings of Machine Learning Research 302:1-10. Available from https://proceedings.mlr.press/v302/ben-hajhmida26a.html.
