Can You Hear Naples? Building and Benchmarking a Neapolitan Speech Corpus
Proceedings of the AAAI 2026 Workshop on Audio-Centric AI: Towards Real-World Multimodal Reasoning and Application Use Cases (Audio-AAAI), PMLR 312:94-112, 2026.
Abstract
This paper presents the creation and analysis of the first spoken corpus for Neapolitan, a historically rich but under-resourced Romance variety of Southern Italy. Despite its cultural importance, Neapolitan has been largely omitted from computational resources, limiting both dialectological research and the development of equitable speech technologies. We address this gap with a structured spoken resource that enables systematic evaluation of dialectal ASR performance. Each clip was manually transcribed in orthographic Neapolitan, and automatic transcriptions were produced with OpenAI’s Whisper API configured for standard Italian. We evaluated Whisper’s outputs against the human reference transcriptions using several complementary metrics: BLEU for word-level n-gram overlap, normalized Levenshtein distance for overall transcription divergence, and Jaccard similarity for agreement between word sets. We also report Word Error Rate (WER), converted to a similarity score (1 − WER) for easier interpretation, with higher values indicating more accurate transcription. This similarity averaged only 0.1306 ($\sigma$ = 0.1654), meaning roughly 87 percent of words were transcribed incorrectly. The other metrics tell the same story: normalized Levenshtein similarity averaged 0.6360, and Jaccard similarity only 0.1078. This paper makes three contributions: (1) a reproducible pipeline that others can follow to build comparable datasets for further dialects, (2) the first openly accessible Neapolitan speech corpus, and (3) evidence of how critical dialect-specific ASR training is, supporting both computational linguistic research and efforts to preserve these languages.
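The similarity metrics described above can be sketched with standard-library Python alone. This is an illustrative reimplementation, not the paper's released evaluation code; the function names, the toy reference/hypothesis pair, and the choice to normalize Levenshtein by the longer string are assumptions for the example.

```python
# Illustrative sketch of the abstract's similarity metrics (not the paper's code).

def levenshtein(a, b):
    """Edit distance between two sequences (characters or word tokens)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (x != y)))  # substitution
        prev = curr
    return prev[-1]

def levenshtein_similarity(ref, hyp):
    """1 - character-level edit distance, normalized by the longer string."""
    if not ref and not hyp:
        return 1.0
    return 1.0 - levenshtein(ref, hyp) / max(len(ref), len(hyp))

def jaccard_similarity(ref, hyp):
    """Overlap between the word sets of reference and hypothesis."""
    r, h = set(ref.split()), set(hyp.split())
    return len(r & h) / len(r | h) if r | h else 1.0

def wer_similarity(ref, hyp):
    """1 - WER, where WER is word-level edit distance over reference length."""
    r, h = ref.split(), hyp.split()
    return 1.0 - levenshtein(r, h) / len(r)

# Hypothetical example: a Neapolitan reference vs. an Italian-biased ASR output.
ref = "aggio visto o mare"
hyp = "ho visto il mare"
print(round(wer_similarity(ref, hyp), 2))       # 2 of 4 words wrong -> 0.5
print(round(jaccard_similarity(ref, hyp), 2))   # {visto, mare} shared -> 0.33
```

Higher values mean closer agreement for all three scores, matching the 1 − WER convention used in the evaluation.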