SonoEdit: Null-Space Constrained Knowledge Editing for Pronunciation Correction in LLM-Based TTS

Ayush Pratap Singh, Harshit Singh, Nityanand Mathur, Akshat Mandloi, Sudarshan Kamath
Conference on Parsimony and Learning, PMLR 328:1090-1100, 2026.

Abstract

Neural text-to-speech systems systematically mispronounce low-resource proper nouns, particularly non-English names, brands, and geographic locations, due to their underrepresentation in predominantly English training corpora. Existing solutions require expensive multilingual data collection or manual phonetic annotation, limiting TTS deployment in diverse linguistic contexts. We introduce SonoEdit, a model editing technique that surgically corrects pronunciation errors in pre-trained TTS models without retraining. Correcting such errors traditionally requires costly supervised finetuning or manual phoneme injection. In this work, we present a parsimonious alternative using Null-Space Pronunciation Editing, a single-shot parameter update that modifies the pronunciation of specific words while provably preserving the rest of the model’s behavior. We first adapt Acoustic Causal Tracing to identify the specific Transformer layers governing text-to-pronunciation mapping. We then employ Null-Space Constrained Editing to compute a closed-form weight update that rectifies the target pronunciation while remaining mathematically orthogonal to the manifold of general speech, constructing a constrained update that drives the model’s acoustic output toward a desired pronunciation exemplar while ensuring zero first-order change on a preserved corpus.
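
The closed-form, null-space constrained update described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a single linear layer `W`, a matrix `K0` whose columns are input activations ("keys") collected from the preserved corpus, and one target key/value pair `k_star`, `v_star`. All of these names, and the function `null_space_edit`, are hypothetical. For a purely linear layer the outputs on the preserved keys are unchanged exactly. The abstract's "zero first-order change" refers to the full nonlinear model, which this toy example does not attempt to reproduce.

```python
import numpy as np

def null_space_edit(W, K0, k_star, v_star, tol=1e-10):
    """Rank-one update dW with dW @ K0 == 0 and (W + dW) @ k_star == v_star.

    Illustrative assumption: W is one linear map, K0 holds preserved keys as columns.
    """
    # Orthonormal basis for the span of the preserved keys.
    U, s, _ = np.linalg.svd(K0, full_matrices=False)
    rank = int(np.sum(s > tol * s.max()))
    U = U[:, :rank]
    # Projector onto the orthogonal complement of that span.
    P = np.eye(K0.shape[0]) - U @ U.T
    k_proj = P @ k_star
    denom = k_star @ k_proj  # equals ||P k_star||^2
    if denom < tol:
        raise ValueError("k_star lies in the span of the preserved keys")
    # Push the output for k_star to v_star using only the projected direction.
    residual = v_star - W @ k_star
    return np.outer(residual, k_proj) / denom

# Toy usage with random data (hypothetical shapes).
rng = np.random.default_rng(0)
W = rng.standard_normal((4, 8))
K0 = rng.standard_normal((8, 5))      # 5 preserved keys of dimension 8
k_star = rng.standard_normal(8)       # key for the word being corrected
v_star = rng.standard_normal(4)       # desired output for that key
dW = null_space_edit(W, K0, k_star, v_star)
```

Because the update is built from `P @ k_star`, it is orthogonal to every preserved key by construction, which is why `dW @ K0` vanishes while `(W + dW) @ k_star` reaches the target.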

Cite this Paper


BibTeX
@InProceedings{pmlr-v328-singh26a,
  title = {SonoEdit: Null-Space Constrained Knowledge Editing for Pronunciation Correction in LLM-Based TTS},
  author = {Singh, Ayush Pratap and Singh, Harshit and Mathur, Nityanand and Mandloi, Akshat and Kamath, Sudarshan},
  booktitle = {Conference on Parsimony and Learning},
  pages = {1090--1100},
  year = {2026},
  editor = {Burkholz, Rebekka and Liu, Shiwei and Ravishankar, Saiprasad and Redman, William and Huang, Wei and Su, Weijie and Zhu, Zhihui},
  volume = {328},
  series = {Proceedings of Machine Learning Research},
  month = {23--26 Mar},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v328/main/assets/singh26a/singh26a.pdf},
  url = {https://proceedings.mlr.press/v328/singh26a.html},
  abstract = {Neural text-to-speech systems systematically mispronounce low-resource proper nouns, particularly non-English names, brands, and geographic locations due to their underrepresentation in predominantly English training corpora. Existing solutions require expensive multilingual data collection or manual phonetic annotation, limiting TTS deployment in diverse linguistic contexts. We introduce SonoEdit, a model editing technique that surgically corrects pronunciation errors in pre-trained TTS models without retraining. Correcting such errors traditionally requires costly supervised finetuning or manual phoneme injection. In this work, we present a parsimonious alternative using Null-Space Pronunciation Editing, a single-shot parameter update that modifies the pronunciation of specific words while provably preserving the rest of the model’s behavior. We first adapt Acoustic Causal Tracing to identify the specific Transformer layers governing text-to-pronunciation mapping. We then employ Null-Space Constrained Editing to compute a closed-form weight update that rectifies the target pronunciation while remaining mathematically orthogonal to the manifold of general speech, constructing a constrained update that drives the model’s acoustic output toward a desired pronunciation exemplar while ensuring zero first-order change on a preserved corpus.}
}
Endnote
%0 Conference Paper
%T SonoEdit: Null-Space Constrained Knowledge Editing for Pronunciation Correction in LLM-Based TTS
%A Ayush Pratap Singh
%A Harshit Singh
%A Nityanand Mathur
%A Akshat Mandloi
%A Sudarshan Kamath
%B Conference on Parsimony and Learning
%C Proceedings of Machine Learning Research
%D 2026
%E Rebekka Burkholz
%E Shiwei Liu
%E Saiprasad Ravishankar
%E William Redman
%E Wei Huang
%E Weijie Su
%E Zhihui Zhu
%F pmlr-v328-singh26a
%I PMLR
%P 1090--1100
%U https://proceedings.mlr.press/v328/singh26a.html
%V 328
%X Neural text-to-speech systems systematically mispronounce low-resource proper nouns, particularly non-English names, brands, and geographic locations due to their underrepresentation in predominantly English training corpora. Existing solutions require expensive multilingual data collection or manual phonetic annotation, limiting TTS deployment in diverse linguistic contexts. We introduce SonoEdit, a model editing technique that surgically corrects pronunciation errors in pre-trained TTS models without retraining. Correcting such errors traditionally requires costly supervised finetuning or manual phoneme injection. In this work, we present a parsimonious alternative using Null-Space Pronunciation Editing, a single-shot parameter update that modifies the pronunciation of specific words while provably preserving the rest of the model’s behavior. We first adapt Acoustic Causal Tracing to identify the specific Transformer layers governing text-to-pronunciation mapping. We then employ Null-Space Constrained Editing to compute a closed-form weight update that rectifies the target pronunciation while remaining mathematically orthogonal to the manifold of general speech, constructing a constrained update that drives the model’s acoustic output toward a desired pronunciation exemplar while ensuring zero first-order change on a preserved corpus.
APA
Singh, A.P., Singh, H., Mathur, N., Mandloi, A. & Kamath, S. (2026). SonoEdit: Null-Space Constrained Knowledge Editing for Pronunciation Correction in LLM-Based TTS. Conference on Parsimony and Learning, in Proceedings of Machine Learning Research 328:1090-1100. Available from https://proceedings.mlr.press/v328/singh26a.html.
