SingingSDS: A Singing-Capable Spoken Dialogue System for Conversational Roleplay Applications

Jionghao Han, Jiatong Shi, Masao Someki, Yuxun Tang, Lan Liu, Yiwen Zhao, Wenhao Feng, Shinji Watanabe
Proceedings of Machine Learning Research, PMLR 303:1-21, 2026.

Abstract

With recent advances in automatic speech recognition (ASR), large language models (LLMs), and text-to-speech (TTS) technologies, spoken dialogue systems (SDS) have become widely accessible. However, most existing SDS are limited to conventional spoken responses. We present SingingSDS, a cascaded SDS that responds through singing rather than speaking, fostering more affective, memorable, and pleasurable interactions in character-based roleplay and interactive entertainment scenarios. SingingSDS employs a modular ASR-LLM-SVS pipeline and supports a wide range of configurations across character personas, ASR and LLM backends, SVS models, melody sources, and voice profiles, tailored to different needs in terms of latency, quality, and musical style. SingingSDS is available as a plug-and-play web demo, featuring modular, open-source code that supports customization and extension.
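The abstract describes a cascaded ASR-LLM-SVS pipeline whose stages, melody source, and persona can be swapped independently. The Python sketch below illustrates that modular structure under stated assumptions. Every class, method, and function name here (`ASR.transcribe`, `SingingPipeline`, `c_major_scale`, and so on) is hypothetical and is not taken from the SingingSDS code base. The stub components (a fixed transcript, a template reply, sine-tone "singing") only stand in for real ASR, LLM, and SVS backends. Aligning lyrics to notes word by word is also an assumption; a real SVS system would typically align at the syllable or phoneme level.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Protocol, Tuple

# Hypothetical stage interfaces (not the SingingSDS API). Any backend that
# implements the method can be swapped in without changing the pipeline.
class ASR(Protocol):
    def transcribe(self, audio: List[float]) -> str: ...

class LLM(Protocol):
    def reply(self, persona: str, user_text: str) -> str: ...

class SVS(Protocol):
    def sing(self, lyrics: List[str], notes: List[int]) -> List[float]: ...

# A melody source returns one MIDI pitch per lyric unit.
MelodySource = Callable[[int], List[int]]

@dataclass
class SingingPipeline:
    asr: ASR
    llm: LLM
    svs: SVS
    melody: MelodySource
    persona: str

    def respond(self, audio: List[float]) -> Tuple[str, List[float]]:
        user_text = self.asr.transcribe(audio)                # speech -> text
        reply_text = self.llm.reply(self.persona, user_text)  # persona-conditioned reply
        lyrics = reply_text.split()                           # assumption: one word per note
        notes = self.melody(len(lyrics))                      # melody sized to the lyrics
        return reply_text, self.svs.sing(lyrics, notes)       # lyrics + notes -> audio

# --- Stub backends standing in for real models (hypothetical) ---
class FixedASR:
    def transcribe(self, audio: List[float]) -> str:
        return "hello there"

class TemplateLLM:
    def reply(self, persona: str, user_text: str) -> str:
        return f"{persona} sings back {user_text}"

class SineSVS:
    """Renders each note as a short sine tone; a placeholder, not a real SVS model."""

    def __init__(self, sample_rate: int = 8000, samples_per_note: int = 100):
        self.sample_rate = sample_rate
        self.samples_per_note = samples_per_note

    def sing(self, lyrics: List[str], notes: List[int]) -> List[float]:
        out: List[float] = []
        for midi in notes:
            freq = 440.0 * 2 ** ((midi - 69) / 12)  # MIDI pitch -> Hz
            out.extend(
                math.sin(2 * math.pi * freq * t / self.sample_rate)
                for t in range(self.samples_per_note)
            )
        return out

def c_major_scale(num_notes: int) -> List[int]:
    """Hypothetical melody source: cycle through a C major scale."""
    scale = [60, 62, 64, 65, 67, 69, 71, 72]
    return [scale[i % len(scale)] for i in range(num_notes)]

if __name__ == "__main__":
    pipeline = SingingPipeline(FixedASR(), TemplateLLM(), SineSVS(),
                               c_major_scale, persona="Bard")
    text, audio = pipeline.respond([0.0] * 160)
    print(text, len(audio))
```

In this sketch, replacing `SineSVS` with any other object that has a `sing` method, or `c_major_scale` with a different melody function, changes one stage of the cascade while leaving the rest untouched, which is the kind of plug-and-play configurability the abstract describes.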

Cite this Paper


BibTeX
@InProceedings{pmlr-v303-han26a,
  title = {SingingSDS: A Singing-Capable Spoken Dialogue System for Conversational Roleplay Applications},
  author = {Han, Jionghao and Shi, Jiatong and Someki, Masao and Tang, Yuxun and Liu, Lan and Zhao, Yiwen and Feng, Wenhao and Watanabe, Shinji},
  booktitle = {Proceedings of Machine Learning Research},
  pages = {1--21},
  year = {2026},
  editor = {Herremans, Dorien and Bhandari, Keshav and Roy, Abhinaba and Colton, Simon and Barthet, Mathieu},
  volume = {303},
  series = {Proceedings of Machine Learning Research},
  month = {26 Jan},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v303/main/assets/han26a/han26a.pdf},
  url = {https://proceedings.mlr.press/v303/han26a.html},
  abstract = {With recent advances in automatic speech recognition (ASR), large language models (LLMs), and text-to-speech (TTS) technologies, spoken dialogue systems (SDS) have become widely accessible. However, most existing SDS are limited to conventional spoken responses. We present SingingSDS, a cascaded SDS that responds through singing rather than speaking, fostering more affective, memorable, and pleasurable interactions in character-based roleplay and interactive entertainment scenarios. SingingSDS employs a modular ASR-LLM-SVS pipeline and supports a wide range of configurations across character personas, ASR and LLM backends, SVS models, melody sources, and voice profiles, tailored to different needs in terms of latency, quality, and musical style. SingingSDS is available as a plug-and-play web demo, featuring modular, open-source code that supports customization and extension.}
}
Endnote
%0 Conference Paper
%T SingingSDS: A Singing-Capable Spoken Dialogue System for Conversational Roleplay Applications
%A Jionghao Han
%A Jiatong Shi
%A Masao Someki
%A Yuxun Tang
%A Lan Liu
%A Yiwen Zhao
%A Wenhao Feng
%A Shinji Watanabe
%B Proceedings of Machine Learning Research
%C Proceedings of Machine Learning Research
%D 2026
%E Dorien Herremans
%E Keshav Bhandari
%E Abhinaba Roy
%E Simon Colton
%E Mathieu Barthet
%F pmlr-v303-han26a
%I PMLR
%P 1--21
%U https://proceedings.mlr.press/v303/han26a.html
%V 303
%X With recent advances in automatic speech recognition (ASR), large language models (LLMs), and text-to-speech (TTS) technologies, spoken dialogue systems (SDS) have become widely accessible. However, most existing SDS are limited to conventional spoken responses. We present SingingSDS, a cascaded SDS that responds through singing rather than speaking, fostering more affective, memorable, and pleasurable interactions in character-based roleplay and interactive entertainment scenarios. SingingSDS employs a modular ASR-LLM-SVS pipeline and supports a wide range of configurations across character personas, ASR and LLM backends, SVS models, melody sources, and voice profiles, tailored to different needs in terms of latency, quality, and musical style. SingingSDS is available as a plug-and-play web demo, featuring modular, open-source code that supports customization and extension.
APA
Han, J., Shi, J., Someki, M., Tang, Y., Liu, L., Zhao, Y., Feng, W. & Watanabe, S. (2026). SingingSDS: A Singing-Capable Spoken Dialogue System for Conversational Roleplay Applications. Proceedings of Machine Learning Research, in Proceedings of Machine Learning Research 303:1-21. Available from https://proceedings.mlr.press/v303/han26a.html.
