Efficient Fine-Grained Guidance for Diffusion Model Based Symbolic Music Generation

Tingyu Zhu, Haoyu Liu, Ziyu Wang, Zhimin Jiang, Zeyu Zheng
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:79938-79962, 2025.

Abstract

Developing generative models to create or conditionally create symbolic music presents unique challenges due to the combination of limited data availability and the need for high precision in note pitch. To address these challenges, we introduce an efficient Fine-Grained Guidance (FGG) approach within diffusion models. FGG guides the diffusion models to generate music that aligns more closely with the control and intent of expert composers, which is critical to improving the accuracy, listenability, and quality of generated music. This approach empowers diffusion models to excel in advanced applications such as improvisation and interactive music creation. We derive theoretical characterizations for both the challenges in symbolic music generation and the effects of the FGG approach. We provide numerical experiments and subjective evaluation to demonstrate the effectiveness of our approach. We have published a demo page to showcase performances, which enables real-time interactive generation.

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-zhu25m,
  title     = {Efficient Fine-Grained Guidance for Diffusion Model Based Symbolic Music Generation},
  author    = {Zhu, Tingyu and Liu, Haoyu and Wang, Ziyu and Jiang, Zhimin and Zheng, Zeyu},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {79938--79962},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/zhu25m/zhu25m.pdf},
  url       = {https://proceedings.mlr.press/v267/zhu25m.html},
  abstract  = {Developing generative models to create or conditionally create symbolic music presents unique challenges due to the combination of limited data availability and the need for high precision in note pitch. To address these challenges, we introduce an efficient Fine-Grained Guidance (FGG) approach within diffusion models. FGG guides the diffusion models to generate music that aligns more closely with the control and intent of expert composers, which is critical to improving the accuracy, listenability, and quality of generated music. This approach empowers diffusion models to excel in advanced applications such as improvisation and interactive music creation. We derive theoretical characterizations for both the challenges in symbolic music generation and the effects of the FGG approach. We provide numerical experiments and subjective evaluation to demonstrate the effectiveness of our approach. We have published a demo page to showcase performances, which enables real-time interactive generation.}
}
Endnote
%0 Conference Paper
%T Efficient Fine-Grained Guidance for Diffusion Model Based Symbolic Music Generation
%A Tingyu Zhu
%A Haoyu Liu
%A Ziyu Wang
%A Zhimin Jiang
%A Zeyu Zheng
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-zhu25m
%I PMLR
%P 79938--79962
%U https://proceedings.mlr.press/v267/zhu25m.html
%V 267
%X Developing generative models to create or conditionally create symbolic music presents unique challenges due to the combination of limited data availability and the need for high precision in note pitch. To address these challenges, we introduce an efficient Fine-Grained Guidance (FGG) approach within diffusion models. FGG guides the diffusion models to generate music that aligns more closely with the control and intent of expert composers, which is critical to improving the accuracy, listenability, and quality of generated music. This approach empowers diffusion models to excel in advanced applications such as improvisation and interactive music creation. We derive theoretical characterizations for both the challenges in symbolic music generation and the effects of the FGG approach. We provide numerical experiments and subjective evaluation to demonstrate the effectiveness of our approach. We have published a demo page to showcase performances, which enables real-time interactive generation.
APA
Zhu, T., Liu, H., Wang, Z., Jiang, Z., & Zheng, Z. (2025). Efficient Fine-Grained Guidance for Diffusion Model Based Symbolic Music Generation. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:79938-79962. Available from https://proceedings.mlr.press/v267/zhu25m.html.