Prompt-guided Precise Audio Editing with Diffusion Models

Manjie Xu, Chenxing Li, Duzhen Zhang, Dan Su, Wei Liang, Dong Yu
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:55126-55143, 2024.

Abstract

Audio editing involves the arbitrary manipulation of audio content through precise control. Although text-guided diffusion models have made significant advancements in text-to-audio generation, they still face challenges in finding a flexible and precise way to modify target events within an audio track. We present a novel approach, referred to as PPAE, which serves as a general module for diffusion models and enables precise audio editing. The editing is based on the input textual prompt only and is entirely training-free. We exploit the cross-attention maps of diffusion models to facilitate accurate local editing and employ a hierarchical local-global pipeline to ensure a smoother editing process. Experimental results highlight the effectiveness of our method in various editing tasks.

Cite this Paper


BibTeX
@InProceedings{pmlr-v235-xu24p,
  title = {Prompt-guided Precise Audio Editing with Diffusion Models},
  author = {Xu, Manjie and Li, Chenxing and Zhang, Duzhen and Su, Dan and Liang, Wei and Yu, Dong},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages = {55126--55143},
  year = {2024},
  editor = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume = {235},
  series = {Proceedings of Machine Learning Research},
  month = {21--27 Jul},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/xu24p/xu24p.pdf},
  url = {https://proceedings.mlr.press/v235/xu24p.html},
  abstract = {Audio editing involves the arbitrary manipulation of audio content through precise control. Although text-guided diffusion models have made significant advancements in text-to-audio generation, they still face challenges in finding a flexible and precise way to modify target events within an audio track. We present a novel approach, referred to as PPAE, which serves as a general module for diffusion models and enables precise audio editing. The editing is based on the input textual prompt only and is entirely training-free. We exploit the cross-attention maps of diffusion models to facilitate accurate local editing and employ a hierarchical local-global pipeline to ensure a smoother editing process. Experimental results highlight the effectiveness of our method in various editing tasks.}
}
Endnote
%0 Conference Paper
%T Prompt-guided Precise Audio Editing with Diffusion Models
%A Manjie Xu
%A Chenxing Li
%A Duzhen Zhang
%A Dan Su
%A Wei Liang
%A Dong Yu
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-xu24p
%I PMLR
%P 55126--55143
%U https://proceedings.mlr.press/v235/xu24p.html
%V 235
%X Audio editing involves the arbitrary manipulation of audio content through precise control. Although text-guided diffusion models have made significant advancements in text-to-audio generation, they still face challenges in finding a flexible and precise way to modify target events within an audio track. We present a novel approach, referred to as PPAE, which serves as a general module for diffusion models and enables precise audio editing. The editing is based on the input textual prompt only and is entirely training-free. We exploit the cross-attention maps of diffusion models to facilitate accurate local editing and employ a hierarchical local-global pipeline to ensure a smoother editing process. Experimental results highlight the effectiveness of our method in various editing tasks.
APA
Xu, M., Li, C., Zhang, D., Su, D., Liang, W. & Yu, D. (2024). Prompt-guided Precise Audio Editing with Diffusion Models. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:55126-55143. Available from https://proceedings.mlr.press/v235/xu24p.html.
