Zero-Shot Unsupervised and Text-Based Audio Editing Using DDPM Inversion

Hila Manor, Tomer Michaeli
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:34603-34629, 2024.

Abstract

Editing signals using large pre-trained models, in a zero-shot manner, has recently seen rapid advancements in the image domain. However, this wave has yet to reach the audio domain. In this paper, we explore two zero-shot editing techniques for audio signals, which use DDPM inversion with pre-trained diffusion models. The first, which we coin ZEro-shot Text-based Audio (ZETA) editing, is adopted from the image domain. The second, named ZEro-shot UnSupervized (ZEUS) editing, is a novel approach for discovering semantically meaningful editing directions without supervision. When applied to music signals, this method exposes a range of musically interesting modifications, from controlling the participation of specific instruments to improvisations on the melody. Samples and code can be found on our examples page.
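
To make the core idea concrete, below is a minimal, self-contained sketch of text-based editing via DDPM inversion in the spirit of ZETA. It is not the authors' released implementation: the `denoiser(x_t, t, text)` callable, the noise schedule, the prompt strings, and the toy spectrogram are hypothetical stand-ins for a pre-trained text-conditional audio diffusion model, and the unsupervised ZEUS direction-discovery method is not covered here.

```python
# Schematic sketch (assumptions noted above), not the authors' code.
import torch

def posterior_mean_std(x_t, eps_pred, t, alphas_bar, betas):
    """Standard DDPM reverse-step mean and std for timestep t (1-indexed)."""
    a_bar_t = alphas_bar[t - 1]
    a_bar_prev = alphas_bar[t - 2] if t > 1 else torch.tensor(1.0)
    beta_t = betas[t - 1]
    x0_pred = (x_t - (1 - a_bar_t).sqrt() * eps_pred) / a_bar_t.sqrt()
    mean = (a_bar_prev.sqrt() * beta_t / (1 - a_bar_t)) * x0_pred \
         + ((1 - beta_t).sqrt() * (1 - a_bar_prev) / (1 - a_bar_t)) * x_t
    std = (beta_t * (1 - a_bar_prev) / (1 - a_bar_t)).sqrt()
    return mean, std

def ddpm_invert(x0, denoiser, alphas_bar, betas, src_text):
    """Edit-friendly-style inversion: sample noisy versions x_1..x_T of the
    source independently, then solve for the per-step noise z_t the reverse
    process must inject to reproduce the source."""
    T = len(alphas_bar)
    xs = [x0]
    for t in range(1, T + 1):
        noise = torch.randn_like(x0)
        xs.append(alphas_bar[t - 1].sqrt() * x0 + (1 - alphas_bar[t - 1]).sqrt() * noise)
    zs = [None] * T
    for t in range(T, 0, -1):
        if t == 1:
            zs[0] = torch.zeros_like(x0)  # final reverse step treated as deterministic
            continue
        mean, std = posterior_mean_std(xs[t], denoiser(xs[t], t, src_text), t, alphas_bar, betas)
        zs[t - 1] = (xs[t - 1] - mean) / std
    return xs[T], zs

def text_based_edit(x0, denoiser, alphas_bar, betas, src_text, tgt_text, t_start):
    """Re-run the reverse process with the target prompt from timestep t_start
    downward, reusing the inversion noise so the signal's overall structure
    (timing, rhythm) is preserved while the semantics follow the new text."""
    x_T, zs = ddpm_invert(x0, denoiser, alphas_bar, betas, src_text)
    x = x_T
    for t in range(len(alphas_bar), 0, -1):
        text = tgt_text if t <= t_start else src_text
        mean, std = posterior_mean_std(x, denoiser(x, t, text), t, alphas_bar, betas)
        x = mean + std * zs[t - 1]
    return x

# Toy usage with a dummy (untrained) denoiser, only to show the plumbing.
if __name__ == "__main__":
    T = 50
    betas = torch.linspace(1e-4, 0.02, T)
    alphas_bar = torch.cumprod(1 - betas, dim=0)
    dummy_denoiser = lambda x, t, text: torch.zeros_like(x)  # hypothetical model stand-in
    spec = torch.randn(1, 80, 256)                           # fake mel-spectrogram
    edited = text_based_edit(spec, dummy_denoiser, alphas_bar, betas,
                             "a jazz piano piece", "a jazz saxophone piece", t_start=35)
    print(edited.shape)
```

The key design point the sketch illustrates is that editing strength is controlled by how early in the reverse process the target prompt takes over (`t_start` here), while reusing the inversion noise keeps the edit faithful to the source signal.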

Cite this Paper


BibTeX
@InProceedings{pmlr-v235-manor24a,
  title     = {Zero-Shot Unsupervised and Text-Based Audio Editing Using {DDPM} Inversion},
  author    = {Manor, Hila and Michaeli, Tomer},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages     = {34603--34629},
  year      = {2024},
  editor    = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume    = {235},
  series    = {Proceedings of Machine Learning Research},
  month     = {21--27 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/manor24a/manor24a.pdf},
  url       = {https://proceedings.mlr.press/v235/manor24a.html},
  abstract  = {Editing signals using large pre-trained models, in a zero-shot manner, has recently seen rapid advancements in the image domain. However, this wave has yet to reach the audio domain. In this paper, we explore two zero-shot editing techniques for audio signals, which use DDPM inversion with pre-trained diffusion models. The first, which we coin ZEro-shot Text-based Audio (ZETA) editing, is adopted from the image domain. The second, named ZEro-shot UnSupervized (ZEUS) editing, is a novel approach for discovering semantically meaningful editing directions without supervision. When applied to music signals, this method exposes a range of musically interesting modifications, from controlling the participation of specific instruments to improvisations on the melody. Samples and code can be found on our examples page.}
}
Endnote
%0 Conference Paper
%T Zero-Shot Unsupervised and Text-Based Audio Editing Using DDPM Inversion
%A Hila Manor
%A Tomer Michaeli
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-manor24a
%I PMLR
%P 34603--34629
%U https://proceedings.mlr.press/v235/manor24a.html
%V 235
%X Editing signals using large pre-trained models, in a zero-shot manner, has recently seen rapid advancements in the image domain. However, this wave has yet to reach the audio domain. In this paper, we explore two zero-shot editing techniques for audio signals, which use DDPM inversion with pre-trained diffusion models. The first, which we coin ZEro-shot Text-based Audio (ZETA) editing, is adopted from the image domain. The second, named ZEro-shot UnSupervized (ZEUS) editing, is a novel approach for discovering semantically meaningful editing directions without supervision. When applied to music signals, this method exposes a range of musically interesting modifications, from controlling the participation of specific instruments to improvisations on the melody. Samples and code can be found on our examples page.
APA
Manor, H. & Michaeli, T. (2024). Zero-Shot Unsupervised and Text-Based Audio Editing Using DDPM Inversion. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:34603-34629. Available from https://proceedings.mlr.press/v235/manor24a.html.
