FRAG: Frequency Adapting Group for Diffusion Video Editing

Sunjae Yoon, Gwanhyeong Koo, Geonwoo Kim, Chang D. Yoo
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:57315-57330, 2024.

Abstract

In video editing, the hallmark of a quality edit lies in its consistent and unobtrusive adjustment. Modification, when integrated, must be smooth and subtle, preserving the natural flow and aligning seamlessly with the original vision. Therefore, our primary focus is on overcoming the current challenges in high quality edit to ensure that each edit enhances the final product without disrupting its intended essence. However, quality deterioration such as blurring and flickering is routinely observed in recent diffusion video editing systems. We confirm that this deterioration often stems from high-frequency leak: the diffusion model fails to accurately synthesize high-frequency components during denoising process. To this end, we devise Frequency Adapting Group (FRAG) which enhances the video quality in terms of consistency and fidelity by introducing a novel receptive field branch to preserve high-frequency components during the denoising process. FRAG is performed in a model-agnostic manner without additional training and validates the effectiveness on video editing benchmarks (i.e., TGVE, DAVIS).

Cite this Paper


BibTeX
@InProceedings{pmlr-v235-yoon24c, title = {{FRAG}: Frequency Adapting Group for Diffusion Video Editing}, author = {Yoon, Sunjae and Koo, Gwanhyeong and Kim, Geonwoo and Yoo, Chang D.}, booktitle = {Proceedings of the 41st International Conference on Machine Learning}, pages = {57315--57330}, year = {2024}, editor = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix}, volume = {235}, series = {Proceedings of Machine Learning Research}, month = {21--27 Jul}, publisher = {PMLR}, pdf = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/yoon24c/yoon24c.pdf}, url = {https://proceedings.mlr.press/v235/yoon24c.html}, abstract = {In video editing, the hallmark of a quality edit lies in its consistent and unobtrusive adjustment. Modification, when integrated, must be smooth and subtle, preserving the natural flow and aligning seamlessly with the original vision. Therefore, our primary focus is on overcoming the current challenges in high quality edit to ensure that each edit enhances the final product without disrupting its intended essence. However, quality deterioration such as blurring and flickering is routinely observed in recent diffusion video editing systems. We confirm that this deterioration often stems from high-frequency leak: the diffusion model fails to accurately synthesize high-frequency components during denoising process. To this end, we devise Frequency Adapting Group (FRAG) which enhances the video quality in terms of consistency and fidelity by introducing a novel receptive field branch to preserve high-frequency components during the denoising process. FRAG is performed in a model-agnostic manner without additional training and validates the effectiveness on video editing benchmarks (i.e., TGVE, DAVIS).} }
Endnote
%0 Conference Paper %T FRAG: Frequency Adapting Group for Diffusion Video Editing %A Sunjae Yoon %A Gwanhyeong Koo %A Geonwoo Kim %A Chang D. Yoo %B Proceedings of the 41st International Conference on Machine Learning %C Proceedings of Machine Learning Research %D 2024 %E Ruslan Salakhutdinov %E Zico Kolter %E Katherine Heller %E Adrian Weller %E Nuria Oliver %E Jonathan Scarlett %E Felix Berkenkamp %F pmlr-v235-yoon24c %I PMLR %P 57315--57330 %U https://proceedings.mlr.press/v235/yoon24c.html %V 235 %X In video editing, the hallmark of a quality edit lies in its consistent and unobtrusive adjustment. Modification, when integrated, must be smooth and subtle, preserving the natural flow and aligning seamlessly with the original vision. Therefore, our primary focus is on overcoming the current challenges in high quality edit to ensure that each edit enhances the final product without disrupting its intended essence. However, quality deterioration such as blurring and flickering is routinely observed in recent diffusion video editing systems. We confirm that this deterioration often stems from high-frequency leak: the diffusion model fails to accurately synthesize high-frequency components during denoising process. To this end, we devise Frequency Adapting Group (FRAG) which enhances the video quality in terms of consistency and fidelity by introducing a novel receptive field branch to preserve high-frequency components during the denoising process. FRAG is performed in a model-agnostic manner without additional training and validates the effectiveness on video editing benchmarks (i.e., TGVE, DAVIS).
APA
Yoon, S., Koo, G., Kim, G. & Yoo, C.D.. (2024). FRAG: Frequency Adapting Group for Diffusion Video Editing. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:57315-57330 Available from https://proceedings.mlr.press/v235/yoon24c.html.

Related Material