PixelAsParam: A Gradient View on Diffusion Sampling with Guidance

Anh-Dung Dinh, Daochang Liu, Chang Xu
Proceedings of the 40th International Conference on Machine Learning, PMLR 202:8120-8137, 2023.

Abstract

Diffusion models have recently achieved state-of-the-art results in image generation. They mainly rely on the denoising framework, which uses a Langevin dynamics process for image sampling. Recently, guidance methods have modified this process to inject conditional information and obtain a controllable generator. However, current guidance on denoising processes suffers from a trade-off between diversity, image quality, and conditional information. In this work, we propose to view the guided sampling process from a gradient perspective, in which image pixels are treated as parameters being optimized and each mathematical term in the sampling process represents one update direction. This perspective reveals conflicts between the update directions on the pixels, which cause the trade-off mentioned above. We investigate these conflicts and propose to resolve them with a simple projection method. Experimental results show clear improvements over different baselines on datasets with various resolutions.
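The abstract does not spell out the projection, so the following is only an illustrative sketch of one common way to resolve a conflict between two update directions, borrowed from gradient-projection ideas in multi-task learning. It is not the paper's algorithm. The names `project_out_conflict`, `denoise_dir`, and `guidance_dir` are hypothetical, and the two-element vectors stand in for flattened image pixels.

```python
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equally sized vectors."""
    return sum(x * y for x, y in zip(a, b))


def project_out_conflict(g: Sequence[float], d: Sequence[float]) -> List[float]:
    """Return g with its component along d removed, but only when g conflicts with d.

    Two directions are treated as conflicting when their inner product is
    negative. This criterion is an assumption made for illustration.
    """
    gd = dot(g, d)
    dd = dot(d, d)
    if gd >= 0.0 or dd == 0.0:
        # No conflict (or d is the zero vector): leave g unchanged.
        return list(g)
    scale = gd / dd
    return [gi - scale * di for gi, di in zip(g, d)]


# Toy example: pixels are the "parameters", each term is one update direction.
denoise_dir = [1.0, 0.0]    # hypothetical update from the denoising term
guidance_dir = [-1.0, 1.0]  # hypothetical update from the guidance term

adjusted = project_out_conflict(guidance_dir, denoise_dir)
combined = [a + b for a, b in zip(denoise_dir, adjusted)]
```

In this toy case the guidance direction initially points against the denoising direction; after projection the two are orthogonal, so adding the adjusted guidance no longer cancels part of the denoising update.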

Cite this Paper


BibTeX
@InProceedings{pmlr-v202-dinh23a,
  title     = {{P}ixel{A}s{P}aram: A Gradient View on Diffusion Sampling with Guidance},
  author    = {Dinh, Anh-Dung and Liu, Daochang and Xu, Chang},
  booktitle = {Proceedings of the 40th International Conference on Machine Learning},
  pages     = {8120--8137},
  year      = {2023},
  editor    = {Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan},
  volume    = {202},
  series    = {Proceedings of Machine Learning Research},
  month     = {23--29 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v202/dinh23a/dinh23a.pdf},
  url       = {https://proceedings.mlr.press/v202/dinh23a.html},
  abstract  = {Diffusion models recently achieved state-of-the-art in image generation. They mainly utilize the denoising framework, which leverages the Langevin dynamics process for image sampling. Recently, the guidance method has modified this process to add conditional information to achieve a controllable generator. However, the current guidance on denoising processes suffers from the trade-off between diversity, image quality, and conditional information. In this work, we propose to view this guidance sampling process from a gradient view, where image pixels are treated as parameters being optimized, and each mathematical term in the sampling process represents one update direction. This perspective reveals more insights into the conflict problems between updated directions on the pixels, which cause the trade-off as mentioned previously. We investigate the conflict problems and propose to solve them by a simple projection method. The experimental results evidently improve over different baselines on datasets with various resolutions.}
}
Endnote
%0 Conference Paper
%T PixelAsParam: A Gradient View on Diffusion Sampling with Guidance
%A Anh-Dung Dinh
%A Daochang Liu
%A Chang Xu
%B Proceedings of the 40th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Andreas Krause
%E Emma Brunskill
%E Kyunghyun Cho
%E Barbara Engelhardt
%E Sivan Sabato
%E Jonathan Scarlett
%F pmlr-v202-dinh23a
%I PMLR
%P 8120--8137
%U https://proceedings.mlr.press/v202/dinh23a.html
%V 202
%X Diffusion models recently achieved state-of-the-art in image generation. They mainly utilize the denoising framework, which leverages the Langevin dynamics process for image sampling. Recently, the guidance method has modified this process to add conditional information to achieve a controllable generator. However, the current guidance on denoising processes suffers from the trade-off between diversity, image quality, and conditional information. In this work, we propose to view this guidance sampling process from a gradient view, where image pixels are treated as parameters being optimized, and each mathematical term in the sampling process represents one update direction. This perspective reveals more insights into the conflict problems between updated directions on the pixels, which cause the trade-off as mentioned previously. We investigate the conflict problems and propose to solve them by a simple projection method. The experimental results evidently improve over different baselines on datasets with various resolutions.
APA
Dinh, A., Liu, D. & Xu, C. (2023). PixelAsParam: A Gradient View on Diffusion Sampling with Guidance. Proceedings of the 40th International Conference on Machine Learning, in Proceedings of Machine Learning Research 202:8120-8137. Available from https://proceedings.mlr.press/v202/dinh23a.html.
