Improving Denoising Diffusion Models via Simultaneous Estimation of Image and Noise

Zhenkai Zhang, Krista A. Ehinger, Tom Drummond
Proceedings of the 15th Asian Conference on Machine Learning, PMLR 222:1638-1653, 2024.

Abstract

This paper introduces two contributions aimed at improving the speed and quality of images generated through inverse diffusion processes. The first reparameterizes the diffusion process in terms of the angle on a quarter-circular arc between the image and noise, specifically by setting the conventional $\sqrt{\bar{\alpha}}=\cos(\eta)$. This reparameterization eliminates two singularities and allows the diffusion evolution to be expressed as a well-behaved ordinary differential equation (ODE), which in turn allows higher-order ODE solvers such as Runge-Kutta methods to be used effectively. The second is to directly estimate both the image ($\mathbf{x}_0$) and the noise ($\mathbf{\epsilon}$) using our network, which enables more stable calculation of the update step in the inverse diffusion process, because accurate estimates of the image and of the noise are each crucial at different stages of the process. Together, these changes give faster generation, with the ability to converge on high-quality images more quickly, and higher quality of the generated images, as measured by metrics such as Fréchet Inception Distance (FID), spatial Fréchet Inception Distance (sFID), precision, and recall.
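
The reparameterization described in the abstract can be illustrated with a small numerical sketch. It is not the authors' implementation. It assumes the standard forward process $\mathbf{x} = \sqrt{\bar{\alpha}}\,\mathbf{x}_0 + \sqrt{1-\bar{\alpha}}\,\mathbf{\epsilon}$, so that setting $\sqrt{\bar{\alpha}}=\cos(\eta)$ gives $\mathbf{x}(\eta) = \cos(\eta)\,\mathbf{x}_0 + \sin(\eta)\,\mathbf{\epsilon}$ and, with $\mathbf{x}_0$ and $\mathbf{\epsilon}$ held fixed, $d\mathbf{x}/d\eta = -\sin(\eta)\,\mathbf{x}_0 + \cos(\eta)\,\mathbf{\epsilon}$. The function names, the `model(x, eta)` interface returning both estimates, and the oracle standing in for a trained network are all hypothetical.

```python
import numpy as np


def forward_sample(x0, eps, eta):
    """Noisy sample at angle eta, assuming sqrt(alpha_bar) = cos(eta)."""
    return np.cos(eta) * x0 + np.sin(eta) * eps


def velocity(x0_hat, eps_hat, eta):
    """d x / d eta of forward_sample with x0 and eps held fixed."""
    return -np.sin(eta) * x0_hat + np.cos(eta) * eps_hat


def rk4_sample(model, x_start, num_steps):
    """Integrate from eta = pi/2 (pure noise) down to eta = 0 (image).

    `model(x, eta)` is a hypothetical callable returning the pair
    (x0_hat, eps_hat); a trained network would take its place.
    Uses the classical fourth-order Runge-Kutta scheme.
    """
    def f(x, eta):
        x0_hat, eps_hat = model(x, eta)
        return velocity(x0_hat, eps_hat, eta)

    etas = np.linspace(np.pi / 2, 0.0, num_steps + 1)
    x = x_start
    for eta, eta_next in zip(etas[:-1], etas[1:]):
        h = eta_next - eta  # negative: we move from noise towards the image
        k1 = f(x, eta)
        k2 = f(x + 0.5 * h * k1, eta + 0.5 * h)
        k3 = f(x + 0.5 * h * k2, eta + 0.5 * h)
        k4 = f(x + h * k3, eta_next)
        x = x + (h / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
    return x


# Toy check with an oracle that already knows the true image and noise.
rng = np.random.default_rng(0)
x0_true = rng.normal(size=(4, 4))
eps_true = rng.normal(size=(4, 4))


def oracle(x, eta):
    # Stand-in for a network that estimates both quantities perfectly.
    return x0_true, eps_true


x_start = forward_sample(x0_true, eps_true, np.pi / 2)  # essentially pure noise
x_rec = rk4_sample(oracle, x_start, num_steps=10)
max_err = float(np.max(np.abs(x_rec - x0_true)))
```

With a perfect oracle the integration recovers the image to within the solver's truncation error, and neither endpoint requires a division by $\cos(\eta)$ or $\sin(\eta)$. This is one plausible reading of why having both estimates available keeps the update well behaved across the whole arc.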

Cite this Paper


BibTeX
@InProceedings{pmlr-v222-zhang24b,
  title = {Improving Denoising Diffusion Models via Simultaneous Estimation of Image and Noise},
  author = {Zhang, Zhenkai and Ehinger, Krista A. and Drummond, Tom},
  booktitle = {Proceedings of the 15th Asian Conference on Machine Learning},
  pages = {1638--1653},
  year = {2024},
  editor = {Yanıkoğlu, Berrin and Buntine, Wray},
  volume = {222},
  series = {Proceedings of Machine Learning Research},
  month = {11--14 Nov},
  publisher = {PMLR},
  pdf = {https://proceedings.mlr.press/v222/zhang24b/zhang24b.pdf},
  url = {https://proceedings.mlr.press/v222/zhang24b.html},
  abstract = {This paper introduces two key contributions aimed at improving the speed and quality of images generated through inverse diffusion processes. The first contribution involves reparameterizing the diffusion process in terms of the angle on a quarter-circular arc between the image and noise, specifically setting the conventional $\sqrt{\bar{\alpha}}=\cos(\eta)$. This reparameterization eliminates two singularities and allows for the expression of diffusion evolution as a well-behaved ordinary differential equation (ODE). In turn, this allows higher order ODE solvers such as Runge-Kutta methods to be used effectively. The second contribution is to directly estimate both the image ($\mathbf{x}_0$) and noise ($\mathbf{\epsilon}$) using our network, which enables more stable calculations of the update step in the inverse diffusion steps, as accurate estimation of both the image and noise are crucial at different stages of the process. Together with these changes, our model achieves faster generation, with the ability to converge on high-quality images more quickly, and higher quality of the generated images, as measured by metrics such as Fr{é}chet Inception Distance (FID), spatial Fr{é}chet Inception Distance (sFID), precision, and recall.}
}
Endnote
%0 Conference Paper
%T Improving Denoising Diffusion Models via Simultaneous Estimation of Image and Noise
%A Zhenkai Zhang
%A Krista A. Ehinger
%A Tom Drummond
%B Proceedings of the 15th Asian Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Berrin Yanıkoğlu
%E Wray Buntine
%F pmlr-v222-zhang24b
%I PMLR
%P 1638--1653
%U https://proceedings.mlr.press/v222/zhang24b.html
%V 222
%X This paper introduces two key contributions aimed at improving the speed and quality of images generated through inverse diffusion processes. The first contribution involves reparameterizing the diffusion process in terms of the angle on a quarter-circular arc between the image and noise, specifically setting the conventional $\sqrt{\bar{\alpha}}=\cos(\eta)$. This reparameterization eliminates two singularities and allows for the expression of diffusion evolution as a well-behaved ordinary differential equation (ODE). In turn, this allows higher order ODE solvers such as Runge-Kutta methods to be used effectively. The second contribution is to directly estimate both the image ($\mathbf{x}_0$) and noise ($\mathbf{\epsilon}$) using our network, which enables more stable calculations of the update step in the inverse diffusion steps, as accurate estimation of both the image and noise are crucial at different stages of the process. Together with these changes, our model achieves faster generation, with the ability to converge on high-quality images more quickly, and higher quality of the generated images, as measured by metrics such as Fréchet Inception Distance (FID), spatial Fréchet Inception Distance (sFID), precision, and recall.
APA
Zhang, Z., Ehinger, K. A., & Drummond, T. (2024). Improving Denoising Diffusion Models via Simultaneous Estimation of Image and Noise. Proceedings of the 15th Asian Conference on Machine Learning, in Proceedings of Machine Learning Research 222:1638-1653. Available from https://proceedings.mlr.press/v222/zhang24b.html.
