Does Generation Require Memorization? Creative Diffusion Models using Ambient Diffusion

Kulin Shah, Alkis Kalavasis, Adam Klivans, Giannis Daras
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:54143-54166, 2025.

Abstract

There is strong empirical evidence that the state-of-the-art diffusion modeling paradigm leads to models that memorize the training set, especially when the training set is small. Prior methods to mitigate the memorization problem often lead to a decrease in image quality. Is it possible to obtain strong and creative generative models, i.e., models that achieve high generation quality and low memorization? Despite the current pessimistic landscape of results, we make significant progress in pushing the trade-off between fidelity and memorization. We first provide theoretical evidence that memorization in diffusion models is only necessary for denoising problems at low noise scales (usually used in generating high-frequency details). Using this theoretical insight, we propose a simple, principled method to train diffusion models using noisy data at large noise scales. We show that our method significantly reduces memorization without decreasing image quality, for both text-conditional and unconditional models and for a variety of data availability settings.
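
The training rule the abstract describes (clean targets at low noise scales, noisy targets at large noise scales) can be sketched in a few lines. The following is a minimal PyTorch illustration, not the authors' released implementation; the names (sigma_nature for the noise level injected into the training data, the x0-prediction network model, and its call signature) are assumptions for illustration.

import torch

def ambient_style_loss(model, x0, sigma_t, sigma_nature=0.2):
    # Low noise scale: standard denoising loss on clean data (the regime
    # where, per the paper's theory, some memorization is unavoidable).
    if sigma_t <= sigma_nature:
        x_t = x0 + sigma_t * torch.randn_like(x0)
        return ((model(x_t, sigma_t) - x0) ** 2).mean()
    # Large noise scale: the model never sees the clean image. Build a noisy
    # target x_n at level sigma_nature, diffuse it further to level sigma_t,
    # and fit E[x_n | x_t] instead of x0 itself.
    x_n = x0 + sigma_nature * torch.randn_like(x0)
    x_t = x_n + (sigma_t**2 - sigma_nature**2) ** 0.5 * torch.randn_like(x_n)
    # Gaussian identity: E[x_n | x_t] = w * x_t + (1 - w) * E[x0 | x_t]
    # with w = sigma_nature^2 / sigma_t^2, so the x0-predictor is supervised
    # using only noisy targets at this noise scale.
    w = sigma_nature**2 / sigma_t**2
    pred_xn = w * x_t + (1.0 - w) * model(x_t, sigma_t)
    return ((pred_xn - x_n) ** 2).mean()

# Illustrative usage with a stand-in network (hypothetical, just to show the call):
# net = lambda x, sigma: x
# x0 = torch.randn(8, 3, 32, 32)
# loss = ambient_style_loss(net, x0, sigma_t=1.5)

The high-noise branch relies on the Gaussian identity E[x_n | x_t] = (sigma_nature^2 / sigma_t^2) x_t + (1 - sigma_nature^2 / sigma_t^2) E[x_0 | x_t], so at large noise scales the network is supervised only through noisy images, which is the mechanism the abstract credits with reducing memorization.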

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-shah25b,
  title = {Does Generation Require Memorization? {C}reative Diffusion Models using Ambient Diffusion},
  author = {Shah, Kulin and Kalavasis, Alkis and Klivans, Adam and Daras, Giannis},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages = {54143--54166},
  year = {2025},
  editor = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume = {267},
  series = {Proceedings of Machine Learning Research},
  month = {13--19 Jul},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/shah25b/shah25b.pdf},
  url = {https://proceedings.mlr.press/v267/shah25b.html},
  abstract = {There is strong empirical evidence that the state-of-the-art diffusion modeling paradigm leads to models that memorize the training set, especially when the training set is small. Prior methods to mitigate the memorization problem often lead to a decrease in image quality. Is it possible to obtain strong and creative generative models, i.e., models that achieve high generation quality and low memorization? Despite the current pessimistic landscape of results, we make significant progress in pushing the trade-off between fidelity and memorization. We first provide theoretical evidence that memorization in diffusion models is only necessary for denoising problems at low noise scales (usually used in generating high-frequency details). Using this theoretical insight, we propose a simple, principled method to train diffusion models using noisy data at large noise scales. We show that our method significantly reduces memorization without decreasing image quality, for both text-conditional and unconditional models and for a variety of data availability settings.}
}
Endnote
%0 Conference Paper
%T Does Generation Require Memorization? Creative Diffusion Models using Ambient Diffusion
%A Kulin Shah
%A Alkis Kalavasis
%A Adam Klivans
%A Giannis Daras
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-shah25b
%I PMLR
%P 54143--54166
%U https://proceedings.mlr.press/v267/shah25b.html
%V 267
%X There is strong empirical evidence that the state-of-the-art diffusion modeling paradigm leads to models that memorize the training set, especially when the training set is small. Prior methods to mitigate the memorization problem often lead to a decrease in image quality. Is it possible to obtain strong and creative generative models, i.e., models that achieve high generation quality and low memorization? Despite the current pessimistic landscape of results, we make significant progress in pushing the trade-off between fidelity and memorization. We first provide theoretical evidence that memorization in diffusion models is only necessary for denoising problems at low noise scales (usually used in generating high-frequency details). Using this theoretical insight, we propose a simple, principled method to train diffusion models using noisy data at large noise scales. We show that our method significantly reduces memorization without decreasing image quality, for both text-conditional and unconditional models and for a variety of data availability settings.
APA
Shah, K., Kalavasis, A., Klivans, A. & Daras, G. (2025). Does Generation Require Memorization? Creative Diffusion Models using Ambient Diffusion. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:54143-54166. Available from https://proceedings.mlr.press/v267/shah25b.html.