Diffusion Models Encode the Intrinsic Dimension of Data Manifolds

Jan Pawel Stanczuk, Georgios Batzolis, Teo Deveney, Carola-Bibiane Schönlieb
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:46412-46440, 2024.

Abstract

In this work, we provide a mathematical proof that diffusion models encode data manifolds by approximating their normal bundles. Based on this observation, we propose a novel method for extracting the intrinsic dimension of the data manifold from a trained diffusion model. Our insights are based on the fact that a diffusion model approximates the score function, i.e., the gradient of the log density of a noise-corrupted version of the target distribution, for varying levels of corruption. We prove that as the level of corruption decreases, the score function points towards the manifold, as this direction becomes the direction of maximal likelihood increase. Therefore, at low noise levels, the diffusion model provides us with an approximation of the manifold’s normal bundle, allowing for an estimation of the manifold’s intrinsic dimension. To the best of our knowledge, our method is the first estimator of intrinsic dimension based on diffusion models, and it outperforms well-established estimators in controlled experiments on both Euclidean and image data.
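
The abstract describes the estimation idea at a high level; below is a minimal sketch (not the authors' released code) of how such an estimator could be implemented. It assumes a trained score network score_fn(x, sigma) approximating the gradient of the log density of the noise-corrupted data; the perturbation radius, number of probe points, and the largest-gap heuristic on the singular values are illustrative assumptions.

import numpy as np

def estimate_intrinsic_dimension(score_fn, x0, sigma=1e-2, n_samples=256, seed=0):
    """Sketch of a score-based intrinsic dimension estimate near a data point x0.

    x0        : (D,) array assumed to lie on (or near) the data manifold.
    score_fn  : callable (x, sigma) -> (D,) array; assumed trained score network.
    sigma     : small noise level; at low noise the score points towards the manifold.
    n_samples : number of perturbed points at which the score is evaluated.
    """
    rng = np.random.default_rng(seed)
    D = x0.shape[0]

    # Evaluate the score at points perturbed slightly off the manifold;
    # these vectors approximately lie in the normal space at x0.
    scores = np.stack([
        score_fn(x0 + sigma * rng.standard_normal(D), sigma)
        for _ in range(n_samples)
    ])                                    # shape (n_samples, D)

    # Singular values of the score matrix: normal directions carry large
    # singular values, tangent directions carry small ones.
    s = np.linalg.svd(scores, compute_uv=False)

    # Estimate the normal-space dimension from the largest spectral gap,
    # then report intrinsic dimension = ambient dimension - normal dimension.
    gaps = s[:-1] - s[1:]
    normal_dim = int(np.argmax(gaps)) + 1
    return D - normal_dim

In practice the noise level and number of probe points would need to be tuned to the dataset; the point of the sketch is only that the rank structure of low-noise score vectors separates normal from tangent directions, which is the mechanism the abstract describes.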

Cite this Paper


BibTeX
@InProceedings{pmlr-v235-stanczuk24a,
  title     = {Diffusion Models Encode the Intrinsic Dimension of Data Manifolds},
  author    = {Stanczuk, Jan Pawel and Batzolis, Georgios and Deveney, Teo and Sch\"{o}nlieb, Carola-Bibiane},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages     = {46412--46440},
  year      = {2024},
  editor    = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume    = {235},
  series    = {Proceedings of Machine Learning Research},
  month     = {21--27 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/stanczuk24a/stanczuk24a.pdf},
  url       = {https://proceedings.mlr.press/v235/stanczuk24a.html}
}
APA
Stanczuk, J.P., Batzolis, G., Deveney, T. & Schönlieb, C.-B. (2024). Diffusion Models Encode the Intrinsic Dimension of Data Manifolds. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:46412-46440. Available from https://proceedings.mlr.press/v235/stanczuk24a.html.
