Clustering via Self-Supervised Diffusion

Roy Uziel, Irit Chelly, Oren Freifeld, Ari Pakman
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:60711-60726, 2025.

Abstract

Diffusion models, widely recognized for their success in generative tasks, have not yet been applied to clustering. We introduce Clustering via Diffusion (CLUDI), a self-supervised framework that combines the generative power of diffusion models with pre-trained Vision Transformer features to achieve robust and accurate clustering. CLUDI is trained via a teacher–student paradigm: the teacher uses stochastic diffusion-based sampling to produce diverse cluster assignments, which the student refines into stable predictions. This stochasticity acts as a novel data augmentation strategy, enabling CLUDI to uncover intricate structures in high-dimensional data. Extensive evaluations on challenging datasets demonstrate that CLUDI achieves state-of-the-art performance in unsupervised classification, setting new benchmarks in clustering robustness and adaptability to complex data distributions.
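The abstract describes the training signal only at a high level. As a rough illustration of the general idea (several stochastic teacher samples are averaged into a target that a deterministic student learns to match), here is a toy NumPy sketch. It is not CLUDI. The annealed-noise sampler `teacher_sample` is a hypothetical stand-in for a real diffusion sampler. The fixed `centroids` stand in for whatever the teacher actually conditions on. The linear student, the averaging step, and all hyperparameters are assumptions. The 2-D blobs stand in for pre-trained Vision Transformer features.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)       # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def teacher_sample(x, centroids, rng, n_steps=10, step=0.5, noise=1.0):
    """Hypothetical stand-in for diffusion sampling (NOT the paper's sampler):
    start from Gaussian noise in logit space and repeatedly pull toward
    feature-centroid similarity logits, injecting noise whose scale shrinks
    each step but never reaches zero, so every call returns a different sample."""
    target = x @ centroids.T                      # (n, k) similarity logits
    z = rng.standard_normal(target.shape)         # start from pure noise
    for t in range(n_steps, 0, -1):
        sigma = noise * t / n_steps               # decreasing noise schedule
        z = z + step * (target - z) + sigma * rng.standard_normal(z.shape)
    return softmax(z)                             # one stochastic soft assignment

def train_student(x, centroids, rng, n_samples=8, epochs=200, lr=0.5):
    """Fit a hypothetical linear student W so that softmax(x @ W) matches the
    mean of several teacher samples (soft-target cross-entropy, plain GD)."""
    n, d = x.shape
    k = centroids.shape[0]
    samples = [teacher_sample(x, centroids, rng) for _ in range(n_samples)]
    p = np.mean(samples, axis=0)                  # averaged teacher targets, (n, k)
    w = np.zeros((d, k))
    for _ in range(epochs):
        q = softmax(x @ w)                        # student predictions
        w -= lr * x.T @ (q - p) / n               # gradient of mean cross-entropy w.r.t. W
    return w, p

rng = np.random.default_rng(0)
# Toy "embeddings": two well-separated 2-D blobs standing in for ViT features.
centroids = np.array([[3.0, 0.0], [0.0, 3.0]])
x = np.vstack([rng.normal(c, 0.3, size=(50, 2)) for c in centroids])
w, p = train_student(x, centroids, rng)
labels = softmax(x @ w).argmax(axis=1)            # student's hard cluster assignments
```

Averaging several noisy teacher draws is just one simple way to turn sampling noise into a smoother training target; the paper's actual sampler, teacher, and loss may differ substantially.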

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-uziel25a,
  title = {Clustering via Self-Supervised Diffusion},
  author = {Uziel, Roy and Chelly, Irit and Freifeld, Oren and Pakman, Ari},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages = {60711--60726},
  year = {2025},
  editor = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume = {267},
  series = {Proceedings of Machine Learning Research},
  month = {13--19 Jul},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/uziel25a/uziel25a.pdf},
  url = {https://proceedings.mlr.press/v267/uziel25a.html},
  abstract = {Diffusion models, widely recognized for their success in generative tasks, have not yet been applied to clustering. We introduce Clustering via Diffusion (CLUDI), a self-supervised framework that combines the generative power of diffusion models with pre-trained Vision Transformer features to achieve robust and accurate clustering. CLUDI is trained via a teacher–student paradigm: the teacher uses stochastic diffusion-based sampling to produce diverse cluster assignments, which the student refines into stable predictions. This stochasticity acts as a novel data augmentation strategy, enabling CLUDI to uncover intricate structures in high-dimensional data. Extensive evaluations on challenging datasets demonstrate that CLUDI achieves state-of-the-art performance in unsupervised classification, setting new benchmarks in clustering robustness and adaptability to complex data distributions.}
}
Endnote
%0 Conference Paper
%T Clustering via Self-Supervised Diffusion
%A Roy Uziel
%A Irit Chelly
%A Oren Freifeld
%A Ari Pakman
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-uziel25a
%I PMLR
%P 60711--60726
%U https://proceedings.mlr.press/v267/uziel25a.html
%V 267
%X Diffusion models, widely recognized for their success in generative tasks, have not yet been applied to clustering. We introduce Clustering via Diffusion (CLUDI), a self-supervised framework that combines the generative power of diffusion models with pre-trained Vision Transformer features to achieve robust and accurate clustering. CLUDI is trained via a teacher–student paradigm: the teacher uses stochastic diffusion-based sampling to produce diverse cluster assignments, which the student refines into stable predictions. This stochasticity acts as a novel data augmentation strategy, enabling CLUDI to uncover intricate structures in high-dimensional data. Extensive evaluations on challenging datasets demonstrate that CLUDI achieves state-of-the-art performance in unsupervised classification, setting new benchmarks in clustering robustness and adaptability to complex data distributions.
APA
Uziel, R., Chelly, I., Freifeld, O. & Pakman, A. (2025). Clustering via Self-Supervised Diffusion. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:60711-60726. Available from https://proceedings.mlr.press/v267/uziel25a.html.
