Recovery Guarantees for Kernel-based Clustering under Non-parametric Mixture Models

Leena C. Vankadara, Sebastian Bordt, Ulrike von Luxburg, Debarghya Ghoshdastidar
Proceedings of The 24th International Conference on Artificial Intelligence and Statistics, PMLR 130:3817-3825, 2021.

Abstract

Despite the ubiquity of kernel-based clustering, surprisingly few statistical guarantees exist beyond settings that consider strong structural assumptions on the data generation process. In this work, we take a step towards bridging this gap by studying the statistical performance of kernel-based clustering algorithms under non-parametric mixture models. We provide necessary and sufficient separability conditions under which these algorithms can consistently recover the underlying true clustering. Our analysis provides guarantees for kernel clustering approaches without structural assumptions on the form of the component distributions. Additionally, we establish a key equivalence between kernel-based data-clustering and kernel density-based clustering. This enables us to provide consistency guarantees for kernel-based estimators of non-parametric mixture models. Along with theoretical implications, this connection could have practical implications, including in the systematic choice of the bandwidth of the Gaussian kernel in the context of clustering.

Cite this Paper


BibTeX
@InProceedings{pmlr-v130-vankadara21a,
  title = {Recovery Guarantees for Kernel-based Clustering under Non-parametric Mixture Models},
  author = {Vankadara, Leena C. and Bordt, Sebastian and von Luxburg, Ulrike and Ghoshdastidar, Debarghya},
  booktitle = {Proceedings of The 24th International Conference on Artificial Intelligence and Statistics},
  pages = {3817--3825},
  year = {2021},
  editor = {Banerjee, Arindam and Fukumizu, Kenji},
  volume = {130},
  series = {Proceedings of Machine Learning Research},
  month = {13--15 Apr},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v130/vankadara21a/vankadara21a.pdf},
  url = {https://proceedings.mlr.press/v130/vankadara21a.html},
  abstract = {Despite the ubiquity of kernel-based clustering, surprisingly few statistical guarantees exist beyond settings that consider strong structural assumptions on the data generation process. In this work, we take a step towards bridging this gap by studying the statistical performance of kernel-based clustering algorithms under non-parametric mixture models. We provide necessary and sufficient separability conditions under which these algorithms can consistently recover the underlying true clustering. Our analysis provides guarantees for kernel clustering approaches without structural assumptions on the form of the component distributions. Additionally, we establish a key equivalence between kernel-based data-clustering and kernel density-based clustering. This enables us to provide consistency guarantees for kernel-based estimators of non-parametric mixture models. Along with theoretical implications, this connection could have practical implications, including in the systematic choice of the bandwidth of the Gaussian kernel in the context of clustering.}
}
Endnote
%0 Conference Paper
%T Recovery Guarantees for Kernel-based Clustering under Non-parametric Mixture Models
%A Leena C. Vankadara
%A Sebastian Bordt
%A Ulrike von Luxburg
%A Debarghya Ghoshdastidar
%B Proceedings of The 24th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2021
%E Arindam Banerjee
%E Kenji Fukumizu
%F pmlr-v130-vankadara21a
%I PMLR
%P 3817--3825
%U https://proceedings.mlr.press/v130/vankadara21a.html
%V 130
%X Despite the ubiquity of kernel-based clustering, surprisingly few statistical guarantees exist beyond settings that consider strong structural assumptions on the data generation process. In this work, we take a step towards bridging this gap by studying the statistical performance of kernel-based clustering algorithms under non-parametric mixture models. We provide necessary and sufficient separability conditions under which these algorithms can consistently recover the underlying true clustering. Our analysis provides guarantees for kernel clustering approaches without structural assumptions on the form of the component distributions. Additionally, we establish a key equivalence between kernel-based data-clustering and kernel density-based clustering. This enables us to provide consistency guarantees for kernel-based estimators of non-parametric mixture models. Along with theoretical implications, this connection could have practical implications, including in the systematic choice of the bandwidth of the Gaussian kernel in the context of clustering.
APA
Vankadara, L.C., Bordt, S., von Luxburg, U. & Ghoshdastidar, D. (2021). Recovery Guarantees for Kernel-based Clustering under Non-parametric Mixture Models. Proceedings of The 24th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 130:3817-3825. Available from https://proceedings.mlr.press/v130/vankadara21a.html.

Related Material