Differentially-Private Clustering of Easy Instances

Edith Cohen, Haim Kaplan, Yishay Mansour, Uri Stemmer, Eliad Tsfadia
Proceedings of the 38th International Conference on Machine Learning, PMLR 139:2049-2059, 2021.

Abstract

Clustering is a fundamental problem in data analysis. In differentially private clustering, the goal is to identify k cluster centers without disclosing information on individual data points. Despite significant research progress, the problem had so far resisted practical solutions. In this work we aim at providing simple, implementable differentially private clustering algorithms when the data is "easy," e.g., when there exists a significant separation between the clusters. For the easy instances we consider, we have a simple implementation based on utilizing non-private clustering algorithms, and combining them privately. We are able to get improved sample complexity bounds in some cases of Gaussian mixtures and k-means. We complement our theoretical algorithms with experiments on simulated data.
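To make the high-level idea in the abstract concrete, here is a minimal illustrative sketch in Python of a generic "run a non-private clusterer on disjoint chunks, then combine the resulting centers with calibrated noise" pattern. This is not the paper's algorithm: the function name, the chunking scheme, the naive matching of centers by first coordinate, and the bounded-norm assumption are all assumptions made for illustration, and the sketch is only sensible when clusters are strongly separated.

# Illustrative sketch only (NOT the paper's algorithm): non-private k-means
# on disjoint chunks, followed by a noisy aggregation of the matched centers.
import numpy as np
from sklearn.cluster import KMeans

def noisy_aggregated_centers(X, k, epsilon, delta, radius, n_chunks=10, seed=0):
    """Assumes every point in X has L2 norm at most `radius` and epsilon <= 1.

    Only the final aggregation step is made noisy here; the per-chunk
    k-means runs are ordinary non-private sklearn calls.
    """
    rng = np.random.default_rng(seed)
    chunks = np.array_split(rng.permutation(X), n_chunks)

    # Non-private k-means on each disjoint chunk.
    all_centers = []
    for chunk in chunks:
        km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(chunk)
        all_centers.append(km.cluster_centers_)

    # Toy matching: sort each chunk's centers by first coordinate. A real
    # algorithm needs a proper (private) matching/aggregation step.
    all_centers = [c[np.argsort(c[:, 0])] for c in all_centers]
    mean_centers = np.mean(all_centers, axis=0)  # shape (k, d)

    # Changing one input point alters at most one chunk's k x d center
    # matrix, whose rows stay in the radius-`radius` ball, so the averaged
    # matrix moves by at most 2*radius*sqrt(k)/n_chunks in Frobenius norm.
    l2_sensitivity = 2.0 * radius * np.sqrt(k) / n_chunks
    # Standard Gaussian mechanism calibration for (epsilon, delta)-DP.
    sigma = l2_sensitivity * np.sqrt(2.0 * np.log(1.25 / delta)) / epsilon
    return mean_centers + rng.normal(scale=sigma, size=mean_centers.shape)

Under strong separation, the per-chunk centers roughly agree, so the noisy average remains close to the true centers; without separation the naive matching step breaks down, which is exactly the kind of issue the paper's private combination step is designed to handle.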

Cite this Paper


BibTeX
@InProceedings{pmlr-v139-cohen21c,
  title     = {Differentially-Private Clustering of Easy Instances},
  author    = {Cohen, Edith and Kaplan, Haim and Mansour, Yishay and Stemmer, Uri and Tsfadia, Eliad},
  booktitle = {Proceedings of the 38th International Conference on Machine Learning},
  pages     = {2049--2059},
  year      = {2021},
  editor    = {Meila, Marina and Zhang, Tong},
  volume    = {139},
  series    = {Proceedings of Machine Learning Research},
  month     = {18--24 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v139/cohen21c/cohen21c.pdf},
  url       = {https://proceedings.mlr.press/v139/cohen21c.html},
  abstract  = {Clustering is a fundamental problem in data analysis. In differentially private clustering, the goal is to identify k cluster centers without disclosing information on individual data points. Despite significant research progress, the problem had so far resisted practical solutions. In this work we aim at providing simple, implementable differentially private clustering algorithms when the data is "easy," e.g., when there exists a significant separation between the clusters. For the easy instances we consider, we have a simple implementation based on utilizing non-private clustering algorithms, and combining them privately. We are able to get improved sample complexity bounds in some cases of Gaussian mixtures and k-means. We complement our theoretical algorithms with experiments on simulated data.}
}
Endnote
%0 Conference Paper
%T Differentially-Private Clustering of Easy Instances
%A Edith Cohen
%A Haim Kaplan
%A Yishay Mansour
%A Uri Stemmer
%A Eliad Tsfadia
%B Proceedings of the 38th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2021
%E Marina Meila
%E Tong Zhang
%F pmlr-v139-cohen21c
%I PMLR
%P 2049--2059
%U https://proceedings.mlr.press/v139/cohen21c.html
%V 139
%X Clustering is a fundamental problem in data analysis. In differentially private clustering, the goal is to identify k cluster centers without disclosing information on individual data points. Despite significant research progress, the problem had so far resisted practical solutions. In this work we aim at providing simple, implementable differentially private clustering algorithms when the data is "easy," e.g., when there exists a significant separation between the clusters. For the easy instances we consider, we have a simple implementation based on utilizing non-private clustering algorithms, and combining them privately. We are able to get improved sample complexity bounds in some cases of Gaussian mixtures and k-means. We complement our theoretical algorithms with experiments on simulated data.
APA
Cohen, E., Kaplan, H., Mansour, Y., Stemmer, U. & Tsfadia, E. (2021). Differentially-Private Clustering of Easy Instances. Proceedings of the 38th International Conference on Machine Learning, in Proceedings of Machine Learning Research 139:2049-2059. Available from https://proceedings.mlr.press/v139/cohen21c.html.
