Randomized Dimensionality Reduction for Facility Location and Single-Linkage Clustering

Shyam Narayanan, Sandeep Silwal, Piotr Indyk, Or Zamir
Proceedings of the 38th International Conference on Machine Learning, PMLR 139:7948-7957, 2021.

Abstract

Random dimensionality reduction is a versatile tool for speeding up algorithms for high-dimensional problems. We study its application to two clustering problems: the facility location problem, and the single-linkage hierarchical clustering problem, which is equivalent to computing the minimum spanning tree. We show that if we project the input pointset $X$ onto a random $d = O(d_X)$-dimensional subspace (where $d_X$ is the doubling dimension of $X$), then the optimum facility location cost in the projected space approximates the original cost up to a constant factor. We show an analogous statement for minimum spanning tree, but with the dimension $d$ having an extra $\log \log n$ term and the approximation factor being arbitrarily close to $1$. Furthermore, we extend these results to approximating solutions instead of just their costs. Lastly, we provide experimental results to validate the quality of solutions and the speedup due to the dimensionality reduction. Unlike several previous papers studying this approach in the context of $k$-means and $k$-medians, our dimension bound does not depend on the number of clusters but only on the intrinsic dimensionality of $X$.
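The pipeline the abstract describes can be illustrated end to end: apply a random linear projection to the point set, then compare the single-linkage (minimum spanning tree) cost before and after projection. The sketch below is a minimal illustration of that idea, not the authors' implementation; the synthetic data, the target dimension d, and the Gaussian projection matrix are hand-picked assumptions and do not reflect the $O(d_X)$ or extra $\log \log n$ bounds proved in the paper.

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial.distance import pdist, squareform


def mst_cost(points):
    """Total edge weight of the Euclidean minimum spanning tree,
    i.e. the cost underlying single-linkage clustering."""
    dists = squareform(pdist(points))          # dense pairwise Euclidean distances
    return minimum_spanning_tree(dists).sum()  # sum of MST edge weights


rng = np.random.default_rng(0)

# Synthetic high-dimensional data (a stand-in for the pointset X in the abstract).
n, D = 1000, 512
X = rng.normal(size=(n, D))

# Random projection onto d dimensions. Here d is an illustrative choice,
# not the doubling-dimension-based bound from the paper.
d = 20
G = rng.normal(size=(D, d)) / np.sqrt(d)  # scaled Gaussian projection matrix
X_proj = X @ G

print("MST cost, original space :", mst_cost(X))
print("MST cost, projected space:", mst_cost(X_proj))
```

With the $1/\sqrt{d}$ scaling, each pairwise distance is preserved in expectation, so the projected MST cost should land within a moderate factor of the original; the paper's contribution is proving how small $d$ can be while guaranteeing this.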

Cite this Paper


BibTeX
@InProceedings{pmlr-v139-narayanan21b,
  title     = {Randomized Dimensionality Reduction for Facility Location and Single-Linkage Clustering},
  author    = {Narayanan, Shyam and Silwal, Sandeep and Indyk, Piotr and Zamir, Or},
  booktitle = {Proceedings of the 38th International Conference on Machine Learning},
  pages     = {7948--7957},
  year      = {2021},
  editor    = {Meila, Marina and Zhang, Tong},
  volume    = {139},
  series    = {Proceedings of Machine Learning Research},
  month     = {18--24 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v139/narayanan21b/narayanan21b.pdf},
  url       = {https://proceedings.mlr.press/v139/narayanan21b.html},
  abstract  = {Random dimensionality reduction is a versatile tool for speeding up algorithms for high-dimensional problems. We study its application to two clustering problems: the facility location problem, and the single-linkage hierarchical clustering problem, which is equivalent to computing the minimum spanning tree. We show that if we project the input pointset $X$ onto a random $d = O(d_X)$-dimensional subspace (where $d_X$ is the doubling dimension of $X$), then the optimum facility location cost in the projected space approximates the original cost up to a constant factor. We show an analogous statement for minimum spanning tree, but with the dimension $d$ having an extra $\log \log n$ term and the approximation factor being arbitrarily close to $1$. Furthermore, we extend these results to approximating {\em solutions} instead of just their {\em costs}. Lastly, we provide experimental results to validate the quality of solutions and the speedup due to the dimensionality reduction. Unlike several previous papers studying this approach in the context of $k$-means and $k$-medians, our dimension bound does not depend on the number of clusters but only on the intrinsic dimensionality of $X$.}
}
Endnote
%0 Conference Paper
%T Randomized Dimensionality Reduction for Facility Location and Single-Linkage Clustering
%A Shyam Narayanan
%A Sandeep Silwal
%A Piotr Indyk
%A Or Zamir
%B Proceedings of the 38th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2021
%E Marina Meila
%E Tong Zhang
%F pmlr-v139-narayanan21b
%I PMLR
%P 7948--7957
%U https://proceedings.mlr.press/v139/narayanan21b.html
%V 139
%X Random dimensionality reduction is a versatile tool for speeding up algorithms for high-dimensional problems. We study its application to two clustering problems: the facility location problem, and the single-linkage hierarchical clustering problem, which is equivalent to computing the minimum spanning tree. We show that if we project the input pointset $X$ onto a random $d = O(d_X)$-dimensional subspace (where $d_X$ is the doubling dimension of $X$), then the optimum facility location cost in the projected space approximates the original cost up to a constant factor. We show an analogous statement for minimum spanning tree, but with the dimension $d$ having an extra $\log \log n$ term and the approximation factor being arbitrarily close to $1$. Furthermore, we extend these results to approximating {\em solutions} instead of just their {\em costs}. Lastly, we provide experimental results to validate the quality of solutions and the speedup due to the dimensionality reduction. Unlike several previous papers studying this approach in the context of $k$-means and $k$-medians, our dimension bound does not depend on the number of clusters but only on the intrinsic dimensionality of $X$.
APA
Narayanan, S., Silwal, S., Indyk, P., & Zamir, O. (2021). Randomized Dimensionality Reduction for Facility Location and Single-Linkage Clustering. Proceedings of the 38th International Conference on Machine Learning, in Proceedings of Machine Learning Research 139:7948-7957. Available from https://proceedings.mlr.press/v139/narayanan21b.html.