Randomized Dimensionality Reduction for Facility Location and Single-Linkage Clustering

Shyam Narayanan, Sandeep Silwal, Piotr Indyk, Or Zamir
Proceedings of the 38th International Conference on Machine Learning, PMLR 139:7948-7957, 2021.

Abstract

Random dimensionality reduction is a versatile tool for speeding up algorithms for high-dimensional problems. We study its application to two clustering problems: the facility location problem, and the single-linkage hierarchical clustering problem, which is equivalent to computing the minimum spanning tree. We show that if we project the input pointset $X$ onto a random $d = O(d_X)$-dimensional subspace (where $d_X$ is the doubling dimension of $X$), then the optimum facility location cost in the projected space approximates the original cost up to a constant factor. We show an analogous statement for minimum spanning tree, but with the dimension $d$ having an extra $\log \log n$ term and the approximation factor being arbitrarily close to $1$. Furthermore, we extend these results to approximating solutions instead of just their costs. Lastly, we provide experimental results to validate the quality of solutions and the speedup due to the dimensionality reduction. Unlike several previous papers studying this approach in the context of $k$-means and $k$-medians, our dimension bound does not depend on the number of clusters but only on the intrinsic dimensionality of $X$.
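The pipeline the abstract describes can be illustrated end to end: apply a random linear projection to the point set, then compare the single-linkage (minimum spanning tree) cost before and after projection. The sketch below is a minimal illustration of that idea, not the authors' implementation; the synthetic data, the target dimension d, and the Gaussian projection matrix are hand-picked assumptions and do not reflect the $O(d_X)$ or extra $\log \log n$ bounds proved in the paper.

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial.distance import pdist, squareform


def mst_cost(points):
    """Total edge weight of the Euclidean minimum spanning tree,
    i.e. the cost underlying single-linkage clustering."""
    dists = squareform(pdist(points))          # dense pairwise Euclidean distances
    return minimum_spanning_tree(dists).sum()  # sum of MST edge weights


rng = np.random.default_rng(0)

# Synthetic high-dimensional data (a stand-in for the pointset X in the abstract).
n, D = 1000, 512
X = rng.normal(size=(n, D))

# Random projection onto d dimensions. Here d is an illustrative choice,
# not the doubling-dimension-based bound from the paper.
d = 20
G = rng.normal(size=(D, d)) / np.sqrt(d)  # scaled Gaussian projection matrix
X_proj = X @ G

print("MST cost, original space :", mst_cost(X))
print("MST cost, projected space:", mst_cost(X_proj))
```

With the $1/\sqrt{d}$ scaling, each pairwise distance is preserved in expectation, so the projected MST cost should land within a moderate factor of the original; the paper's contribution is proving how small $d$ can be while guaranteeing this.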

Cite this Paper


BibTeX
@InProceedings{pmlr-v139-narayanan21b,
  title     = {Randomized Dimensionality Reduction for Facility Location and Single-Linkage Clustering},
  author    = {Narayanan, Shyam and Silwal, Sandeep and Indyk, Piotr and Zamir, Or},
  booktitle = {Proceedings of the 38th International Conference on Machine Learning},
  pages     = {7948--7957},
  year      = {2021},
  editor    = {Meila, Marina and Zhang, Tong},
  volume    = {139},
  series    = {Proceedings of Machine Learning Research},
  month     = {18--24 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v139/narayanan21b/narayanan21b.pdf},
  url       = {https://proceedings.mlr.press/v139/narayanan21b.html},
  abstract  = {Random dimensionality reduction is a versatile tool for speeding up algorithms for high-dimensional problems. We study its application to two clustering problems: the facility location problem, and the single-linkage hierarchical clustering problem, which is equivalent to computing the minimum spanning tree. We show that if we project the input pointset $X$ onto a random $d = O(d_X)$-dimensional subspace (where $d_X$ is the doubling dimension of $X$), then the optimum facility location cost in the projected space approximates the original cost up to a constant factor. We show an analogous statement for minimum spanning tree, but with the dimension $d$ having an extra $\log \log n$ term and the approximation factor being arbitrarily close to $1$. Furthermore, we extend these results to approximating {\em solutions} instead of just their {\em costs}. Lastly, we provide experimental results to validate the quality of solutions and the speedup due to the dimensionality reduction. Unlike several previous papers studying this approach in the context of $k$-means and $k$-medians, our dimension bound does not depend on the number of clusters but only on the intrinsic dimensionality of $X$.}
}
Endnote
%0 Conference Paper
%T Randomized Dimensionality Reduction for Facility Location and Single-Linkage Clustering
%A Shyam Narayanan
%A Sandeep Silwal
%A Piotr Indyk
%A Or Zamir
%B Proceedings of the 38th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2021
%E Marina Meila
%E Tong Zhang
%F pmlr-v139-narayanan21b
%I PMLR
%P 7948--7957
%U https://proceedings.mlr.press/v139/narayanan21b.html
%V 139
%X Random dimensionality reduction is a versatile tool for speeding up algorithms for high-dimensional problems. We study its application to two clustering problems: the facility location problem, and the single-linkage hierarchical clustering problem, which is equivalent to computing the minimum spanning tree. We show that if we project the input pointset $X$ onto a random $d = O(d_X)$-dimensional subspace (where $d_X$ is the doubling dimension of $X$), then the optimum facility location cost in the projected space approximates the original cost up to a constant factor. We show an analogous statement for minimum spanning tree, but with the dimension $d$ having an extra $\log \log n$ term and the approximation factor being arbitrarily close to $1$. Furthermore, we extend these results to approximating {\em solutions} instead of just their {\em costs}. Lastly, we provide experimental results to validate the quality of solutions and the speedup due to the dimensionality reduction. Unlike several previous papers studying this approach in the context of $k$-means and $k$-medians, our dimension bound does not depend on the number of clusters but only on the intrinsic dimensionality of $X$.
APA
Narayanan, S., Silwal, S., Indyk, P., & Zamir, O. (2021). Randomized Dimensionality Reduction for Facility Location and Single-Linkage Clustering. Proceedings of the 38th International Conference on Machine Learning, in Proceedings of Machine Learning Research 139:7948-7957. Available from https://proceedings.mlr.press/v139/narayanan21b.html.