Hierarchical Clustering in General Metric Spaces using Approximate Nearest Neighbors

Benjamin Moseley, Sergei Vassilvtiskii, Yuyan Wang
Proceedings of The 24th International Conference on Artificial Intelligence and Statistics, PMLR 130:2440-2448, 2021.

Abstract

Hierarchical clustering is a widely used data analysis method, but suffers from scalability issues, requiring quadratic time in general metric spaces. In this work, we demonstrate how approximate nearest neighbor (ANN) queries can be used to improve the running time of the popular single-linkage and average-linkage methods. Our proposed algorithms are the first subquadratic time algorithms for non-Euclidean metrics. We complement our theoretical analysis with an empirical evaluation showcasing our methods’ efficiency and accuracy.

Cite this Paper


BibTeX
@InProceedings{pmlr-v130-moseley21a, title = { Hierarchical Clustering in General Metric Spaces using Approximate Nearest Neighbors }, author = {Moseley, Benjamin and Vassilvtiskii, Sergei and Wang, Yuyan}, booktitle = {Proceedings of The 24th International Conference on Artificial Intelligence and Statistics}, pages = {2440--2448}, year = {2021}, editor = {Banerjee, Arindam and Fukumizu, Kenji}, volume = {130}, series = {Proceedings of Machine Learning Research}, month = {13--15 Apr}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/v130/moseley21a/moseley21a.pdf}, url = {https://proceedings.mlr.press/v130/moseley21a.html}, abstract = { Hierarchical clustering is a widely used data analysis method, but suffers from scalability issues, requiring quadratic time in general metric spaces. In this work, we demonstrate how approximate nearest neighbor (ANN) queries can be used to improve the running time of the popular single-linkage and average-linkage methods. Our proposed algorithms are the first subquadratic time algorithms for non-Euclidean metrics. We complement our theoretical analysis with an empirical evaluation showcasing our methods’ efficiency and accuracy. } }
Endnote
%0 Conference Paper %T Hierarchical Clustering in General Metric Spaces using Approximate Nearest Neighbors %A Benjamin Moseley %A Sergei Vassilvtiskii %A Yuyan Wang %B Proceedings of The 24th International Conference on Artificial Intelligence and Statistics %C Proceedings of Machine Learning Research %D 2021 %E Arindam Banerjee %E Kenji Fukumizu %F pmlr-v130-moseley21a %I PMLR %P 2440--2448 %U https://proceedings.mlr.press/v130/moseley21a.html %V 130 %X Hierarchical clustering is a widely used data analysis method, but suffers from scalability issues, requiring quadratic time in general metric spaces. In this work, we demonstrate how approximate nearest neighbor (ANN) queries can be used to improve the running time of the popular single-linkage and average-linkage methods. Our proposed algorithms are the first subquadratic time algorithms for non-Euclidean metrics. We complement our theoretical analysis with an empirical evaluation showcasing our methods’ efficiency and accuracy.
APA
Moseley, B., Vassilvtiskii, S. & Wang, Y.. (2021). Hierarchical Clustering in General Metric Spaces using Approximate Nearest Neighbors . Proceedings of The 24th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 130:2440-2448 Available from https://proceedings.mlr.press/v130/moseley21a.html.

Related Material