Hierarchical Clustering for Euclidean Data

Moses Charikar, Vaggos Chatziafratis, Rad Niazadeh, Grigory Yaroslavtsev
Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics, PMLR 89:2721-2730, 2019.

Abstract

Recent works on Hierarchical Clustering (HC), a well-studied problem in exploratory data analysis, have focused on optimizing various objective functions for this problem under arbitrary similarity measures. In this paper we take the first step and give novel scalable algorithms for this problem tailored to Euclidean data in R^d and under vector-based similarity measures, a prevalent model in several typical machine learning applications. We focus primarily on the popular Gaussian kernel and present our results through the lens of the objective introduced recently by [MW’17]. We show the approximation factor in [MW’17] can be improved for Euclidean data. We further demonstrate both theoretically and experimentally that our algorithms scale to very high dimension d, while outperforming average-linkage and showing competitive results against other less scalable approaches.

Cite this Paper


BibTeX
@InProceedings{pmlr-v89-charikar19a, title = {Hierarchical Clustering for Euclidean Data}, author = {Charikar, Moses and Chatziafratis, Vaggos and Niazadeh, Rad and Yaroslavtsev, Grigory}, booktitle = {Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics}, pages = {2721--2730}, year = {2019}, editor = {Chaudhuri, Kamalika and Sugiyama, Masashi}, volume = {89}, series = {Proceedings of Machine Learning Research}, month = {16--18 Apr}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/v89/charikar19a/charikar19a.pdf}, url = {https://proceedings.mlr.press/v89/charikar19a.html}, abstract = {Recent works on Hierarchical Clustering (HC), a well-studied problem in exploratory data analysis, have focused on optimizing various objective functions for this problem under arbitrary similarity measures. In this paper we take the first step and give novel scalable algorithms for this problem tailored to Euclidean data in R^d and under vector-based similarity measures, a prevalent model in several typical machine learning applications. We focus primarily on the popular Gaussian kernel and present our results through the lens of the objective introduced recently by [MW’17]. We show the approximation factor in [MW’17] can be improved for Euclidean data. We further demonstrate both theoretically and experimentally that our algorithms scale to very high dimension d, while outperforming average-linkage and showing competitive results against other less scalable approaches.} }
Endnote
%0 Conference Paper %T Hierarchical Clustering for Euclidean Data %A Moses Charikar %A Vaggos Chatziafratis %A Rad Niazadeh %A Grigory Yaroslavtsev %B Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics %C Proceedings of Machine Learning Research %D 2019 %E Kamalika Chaudhuri %E Masashi Sugiyama %F pmlr-v89-charikar19a %I PMLR %P 2721--2730 %U https://proceedings.mlr.press/v89/charikar19a.html %V 89 %X Recent works on Hierarchical Clustering (HC), a well-studied problem in exploratory data analysis, have focused on optimizing various objective functions for this problem under arbitrary similarity measures. In this paper we take the first step and give novel scalable algorithms for this problem tailored to Euclidean data in R^d and under vector-based similarity measures, a prevalent model in several typical machine learning applications. We focus primarily on the popular Gaussian kernel and present our results through the lens of the objective introduced recently by [MW’17]. We show the approximation factor in [MW’17] can be improved for Euclidean data. We further demonstrate both theoretically and experimentally that our algorithms scale to very high dimension d, while outperforming average-linkage and showing competitive results against other less scalable approaches.
APA
Charikar, M., Chatziafratis, V., Niazadeh, R. & Yaroslavtsev, G.. (2019). Hierarchical Clustering for Euclidean Data. Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 89:2721-2730 Available from https://proceedings.mlr.press/v89/charikar19a.html.

Related Material