Hierarchical Clustering of Data Streams: Scalable Algorithms and Approximation Guarantees

Anand Rajagopalan, Fabio Vitale, Danny Vainstein, Gui Citovsky, Cecilia M Procopiuc, Claudio Gentile
Proceedings of the 38th International Conference on Machine Learning, PMLR 139:8799-8809, 2021.

Abstract

We investigate the problem of hierarchically clustering data streams containing metric data in R^d. We introduce a desirable invariance property for such algorithms, describe a general family of hyperplane-based methods enjoying this property, and analyze two scalable instances of this general family against recently popularized similarity/dissimilarity-based metrics for hierarchical clustering. We prove a number of new results related to the approximation ratios of these algorithms, improving in various ways over the literature on this subject. Finally, since our algorithms are principled but also very practical, we carry out an experimental comparison on both synthetic and real-world datasets showing competitive results against known baselines.

Cite this Paper


BibTeX
@InProceedings{pmlr-v139-rajagopalan21a, title = {Hierarchical Clustering of Data Streams: Scalable Algorithms and Approximation Guarantees}, author = {Rajagopalan, Anand and Vitale, Fabio and Vainstein, Danny and Citovsky, Gui and Procopiuc, Cecilia M and Gentile, Claudio}, booktitle = {Proceedings of the 38th International Conference on Machine Learning}, pages = {8799--8809}, year = {2021}, editor = {Meila, Marina and Zhang, Tong}, volume = {139}, series = {Proceedings of Machine Learning Research}, month = {18--24 Jul}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/v139/rajagopalan21a/rajagopalan21a.pdf}, url = {https://proceedings.mlr.press/v139/rajagopalan21a.html}, abstract = {We investigate the problem of hierarchically clustering data streams containing metric data in R^d. We introduce a desirable invariance property for such algorithms, describe a general family of hyperplane-based methods enjoying this property, and analyze two scalable instances of this general family against recently popularized similarity/dissimilarity-based metrics for hierarchical clustering. We prove a number of new results related to the approximation ratios of these algorithms, improving in various ways over the literature on this subject. Finally, since our algorithms are principled but also very practical, we carry out an experimental comparison on both synthetic and real-world datasets showing competitive results against known baselines.} }
Endnote
%0 Conference Paper %T Hierarchical Clustering of Data Streams: Scalable Algorithms and Approximation Guarantees %A Anand Rajagopalan %A Fabio Vitale %A Danny Vainstein %A Gui Citovsky %A Cecilia M Procopiuc %A Claudio Gentile %B Proceedings of the 38th International Conference on Machine Learning %C Proceedings of Machine Learning Research %D 2021 %E Marina Meila %E Tong Zhang %F pmlr-v139-rajagopalan21a %I PMLR %P 8799--8809 %U https://proceedings.mlr.press/v139/rajagopalan21a.html %V 139 %X We investigate the problem of hierarchically clustering data streams containing metric data in R^d. We introduce a desirable invariance property for such algorithms, describe a general family of hyperplane-based methods enjoying this property, and analyze two scalable instances of this general family against recently popularized similarity/dissimilarity-based metrics for hierarchical clustering. We prove a number of new results related to the approximation ratios of these algorithms, improving in various ways over the literature on this subject. Finally, since our algorithms are principled but also very practical, we carry out an experimental comparison on both synthetic and real-world datasets showing competitive results against known baselines.
APA
Rajagopalan, A., Vitale, F., Vainstein, D., Citovsky, G., Procopiuc, C.M. & Gentile, C.. (2021). Hierarchical Clustering of Data Streams: Scalable Algorithms and Approximation Guarantees. Proceedings of the 38th International Conference on Machine Learning, in Proceedings of Machine Learning Research 139:8799-8809 Available from https://proceedings.mlr.press/v139/rajagopalan21a.html.

Related Material