DAG-Structured Clustering by Nearest Neighbors

Nicholas Monath, Manzil Zaheer, Kumar Avinava Dubey, Amr Ahmed, Andrew McCallum
Proceedings of The 24th International Conference on Artificial Intelligence and Statistics, PMLR 130:2854-2862, 2021.

Abstract

Hierarchical clusterings compactly encode multiple granularities of clusters within a tree structure. Hierarchies, by definition, fail to capture different flat partitions that are not subsumed in one another. In this paper, we advocate for an alternative structure for representing multiple clusterings, a directed acyclic graph (DAG). By allowing nodes to have multiple parents, DAG structures are not only more flexible than trees, but also allow for points to be members of multiple clusters. We describe a scalable algorithm, Llama, which simply merges nearest neighbor substructures to form a DAG structure. Llama discovers structures that are more accurate than state-of-the-art tree-based techniques while remaining scalable to large-scale clustering benchmarks. Additionally, we support the proposed algorithm with theoretical guarantees on separated data, including types of data that cannot be correctly clustered by tree-based algorithms.
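The abstract describes the method only at a high level: repeatedly merging nearest-neighbor substructures so that a node may end up with more than one parent, yielding a DAG rather than a tree. The following is a minimal, hypothetical Python sketch of that general idea; the class names, the round-based merging scheme, and the centroid-distance criterion are assumptions made for illustration and are not the authors' Llama implementation.

# Illustrative sketch of nearest-neighbor DAG clustering in the spirit of the
# abstract. NOT the authors' Llama algorithm: the round-based scheme and the
# centroid-distance merge criterion are assumptions for illustration only.

import numpy as np


class DagNode:
    """A cluster node; `parents` may hold several nodes, which makes the structure a DAG."""

    def __init__(self, points):
        self.points = points          # indices of the data points this node covers
        self.parents = []             # more than one parent is allowed

    def centroid(self, X):
        return X[self.points].mean(axis=0)


def nn_dag_clustering(X, num_rounds=3):
    """Build a DAG by repeatedly merging each frontier node with its nearest
    neighbor (by centroid distance). A node that is the nearest neighbor of
    several other nodes participates in several merges and so acquires
    multiple parents."""
    frontier = [DagNode([i]) for i in range(len(X))]
    all_nodes = list(frontier)

    for _ in range(num_rounds):
        if len(frontier) <= 1:
            break
        centroids = np.stack([n.centroid(X) for n in frontier])
        dists = np.linalg.norm(centroids[:, None] - centroids[None, :], axis=-1)
        np.fill_diagonal(dists, np.inf)
        nn = dists.argmin(axis=1)     # nearest frontier neighbor of each node

        new_frontier, merged_pairs = [], set()
        for i, j in enumerate(nn):
            key = (min(i, j), max(i, j))
            if key in merged_pairs:
                continue              # mutual nearest neighbors merge only once
            merged_pairs.add(key)
            parent = DagNode(sorted(set(frontier[i].points + frontier[j].points)))
            frontier[i].parents.append(parent)
            frontier[j].parents.append(parent)
            new_frontier.append(parent)
            all_nodes.append(parent)
        frontier = new_frontier

    return all_nodes


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(20, 2))
    nodes = nn_dag_clustering(X)
    multi_parent = sum(len(n.parents) > 1 for n in nodes)
    print(f"{len(nodes)} nodes, {multi_parent} with more than one parent")

Running the sketch on random 2-D points typically produces several nodes with more than one parent, which is exactly the structural flexibility (a node belonging to multiple larger clusters) that a tree cannot express.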

Cite this Paper


BibTeX
@InProceedings{pmlr-v130-monath21a,
  title     = {DAG-Structured Clustering by Nearest Neighbors},
  author    = {Monath, Nicholas and Zaheer, Manzil and Avinava Dubey, Kumar and Ahmed, Amr and McCallum, Andrew},
  booktitle = {Proceedings of The 24th International Conference on Artificial Intelligence and Statistics},
  pages     = {2854--2862},
  year      = {2021},
  editor    = {Banerjee, Arindam and Fukumizu, Kenji},
  volume    = {130},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--15 Apr},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v130/monath21a/monath21a.pdf},
  url       = {https://proceedings.mlr.press/v130/monath21a.html},
  abstract  = {Hierarchical clusterings compactly encode multiple granularities of clusters within a tree structure. Hierarchies, by definition, fail to capture different flat partitions that are not subsumed in one another. In this paper, we advocate for an alternative structure for representing multiple clusterings, a directed acyclic graph (DAG). By allowing nodes to have multiple parents, DAG structures are not only more flexible than trees, but also allow for points to be members of multiple clusters. We describe a scalable algorithm, Llama, which simply merges nearest neighbor substructures to form a DAG structure. Llama discovers structures that are more accurate than state-of-the-art tree-based techniques while remaining scalable to large-scale clustering benchmarks. Additionally, we support the proposed algorithm with theoretical guarantees on separated data, including types of data that cannot be correctly clustered by tree-based algorithms.}
}
Endnote
%0 Conference Paper
%T DAG-Structured Clustering by Nearest Neighbors
%A Nicholas Monath
%A Manzil Zaheer
%A Kumar Avinava Dubey
%A Amr Ahmed
%A Andrew McCallum
%B Proceedings of The 24th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2021
%E Arindam Banerjee
%E Kenji Fukumizu
%F pmlr-v130-monath21a
%I PMLR
%P 2854--2862
%U https://proceedings.mlr.press/v130/monath21a.html
%V 130
%X Hierarchical clusterings compactly encode multiple granularities of clusters within a tree structure. Hierarchies, by definition, fail to capture different flat partitions that are not subsumed in one another. In this paper, we advocate for an alternative structure for representing multiple clusterings, a directed acyclic graph (DAG). By allowing nodes to have multiple parents, DAG structures are not only more flexible than trees, but also allow for points to be members of multiple clusters. We describe a scalable algorithm, Llama, which simply merges nearest neighbor substructures to form a DAG structure. Llama discovers structures that are more accurate than state-of-the-art tree-based techniques while remaining scalable to large-scale clustering benchmarks. Additionally, we support the proposed algorithm with theoretical guarantees on separated data, including types of data that cannot be correctly clustered by tree-based algorithms.
APA
Monath, N., Zaheer, M., Avinava Dubey, K., Ahmed, A. & McCallum, A. (2021). DAG-Structured Clustering by Nearest Neighbors. Proceedings of The 24th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 130:2854-2862. Available from https://proceedings.mlr.press/v130/monath21a.html.