Cluster Trellis: Data Structures & Algorithms for Exact Inference in Hierarchical Clustering

Sebastian Macaluso, Craig Greenberg, Nicholas Monath, Ji Ah Lee, Patrick Flaherty, Kyle Cranmer, Andrew McGregor, Andrew McCallum
Proceedings of The 24th International Conference on Artificial Intelligence and Statistics, PMLR 130:2467-2475, 2021.

Abstract

Hierarchical clustering is a fundamental task often used to discover meaningful structures in data. Due to the combinatorial number of possible hierarchical clusterings, approximate algorithms are typically used for inference. In contrast to existing methods, we present novel dynamic-programming algorithms for exact inference in hierarchical clustering based on a novel trellis data structure, and we prove that we can exactly compute the partition function, maximum likelihood hierarchy, and marginal probabilities of sub-hierarchies and clusters. Our algorithms scale in time and space proportional to the powerset of N elements, which is super-exponentially more efficient than explicitly considering each of the (2N − 3)!! possible hierarchies. Also, for larger datasets where our exact algorithms become infeasible, we introduce an approximate algorithm based on a sparse trellis that outperforms greedy and beam search baselines.
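The trellis dynamic program described in the abstract can be sketched in a few lines. This is an illustrative sketch, not the paper's reference implementation: it assumes the (unnormalized) probability of a hierarchy factorizes into a per-split potential `psi(left, right)`, and it memoizes a partition-function value for each of the 2^N clusters (subsets of the elements), pinning one designated element to the left child so that each unordered split is counted exactly once.

```python
from functools import lru_cache
from itertools import combinations

def partition_function(elements, psi):
    """Sum the unnormalized probabilities of all binary hierarchical
    clusterings of `elements` by dynamic programming over subsets,
    rather than enumerating all (2N-3)!! trees explicitly.

    `psi(left, right)` is an assumed per-split potential; the hierarchy's
    score is the product of psi over its internal nodes.
    """
    elements = frozenset(elements)

    @lru_cache(maxsize=None)
    def Z(cluster):
        # Base case: a singleton cluster admits exactly one (trivial) tree.
        if len(cluster) == 1:
            return 1.0
        # Pin the smallest element to the left child so each unordered
        # split {left, right} of the cluster is enumerated once.
        first = min(cluster)
        rest = sorted(cluster - {first})
        total = 0.0
        for r in range(len(rest)):          # left child gets `first` + r extras
            for extra in combinations(rest, r):
                left = frozenset((first,) + extra)
                right = cluster - left
                total += psi(left, right) * Z(left) * Z(right)
        return total

    return Z(elements)
```

With `psi` identically 1, the recursion simply counts binary hierarchies, so `partition_function([1, 2, 3], lambda a, b: 1.0)` returns 3, matching (2·3 − 3)!! = 3; memoization over the 2^N subsets is what yields the super-exponential saving over explicit enumeration.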

Cite this Paper


BibTeX
@InProceedings{pmlr-v130-macaluso21a,
  title     = {Cluster Trellis: Data Structures \& Algorithms for Exact Inference in Hierarchical Clustering},
  author    = {Macaluso, Sebastian and Greenberg, Craig and Monath, Nicholas and Ah Lee, Ji and Flaherty, Patrick and Cranmer, Kyle and McGregor, Andrew and McCallum, Andrew},
  booktitle = {Proceedings of The 24th International Conference on Artificial Intelligence and Statistics},
  pages     = {2467--2475},
  year      = {2021},
  editor    = {Banerjee, Arindam and Fukumizu, Kenji},
  volume    = {130},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--15 Apr},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v130/macaluso21a/macaluso21a.pdf},
  url       = {https://proceedings.mlr.press/v130/macaluso21a.html},
  abstract  = {Hierarchical clustering is a fundamental task often used to discover meaningful structures in data. Due to the combinatorial number of possible hierarchical clusterings, approximate algorithms are typically used for inference. In contrast to existing methods, we present novel dynamic-programming algorithms for exact inference in hierarchical clustering based on a novel trellis data structure, and we prove that we can exactly compute the partition function, maximum likelihood hierarchy, and marginal probabilities of sub-hierarchies and clusters. Our algorithms scale in time and space proportional to the powerset of N elements, which is super-exponentially more efficient than explicitly considering each of the (2N − 3)!! possible hierarchies. Also, for larger datasets where our exact algorithms become infeasible, we introduce an approximate algorithm based on a sparse trellis that outperforms greedy and beam search baselines.}
}
Endnote
%0 Conference Paper
%T Cluster Trellis: Data Structures & Algorithms for Exact Inference in Hierarchical Clustering
%A Sebastian Macaluso
%A Craig Greenberg
%A Nicholas Monath
%A Ji Ah Lee
%A Patrick Flaherty
%A Kyle Cranmer
%A Andrew McGregor
%A Andrew McCallum
%B Proceedings of The 24th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2021
%E Arindam Banerjee
%E Kenji Fukumizu
%F pmlr-v130-macaluso21a
%I PMLR
%P 2467--2475
%U https://proceedings.mlr.press/v130/macaluso21a.html
%V 130
%X Hierarchical clustering is a fundamental task often used to discover meaningful structures in data. Due to the combinatorial number of possible hierarchical clusterings, approximate algorithms are typically used for inference. In contrast to existing methods, we present novel dynamic-programming algorithms for exact inference in hierarchical clustering based on a novel trellis data structure, and we prove that we can exactly compute the partition function, maximum likelihood hierarchy, and marginal probabilities of sub-hierarchies and clusters. Our algorithms scale in time and space proportional to the powerset of N elements, which is super-exponentially more efficient than explicitly considering each of the (2N − 3)!! possible hierarchies. Also, for larger datasets where our exact algorithms become infeasible, we introduce an approximate algorithm based on a sparse trellis that outperforms greedy and beam search baselines.
APA
Macaluso, S., Greenberg, C., Monath, N., Ah Lee, J., Flaherty, P., Cranmer, K., McGregor, A. & McCallum, A. (2021). Cluster Trellis: Data Structures & Algorithms for Exact Inference in Hierarchical Clustering. Proceedings of The 24th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 130:2467-2475. Available from https://proceedings.mlr.press/v130/macaluso21a.html.