An Information-theoretic Perspective of Hierarchical Clustering on Graphs

Yicheng Pan, Bingchen Fan, Pengyu Long, Feng Zheng
Proceedings of the Forty-first Conference on Uncertainty in Artificial Intelligence, PMLR 286:3322-3345, 2025.

Abstract

The seminal work of \citep{dasgupta2016cost} has introduced a combinatorial cost function for hierarchical graph clustering that has inspired numerous follow-up studies adopting similar combinatorial approaches. In this paper, we investigate this problem from the \emph{information-theoretic} perspective. We formulate a new cost function that is fully explainable and establish the relationship between combinatorial and information-theoretic perspectives. We present two algorithms for expander-like and well-clustered cardinality weighted graphs, respectively, and show that both of them achieve $O(1)$-approximation for our new cost function. Addressing practical needs, we consider non-binary hierarchical clustering problem, and propose a hyperparameter-free framework HCSE that recursively stratifies cluster trees through sparsity-aware partitioning, automatically determining the optimal hierarchy depth via an interpretable mechanism. Extensive experimental results demonstrate the superiority of our cost function and algorithms in binary clustering performance, hierarchy level identification, and reconstruction accuracy compared to existing approaches.

Cite this Paper


BibTeX
@InProceedings{pmlr-v286-pan25a, title = {An Information-theoretic Perspective of Hierarchical Clustering on Graphs}, author = {Pan, Yicheng and Fan, Bingchen and Long, Pengyu and Zheng, Feng}, booktitle = {Proceedings of the Forty-first Conference on Uncertainty in Artificial Intelligence}, pages = {3322--3345}, year = {2025}, editor = {Chiappa, Silvia and Magliacane, Sara}, volume = {286}, series = {Proceedings of Machine Learning Research}, month = {21--25 Jul}, publisher = {PMLR}, pdf = {https://raw.githubusercontent.com/mlresearch/v286/main/assets/pan25a/pan25a.pdf}, url = {https://proceedings.mlr.press/v286/pan25a.html}, abstract = {The seminal work of \citep{dasgupta2016cost} has introduced a combinatorial cost function for hierarchical graph clustering that has inspired numerous follow-up studies adopting similar combinatorial approaches. In this paper, we investigate this problem from the \emph{information-theoretic} perspective. We formulate a new cost function that is fully explainable and establish the relationship between combinatorial and information-theoretic perspectives. We present two algorithms for expander-like and well-clustered cardinality weighted graphs, respectively, and show that both of them achieve $O(1)$-approximation for our new cost function. Addressing practical needs, we consider non-binary hierarchical clustering problem, and propose a hyperparameter-free framework HCSE that recursively stratifies cluster trees through sparsity-aware partitioning, automatically determining the optimal hierarchy depth via an interpretable mechanism. Extensive experimental results demonstrate the superiority of our cost function and algorithms in binary clustering performance, hierarchy level identification, and reconstruction accuracy compared to existing approaches.} }
Endnote
%0 Conference Paper %T An Information-theoretic Perspective of Hierarchical Clustering on Graphs %A Yicheng Pan %A Bingchen Fan %A Pengyu Long %A Feng Zheng %B Proceedings of the Forty-first Conference on Uncertainty in Artificial Intelligence %C Proceedings of Machine Learning Research %D 2025 %E Silvia Chiappa %E Sara Magliacane %F pmlr-v286-pan25a %I PMLR %P 3322--3345 %U https://proceedings.mlr.press/v286/pan25a.html %V 286 %X The seminal work of \citep{dasgupta2016cost} has introduced a combinatorial cost function for hierarchical graph clustering that has inspired numerous follow-up studies adopting similar combinatorial approaches. In this paper, we investigate this problem from the \emph{information-theoretic} perspective. We formulate a new cost function that is fully explainable and establish the relationship between combinatorial and information-theoretic perspectives. We present two algorithms for expander-like and well-clustered cardinality weighted graphs, respectively, and show that both of them achieve $O(1)$-approximation for our new cost function. Addressing practical needs, we consider non-binary hierarchical clustering problem, and propose a hyperparameter-free framework HCSE that recursively stratifies cluster trees through sparsity-aware partitioning, automatically determining the optimal hierarchy depth via an interpretable mechanism. Extensive experimental results demonstrate the superiority of our cost function and algorithms in binary clustering performance, hierarchy level identification, and reconstruction accuracy compared to existing approaches.
APA
Pan, Y., Fan, B., Long, P. & Zheng, F.. (2025). An Information-theoretic Perspective of Hierarchical Clustering on Graphs. Proceedings of the Forty-first Conference on Uncertainty in Artificial Intelligence, in Proceedings of Machine Learning Research 286:3322-3345 Available from https://proceedings.mlr.press/v286/pan25a.html.

Related Material