Hierarchical Overlapping Clustering on Graphs: Cost Function, Algorithm and Scalability

Yicheng Pan, Renjie Chen, Pengyu Long, Bingchen Fan
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:47565-47589, 2025.

Abstract

Hierarchical and overlapping clustering are two prevalent phenomena that often coexist in real-world system. While numerous studies have examined these two structures separately, characterizing and evaluating their hybrid forms remains an open challenge. To bridge this gap, we initiate the study of hierarchical overlapping clustering on graphs by introducing a new cost function and establishing its rationality through several intuitive properties. We further develop an approximation algorithm that achieves a constant approximation factor for its dual version. Our approach employs a recursive overlapping bipartition framework based on local search, enabling a highly scalable speed-up variant. Experimental results demonstrate that this speed-up algorithm outperforms all baseline methods significantly in both effectiveness (across synthetic and real datasets) and scalability.

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-pan25a, title = {Hierarchical Overlapping Clustering on Graphs: Cost Function, Algorithm and Scalability}, author = {Pan, Yicheng and Chen, Renjie and Long, Pengyu and Fan, Bingchen}, booktitle = {Proceedings of the 42nd International Conference on Machine Learning}, pages = {47565--47589}, year = {2025}, editor = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry}, volume = {267}, series = {Proceedings of Machine Learning Research}, month = {13--19 Jul}, publisher = {PMLR}, pdf = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/pan25a/pan25a.pdf}, url = {https://proceedings.mlr.press/v267/pan25a.html}, abstract = {Hierarchical and overlapping clustering are two prevalent phenomena that often coexist in real-world system. While numerous studies have examined these two structures separately, characterizing and evaluating their hybrid forms remains an open challenge. To bridge this gap, we initiate the study of hierarchical overlapping clustering on graphs by introducing a new cost function and establishing its rationality through several intuitive properties. We further develop an approximation algorithm that achieves a constant approximation factor for its dual version. Our approach employs a recursive overlapping bipartition framework based on local search, enabling a highly scalable speed-up variant. Experimental results demonstrate that this speed-up algorithm outperforms all baseline methods significantly in both effectiveness (across synthetic and real datasets) and scalability.} }
Endnote
%0 Conference Paper %T Hierarchical Overlapping Clustering on Graphs: Cost Function, Algorithm and Scalability %A Yicheng Pan %A Renjie Chen %A Pengyu Long %A Bingchen Fan %B Proceedings of the 42nd International Conference on Machine Learning %C Proceedings of Machine Learning Research %D 2025 %E Aarti Singh %E Maryam Fazel %E Daniel Hsu %E Simon Lacoste-Julien %E Felix Berkenkamp %E Tegan Maharaj %E Kiri Wagstaff %E Jerry Zhu %F pmlr-v267-pan25a %I PMLR %P 47565--47589 %U https://proceedings.mlr.press/v267/pan25a.html %V 267 %X Hierarchical and overlapping clustering are two prevalent phenomena that often coexist in real-world system. While numerous studies have examined these two structures separately, characterizing and evaluating their hybrid forms remains an open challenge. To bridge this gap, we initiate the study of hierarchical overlapping clustering on graphs by introducing a new cost function and establishing its rationality through several intuitive properties. We further develop an approximation algorithm that achieves a constant approximation factor for its dual version. Our approach employs a recursive overlapping bipartition framework based on local search, enabling a highly scalable speed-up variant. Experimental results demonstrate that this speed-up algorithm outperforms all baseline methods significantly in both effectiveness (across synthetic and real datasets) and scalability.
APA
Pan, Y., Chen, R., Long, P. & Fan, B.. (2025). Hierarchical Overlapping Clustering on Graphs: Cost Function, Algorithm and Scalability. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:47565-47589 Available from https://proceedings.mlr.press/v267/pan25a.html.

Related Material