Large-Scale Multiway Clustering with Seeded Clustering

Jiaxin Hu
Conference on Parsimony and Learning, PMLR 280:65-88, 2025.

Abstract

Multiway clustering methods for higher-order tensor observations have been developed in various fields, including recommendation systems, neuroimaging, and social networks. However, high computational costs hinder the application of tensor-based approaches to real-world large-scale data. Here, we propose a large-scale multiway clustering framework under the tensor block model, named LS-TBM, with accuracy guarantees. LS-TBM leverages seeded clustering to break down the expensive high-dimensional tensor clustering into two fast low-dimensional steps. Our two-step algorithm substantially reduces the time and space complexities from polynomial to logarithmic rates while maintaining the exact recovery of community structures, under certain signal conditions. We also establish the theoretical phase transition of LS-TBM performance with a key interplay between signal levels and seed sizes. Numerical experiments with synthetic data and real large-scale Uber Pickup data highlight LS-TBM’s superior performance in practice.
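The two-step idea in the abstract can be illustrated with a minimal sketch of generic seeded clustering on an order-3 tensor. This is not the paper's LS-TBM algorithm and carries none of its guarantees. It assumes, for simplicity, that one membership vector is shared by all three modes, that the block means are a fixed deterministic core tensor, and that noise is small. The function names (`kmeans`, `block_features`, `seeded_clustering`) and all parameter values are hypothetical choices made for illustration.

```python
import numpy as np

def kmeans(X, k, iters=20):
    """Plain Lloyd iterations with deterministic farthest-point initialisation."""
    centers = [X[0]]
    for _ in range(1, k):
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers, dtype=float)
    for _ in range(iters):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def block_features(slices, seed_labels, k):
    """Average each (m x m) slice over pairs of seed clusters, giving k*k features."""
    M = np.eye(k)[seed_labels]                            # m x k one-hot membership
    M = M / np.maximum(M.sum(axis=0, keepdims=True), 1)   # columns now average within a cluster
    return np.einsum("iab,ap,bq->ipq", slices, M, M).reshape(len(slices), -1)

def seeded_clustering(Y, k, m, rng):
    n = Y.shape[0]
    seeds = rng.choice(n, size=m, replace=False)
    # Step 1: cluster only the small m x m x m seed subtensor.
    sub = Y[np.ix_(seeds, seeds, seeds)]
    seed_labels = kmeans(sub.reshape(m, -1), k)
    # Step 2: describe every node by its block means against the labelled seeds,
    # then assign it to the nearest seed-cluster centroid.
    feats = block_features(Y[:, seeds][:, :, seeds], seed_labels, k)
    seed_feats = feats[seeds]
    centroids = np.array([seed_feats[seed_labels == j].mean(axis=0) for j in range(k)])
    dist = ((feats[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return dist.argmin(axis=1)

# Synthetic example (all values are illustrative assumptions).
rng = np.random.default_rng(0)
n, k, m = 60, 3, 20
z = np.arange(n) % k                                      # true labels, shared by all modes
core = np.arange(k ** 3, dtype=float).reshape(k, k, k)    # deterministic block means
Y = core[np.ix_(z, z, z)] + rng.normal(scale=0.1, size=(n, n, n))

pred = seeded_clustering(Y, k, m, rng)
# Compare partitions through co-membership so label permutations do not matter.
same_pred = pred[:, None] == pred[None, :]
same_true = z[:, None] == z[None, :]
print("exact recovery:", np.array_equal(same_pred, same_true))
```

In this sketch the expensive clustering step only ever sees the m x m x m seed subtensor, while the remaining nodes are handled by a cheap nearest-centroid assignment on k*k features. How the seed size should scale with the signal level is the subject of the paper's analysis and is not captured here.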

Cite this Paper


BibTeX
@InProceedings{pmlr-v280-hu25a,
  title = {Large-Scale Multiway Clustering with Seeded Clustering},
  author = {Hu, Jiaxin},
  booktitle = {Conference on Parsimony and Learning},
  pages = {65--88},
  year = {2025},
  editor = {Chen, Beidi and Liu, Shijia and Pilanci, Mert and Su, Weijie and Sulam, Jeremias and Wang, Yuxiang and Zhu, Zhihui},
  volume = {280},
  series = {Proceedings of Machine Learning Research},
  month = {24--27 Mar},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v280/main/assets/hu25a/hu25a.pdf},
  url = {https://proceedings.mlr.press/v280/hu25a.html},
  abstract = {Multiway clustering methods for higher-order tensor observations have been developed in various fields, including recommendation systems, neuroimaging, and social networks. However, high computational costs hinder the applications of tensor-based approaches to real-world large-scale data. Here, we propose a large-scale multiway clustering framework under tensor block model, named LS-TBM, with accuracy guarantees. LS-TBM leverages seeded clustering to break down the expensive high-dimensional tensor clustering into two fast low-dimensional steps. Our two-step algorithm substantially reduces the time and space complexities from polynomial to logarithmic rates while maintaining the exact recovery of community structures, under certain signal conditions. We also establish the theoretical phase transition of LS-TBM performance with a key interplay between signal levels and seed sizes. Numerical experiments with synthetic data and real large-scale Uber Pickup data highlight LS-TBM’s superior performance in practice.}
}
Endnote
%0 Conference Paper
%T Large-Scale Multiway Clustering with Seeded Clustering
%A Jiaxin Hu
%B Conference on Parsimony and Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Beidi Chen
%E Shijia Liu
%E Mert Pilanci
%E Weijie Su
%E Jeremias Sulam
%E Yuxiang Wang
%E Zhihui Zhu
%F pmlr-v280-hu25a
%I PMLR
%P 65--88
%U https://proceedings.mlr.press/v280/hu25a.html
%V 280
%X Multiway clustering methods for higher-order tensor observations have been developed in various fields, including recommendation systems, neuroimaging, and social networks. However, high computational costs hinder the applications of tensor-based approaches to real-world large-scale data. Here, we propose a large-scale multiway clustering framework under tensor block model, named LS-TBM, with accuracy guarantees. LS-TBM leverages seeded clustering to break down the expensive high-dimensional tensor clustering into two fast low-dimensional steps. Our two-step algorithm substantially reduces the time and space complexities from polynomial to logarithmic rates while maintaining the exact recovery of community structures, under certain signal conditions. We also establish the theoretical phase transition of LS-TBM performance with a key interplay between signal levels and seed sizes. Numerical experiments with synthetic data and real large-scale Uber Pickup data highlight LS-TBM’s superior performance in practice.
APA
Hu, J. (2025). Large-Scale Multiway Clustering with Seeded Clustering. Conference on Parsimony and Learning, in Proceedings of Machine Learning Research 280:65-88. Available from https://proceedings.mlr.press/v280/hu25a.html.

Related Material