Stochastic Parallelizable Eigengap Dilation for Large Graph Clustering

Elise van der Pol, Ian Gemp, Yoram Bachrach
Proceedings of Topological, Algebraic, and Geometric Learning Workshops 2022, PMLR 196:304-311, 2022.

Abstract

Large graphs commonly appear in social networks, knowledge graphs, recommender systems, life sciences, and decision making problems. Summarizing large graphs by their high level properties is helpful in solving problems in these settings. In spectral clustering, we aim to identify clusters of nodes where most edges fall within clusters and only few edges fall between clusters. This task is important for many downstream applications and exploratory analysis. A core step of spectral clustering is performing an eigendecomposition of the corresponding graph Laplacian matrix (or equivalently, a singular value decomposition, SVD, of the incidence matrix). The convergence of iterative singular value decomposition approaches depends on the eigengaps of the spectrum of the given matrix, i.e., the difference between consecutive eigenvalues. For a graph Laplacian corresponding to a well-clustered graph, the eigenvalues will be non-negative but very small (much less than 1) slowing convergence. This paper introduces a parallelizable approach to dilating the spectrum in order to accelerate SVD solvers and in turn, spectral clustering. This is accomplished via polynomial approximations to matrix operations that favorably transform the spectrum of a matrix without changing its eigenvectors. Experiments demonstrate that this approach significantly accelerates convergence, and we explain how this transformation can be parallelized and stochastically approximated to scale with available compute.

Cite this Paper


BibTeX
@InProceedings{pmlr-v196-pol22a, title = { Stochastic Parallelizable Eigengap Dilation for Large Graph Clustering }, author = {van der Pol, Elise and Gemp, Ian and Bachrach, Yoram}, booktitle = {Proceedings of Topological, Algebraic, and Geometric Learning Workshops 2022}, pages = {304--311}, year = {2022}, editor = {Cloninger, Alexander and Doster, Timothy and Emerson, Tegan and Kaul, Manohar and Ktena, Ira and Kvinge, Henry and Miolane, Nina and Rieck, Bastian and Tymochko, Sarah and Wolf, Guy}, volume = {196}, series = {Proceedings of Machine Learning Research}, month = {25 Feb--22 Jul}, publisher = {PMLR}, pdf = {https://proceedings.mlr.press/v196/pol22a/pol22a.pdf}, url = {https://proceedings.mlr.press/v196/pol22a.html}, abstract = { Large graphs commonly appear in social networks, knowledge graphs, recommender systems, life sciences, and decision making problems. Summarizing large graphs by their high level properties is helpful in solving problems in these settings. In spectral clustering, we aim to identify clusters of nodes where most edges fall within clusters and only few edges fall between clusters. This task is important for many downstream applications and exploratory analysis. A core step of spectral clustering is performing an eigendecomposition of the corresponding graph Laplacian matrix (or equivalently, a singular value decomposition, SVD, of the incidence matrix). The convergence of iterative singular value decomposition approaches depends on the eigengaps of the spectrum of the given matrix, i.e., the difference between consecutive eigenvalues. For a graph Laplacian corresponding to a well-clustered graph, the eigenvalues will be non-negative but very small (much less than 1) slowing convergence. This paper introduces a parallelizable approach to dilating the spectrum in order to accelerate SVD solvers and in turn, spectral clustering. This is accomplished via polynomial approximations to matrix operations that favorably transform the spectrum of a matrix without changing its eigenvectors. Experiments demonstrate that this approach significantly accelerates convergence, and we explain how this transformation can be parallelized and stochastically approximated to scale with available compute. } }
Endnote
%0 Conference Paper %T Stochastic Parallelizable Eigengap Dilation for Large Graph Clustering %A Elise van der Pol %A Ian Gemp %A Yoram Bachrach %B Proceedings of Topological, Algebraic, and Geometric Learning Workshops 2022 %C Proceedings of Machine Learning Research %D 2022 %E Alexander Cloninger %E Timothy Doster %E Tegan Emerson %E Manohar Kaul %E Ira Ktena %E Henry Kvinge %E Nina Miolane %E Bastian Rieck %E Sarah Tymochko %E Guy Wolf %F pmlr-v196-pol22a %I PMLR %P 304--311 %U https://proceedings.mlr.press/v196/pol22a.html %V 196 %X Large graphs commonly appear in social networks, knowledge graphs, recommender systems, life sciences, and decision making problems. Summarizing large graphs by their high level properties is helpful in solving problems in these settings. In spectral clustering, we aim to identify clusters of nodes where most edges fall within clusters and only few edges fall between clusters. This task is important for many downstream applications and exploratory analysis. A core step of spectral clustering is performing an eigendecomposition of the corresponding graph Laplacian matrix (or equivalently, a singular value decomposition, SVD, of the incidence matrix). The convergence of iterative singular value decomposition approaches depends on the eigengaps of the spectrum of the given matrix, i.e., the difference between consecutive eigenvalues. For a graph Laplacian corresponding to a well-clustered graph, the eigenvalues will be non-negative but very small (much less than 1) slowing convergence. This paper introduces a parallelizable approach to dilating the spectrum in order to accelerate SVD solvers and in turn, spectral clustering. This is accomplished via polynomial approximations to matrix operations that favorably transform the spectrum of a matrix without changing its eigenvectors. Experiments demonstrate that this approach significantly accelerates convergence, and we explain how this transformation can be parallelized and stochastically approximated to scale with available compute.
APA
van der Pol, E., Gemp, I. & Bachrach, Y.. (2022). Stochastic Parallelizable Eigengap Dilation for Large Graph Clustering . Proceedings of Topological, Algebraic, and Geometric Learning Workshops 2022, in Proceedings of Machine Learning Research 196:304-311 Available from https://proceedings.mlr.press/v196/pol22a.html.

Related Material