A Unifying Theorem for Spectral Embedding and Clustering

Matthew Brand, Kun Huang
Proceedings of the Ninth International Workshop on Artificial Intelligence and Statistics, PMLR R4:41-48, 2003.

Abstract

Spectral methods use selected eigenvectors of a data affinity matrix to obtain a data representation that can be trivially clustered or embedded in a low-dimensional space. We present a theorem that explains, for broad classes of affinity matrices and eigenbases, why this works: For successively smaller eigenbases (i.e., using fewer and fewer of the affinity matrix’s dominant eigenvalues and eigenvectors), the angles between "similar" vectors in the new representation shrink while the angles between "dissimilar" vectors grow. Specifically, the sum of the squared cosines of the angles is strictly increasing as the dimensionality of the representation decreases. Thus spectral methods work because the truncated eigenbasis amplifies structure in the data so that any heuristic post-processing is more likely to succeed. We use this result to construct a nonlinear dimensionality reduction (NLDR) algorithm for data sampled from manifolds whose intrinsic coordinate system has linear and cyclic axes, and a novel clustering-by-projections algorithm that requires no post-processing and gives superior performance on "challenge problems" from the recent literature.
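As a rough illustration of the theorem's angle-polarization claim, the following minimal numpy sketch (not the authors' code) builds a Gaussian-kernel affinity matrix over two point clusters, projects the data onto successively smaller dominant eigenbases, and reports the mean absolute cosine within and across clusters. The kernel choice, bandwidth, cluster sizes, eigenvalue scaling, and the helper mean_abs_cosine are illustrative assumptions; the theorem itself covers broader classes of affinity matrices and eigenbases.

import numpy as np

# Two well-separated clusters in the plane (illustrative data, not from the paper).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.3, (20, 2)),
               rng.normal(4.0, 0.3, (20, 2))])
labels = np.repeat([0, 1], 20)

# Gaussian-kernel affinity: one common choice; the bandwidth 0.5 is an assumption.
D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
A = np.exp(-D2 / (2 * 0.5 ** 2))

# Eigendecomposition, sorted by decreasing eigenvalue magnitude.
w, V = np.linalg.eigh(A)
order = np.argsort(-np.abs(w))
w, V = w[order], V[:, order]

def mean_abs_cosine(Y, same):
    # Mean |cos(angle)| between rows of Y, within (same=True) or across clusters.
    Yn = Y / np.linalg.norm(Y, axis=1, keepdims=True)
    C = np.abs(Yn @ Yn.T)
    mask = (labels[:, None] == labels[None, :]) == same
    np.fill_diagonal(mask, False)  # ignore each point's angle with itself
    return C[mask].mean()

# Rows of V[:, :d] * w[:d] represent the data in the d-dimensional truncated
# eigenbasis (eigenvalue scaling is one convention among several).
for d in (40, 10, 2):
    Y = V[:, :d] * w[:d]
    print(d, mean_abs_cosine(Y, True), mean_abs_cosine(Y, False))

As the dimensionality d shrinks, the within-cluster cosine should climb toward 1 while the across-cluster cosine falls toward 0, which is the sense in which the truncated eigenbasis amplifies cluster structure before any heuristic post-processing.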

Cite this Paper


BibTeX
@InProceedings{pmlr-vR4-brand03a,
  title     = {A Unifying Theorem for Spectral Embedding and Clustering},
  author    = {Brand, Matthew and Huang, Kun},
  booktitle = {Proceedings of the Ninth International Workshop on Artificial Intelligence and Statistics},
  pages     = {41--48},
  year      = {2003},
  editor    = {Bishop, Christopher M. and Frey, Brendan J.},
  volume    = {R4},
  series    = {Proceedings of Machine Learning Research},
  month     = {03--06 Jan},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/r4/brand03a/brand03a.pdf},
  url       = {https://proceedings.mlr.press/r4/brand03a.html},
  abstract  = {Spectral methods use selected eigenvectors of a data affinity matrix to obtain a data representation that can be trivially clustered or embedded in a low-dimensional space. We present a theorem that explains, for broad classes of affinity matrices and eigenbases, why this works: For successively smaller eigenbases (i.e., using fewer and fewer of the affinity matrix’s dominant eigenvalues and eigenvectors), the angles between "similar" vectors in the new representation shrink while the angles between "dissimilar" vectors grow. Specifically, the sum of the squared cosines of the angles is strictly increasing as the dimensionality of the representation decreases. Thus spectral methods work because the truncated eigenbasis amplifies structure in the data so that any heuristic post-processing is more likely to succeed. We use this result to construct a nonlinear dimensionality reduction (NLDR) algorithm for data sampled from manifolds whose intrinsic coordinate system has linear and cyclic axes, and a novel clustering-by-projections algorithm that requires no post-processing and gives superior performance on "challenge problems" from the recent literature.},
  note      = {Reissued by PMLR on 01 April 2021.}
}
Endnote
%0 Conference Paper
%T A Unifying Theorem for Spectral Embedding and Clustering
%A Matthew Brand
%A Kun Huang
%B Proceedings of the Ninth International Workshop on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2003
%E Christopher M. Bishop
%E Brendan J. Frey
%F pmlr-vR4-brand03a
%I PMLR
%P 41--48
%U https://proceedings.mlr.press/r4/brand03a.html
%V R4
%X Spectral methods use selected eigenvectors of a data affinity matrix to obtain a data representation that can be trivially clustered or embedded in a low-dimensional space. We present a theorem that explains, for broad classes of affinity matrices and eigenbases, why this works: For successively smaller eigenbases (i.e., using fewer and fewer of the affinity matrix’s dominant eigenvalues and eigenvectors), the angles between "similar" vectors in the new representation shrink while the angles between "dissimilar" vectors grow. Specifically, the sum of the squared cosines of the angles is strictly increasing as the dimensionality of the representation decreases. Thus spectral methods work because the truncated eigenbasis amplifies structure in the data so that any heuristic post-processing is more likely to succeed. We use this result to construct a nonlinear dimensionality reduction (NLDR) algorithm for data sampled from manifolds whose intrinsic coordinate system has linear and cyclic axes, and a novel clustering-by-projections algorithm that requires no post-processing and gives superior performance on "challenge problems" from the recent literature.
%Z Reissued by PMLR on 01 April 2021.
APA
Brand, M. & Huang, K. (2003). A Unifying Theorem for Spectral Embedding and Clustering. Proceedings of the Ninth International Workshop on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research R4:41-48. Available from https://proceedings.mlr.press/r4/brand03a.html. Reissued by PMLR on 01 April 2021.
