Capacity Releasing Diffusion for Speed and Locality

Di Wang, Kimon Fountoulakis, Monika Henzinger, Michael W. Mahoney, Satish Rao
Proceedings of the 34th International Conference on Machine Learning, PMLR 70:3598-3607, 2017.

Abstract

Diffusions and related random walk procedures are of central importance in many areas of machine learning, data analysis, and applied mathematics. Because they spread mass agnostically at each step in an iterative manner, they can sometimes spread mass “too aggressively,” thereby failing to find the “right” clusters. We introduce a novel Capacity Releasing Diffusion (CRD) Process, which is both faster and stays more local than the classical spectral diffusion process. As an application, we use our CRD Process to develop an improved local algorithm for graph clustering. Our local graph clustering method can find local clusters in a model of clustering where one begins the CRD Process in a cluster whose vertices are connected better internally than externally by an $O(\log^2 n)$ factor, where $n$ is the number of nodes in the cluster. Thus, our CRD Process is the first local graph clustering algorithm that is not subject to the well-known quadratic Cheeger barrier. Our result requires a certain smoothness condition, which we expect to be an artifact of our analysis. Our empirical evaluation demonstrates improved results, in particular for realistic social graphs where there are moderately good—but not very good—clusters.

Cite this Paper


BibTeX
@InProceedings{pmlr-v70-wang17b,
  title     = {Capacity Releasing Diffusion for Speed and Locality},
  author    = {Di Wang and Kimon Fountoulakis and Monika Henzinger and Michael W. Mahoney and Satish Rao},
  booktitle = {Proceedings of the 34th International Conference on Machine Learning},
  pages     = {3598--3607},
  year      = {2017},
  editor    = {Precup, Doina and Teh, Yee Whye},
  volume    = {70},
  series    = {Proceedings of Machine Learning Research},
  month     = {06--11 Aug},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v70/wang17b/wang17b.pdf},
  url       = {https://proceedings.mlr.press/v70/wang17b.html},
  abstract  = {Diffusions and related random walk procedures are of central importance in many areas of machine learning, data analysis, and applied mathematics. Because they spread mass agnostically at each step in an iterative manner, they can sometimes spread mass “too aggressively,” thereby failing to find the “right” clusters. We introduce a novel Capacity Releasing Diffusion (CRD) Process, which is both faster and stays more local than the classical spectral diffusion process. As an application, we use our CRD Process to develop an improved local algorithm for graph clustering. Our local graph clustering method can find local clusters in a model of clustering where one begins the CRD Process in a cluster whose vertices are connected better internally than externally by an $O(\log^2 n)$ factor, where $n$ is the number of nodes in the cluster. Thus, our CRD Process is the first local graph clustering algorithm that is not subject to the well-known quadratic Cheeger barrier. Our result requires a certain smoothness condition, which we expect to be an artifact of our analysis. Our empirical evaluation demonstrates improved results, in particular for realistic social graphs where there are moderately good—but not very good—clusters.}
}
Endnote
%0 Conference Paper
%T Capacity Releasing Diffusion for Speed and Locality
%A Di Wang
%A Kimon Fountoulakis
%A Monika Henzinger
%A Michael W. Mahoney
%A Satish Rao
%B Proceedings of the 34th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2017
%E Doina Precup
%E Yee Whye Teh
%F pmlr-v70-wang17b
%I PMLR
%P 3598--3607
%U https://proceedings.mlr.press/v70/wang17b.html
%V 70
%X Diffusions and related random walk procedures are of central importance in many areas of machine learning, data analysis, and applied mathematics. Because they spread mass agnostically at each step in an iterative manner, they can sometimes spread mass “too aggressively,” thereby failing to find the “right” clusters. We introduce a novel Capacity Releasing Diffusion (CRD) Process, which is both faster and stays more local than the classical spectral diffusion process. As an application, we use our CRD Process to develop an improved local algorithm for graph clustering. Our local graph clustering method can find local clusters in a model of clustering where one begins the CRD Process in a cluster whose vertices are connected better internally than externally by an $O(\log^2 n)$ factor, where $n$ is the number of nodes in the cluster. Thus, our CRD Process is the first local graph clustering algorithm that is not subject to the well-known quadratic Cheeger barrier. Our result requires a certain smoothness condition, which we expect to be an artifact of our analysis. Our empirical evaluation demonstrates improved results, in particular for realistic social graphs where there are moderately good—but not very good—clusters.
APA
Wang, D., Fountoulakis, K., Henzinger, M., Mahoney, M. W., & Rao, S. (2017). Capacity Releasing Diffusion for Speed and Locality. Proceedings of the 34th International Conference on Machine Learning, in Proceedings of Machine Learning Research 70:3598-3607. Available from https://proceedings.mlr.press/v70/wang17b.html.