Hierarchical Convex NMF for Clustering Massive Data

Kristian Kersting, Mirwaes Wahabzada, Christian Thurau, Christian Bauckhage
Proceedings of 2nd Asian Conference on Machine Learning, PMLR 13:253-268, 2010.

Abstract

We present an extension of convex-hull non-negative matrix factorization (CH-NMF), which was recently proposed as a large-scale variant of convex non-negative matrix factorization or Archetypal Analysis. CH-NMF factorizes a non-negative data matrix $V$ into two non-negative matrix factors $V \approx WH$ such that the columns of $W$ are convex combinations of certain data points, so that they are readily interpretable to data analysts. There is, however, no free lunch: imposing convexity constraints on $W$ typically prevents adaptation to intrinsic, low-dimensional structures in the data. Alas, in cases where the data is distributed in a non-convex manner or consists of mixtures of lower-dimensional convex distributions, the cluster representatives obtained from CH-NMF will be less meaningful. In this paper, we present a hierarchical CH-NMF that automatically adapts to internal structures of a dataset and hence yields meaningful and interpretable clusters for non-convex datasets. This is also confirmed by our extensive evaluation on DBLP publication records of $760,000$ authors, $4,000,000$ images harvested from the web, and $150,000,000$ votes on World of Warcraft guilds.
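
The factorization described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' algorithm: the weight matrix `G`, the helper names `convex_basis` and `fit_coefficients`, and the plain projected-gradient fit of $H$ are all assumptions made for illustration, and the convex weights are drawn at random rather than derived from convex-hull vertices. The sketch only shows the constraint itself, namely that $W = VG$ has columns that are convex combinations of data points while $H$ stays non-negative.

```python
import numpy as np

def convex_basis(V, G):
    """Return W = V G; each column of W is a convex combination of columns of V."""
    # Columns of G must be non-negative and sum to one (convex weights).
    assert np.all(G >= 0) and np.allclose(G.sum(axis=0), 1.0)
    return V @ G

def fit_coefficients(V, W, n_iter=500):
    """Fit a non-negative H that reduces ||V - W H||_F^2 by projected gradient descent."""
    H = np.zeros((W.shape[1], V.shape[1]))
    # Step size 1/L, where L = ||W||_2^2 is the Lipschitz constant of the gradient.
    step = 1.0 / (np.linalg.norm(W, 2) ** 2 + 1e-12)
    for _ in range(n_iter):
        grad = W.T @ (W @ H - V)
        H = np.maximum(H - step * grad, 0.0)  # project onto the non-negative orthant
    return H

rng = np.random.default_rng(0)
V = rng.random((5, 40))  # toy non-negative data, one data point per column
k = 3                    # number of basis vectors (hypothetical choice)

# Assumption: random convex weights over four data points per basis vector.
G = np.zeros((V.shape[1], k))
for j in range(k):
    support = rng.choice(V.shape[1], size=4, replace=False)
    G[support, j] = rng.dirichlet(np.ones(4))

W = convex_basis(V, G)
H = fit_coefficients(V, W)
rel_err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
print(f"relative reconstruction error: {rel_err:.3f}")
```

Because every column of `W` lies inside the convex hull of the data, each basis vector stays within the range of the observed points, which is the interpretability property the abstract refers to.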

Cite this Paper


BibTeX
@InProceedings{pmlr-v13-kersting10a,
  title = {Hierarchical Convex NMF for Clustering Massive Data},
  author = {Kersting, Kristian and Wahabzada, Mirwaes and Thurau, Christian and Bauckhage, Christian},
  booktitle = {Proceedings of 2nd Asian Conference on Machine Learning},
  pages = {253--268},
  year = {2010},
  editor = {Sugiyama, Masashi and Yang, Qiang},
  volume = {13},
  series = {Proceedings of Machine Learning Research},
  address = {Tokyo, Japan},
  month = {08--10 Nov},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v13/kersting10a/kersting10a.pdf},
  url = {https://proceedings.mlr.press/v13/kersting10a.html},
  abstract = {We present an extension of convex-hull non-negative matrix factorization (CH-NMF) which was recently proposed as a large scale variant of convex non-negative matrix factorization or Archetypal Analysis. CHNMF factorizes a non-negative data matrix $V$ into two non-negative matrix factors $V \approx WH$ such that the columns of $W$ are convex combinations of certain data points so that they are readily interpretable to data analysts. There is, however, no free lunch: imposing convexity constraints on W typically prevents adaptation to intrinsic, low dimensional structures in the data. Alas, in cases where the data is distributed in a non-convex manner or consists of mixtures of lower dimensional convex distributions, the cluster representatives obtained from CH-NMF will be less meaningful. In this paper, we present a hierarchical CH-NMF that automatically adapts to internal structures of a dataset, hence it yields meaningful and interpretable clusters for non-convex datasets. This is also confirmed by our extensive evaluation on DBLP publication records of $760,000$ authors, $4,000,000$ images harvested from the web, and $150,000,000$ votes on World of Warcraft guilds.}
}
Endnote
%0 Conference Paper
%T Hierarchical Convex NMF for Clustering Massive Data
%A Kristian Kersting
%A Mirwaes Wahabzada
%A Christian Thurau
%A Christian Bauckhage
%B Proceedings of 2nd Asian Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2010
%E Masashi Sugiyama
%E Qiang Yang
%F pmlr-v13-kersting10a
%I PMLR
%P 253--268
%U https://proceedings.mlr.press/v13/kersting10a.html
%V 13
%X We present an extension of convex-hull non-negative matrix factorization (CH-NMF) which was recently proposed as a large scale variant of convex non-negative matrix factorization or Archetypal Analysis. CHNMF factorizes a non-negative data matrix $V$ into two non-negative matrix factors $V \approx WH$ such that the columns of $W$ are convex combinations of certain data points so that they are readily interpretable to data analysts. There is, however, no free lunch: imposing convexity constraints on W typically prevents adaptation to intrinsic, low dimensional structures in the data. Alas, in cases where the data is distributed in a non-convex manner or consists of mixtures of lower dimensional convex distributions, the cluster representatives obtained from CH-NMF will be less meaningful. In this paper, we present a hierarchical CH-NMF that automatically adapts to internal structures of a dataset, hence it yields meaningful and interpretable clusters for non-convex datasets. This is also confirmed by our extensive evaluation on DBLP publication records of $760,000$ authors, $4,000,000$ images harvested from the web, and $150,000,000$ votes on World of Warcraft guilds.
RIS
TY - CPAPER
TI - Hierarchical Convex NMF for Clustering Massive Data
AU - Kristian Kersting
AU - Mirwaes Wahabzada
AU - Christian Thurau
AU - Christian Bauckhage
BT - Proceedings of 2nd Asian Conference on Machine Learning
DA - 2010/10/31
ED - Masashi Sugiyama
ED - Qiang Yang
ID - pmlr-v13-kersting10a
PB - PMLR
DP - Proceedings of Machine Learning Research
VL - 13
SP - 253
EP - 268
L1 - http://proceedings.mlr.press/v13/kersting10a/kersting10a.pdf
UR - https://proceedings.mlr.press/v13/kersting10a.html
AB - We present an extension of convex-hull non-negative matrix factorization (CH-NMF) which was recently proposed as a large scale variant of convex non-negative matrix factorization or Archetypal Analysis. CHNMF factorizes a non-negative data matrix $V$ into two non-negative matrix factors $V \approx WH$ such that the columns of $W$ are convex combinations of certain data points so that they are readily interpretable to data analysts. There is, however, no free lunch: imposing convexity constraints on W typically prevents adaptation to intrinsic, low dimensional structures in the data. Alas, in cases where the data is distributed in a non-convex manner or consists of mixtures of lower dimensional convex distributions, the cluster representatives obtained from CH-NMF will be less meaningful. In this paper, we present a hierarchical CH-NMF that automatically adapts to internal structures of a dataset, hence it yields meaningful and interpretable clusters for non-convex datasets. This is also confirmed by our extensive evaluation on DBLP publication records of $760,000$ authors, $4,000,000$ images harvested from the web, and $150,000,000$ votes on World of Warcraft guilds.
ER -
APA
Kersting, K., Wahabzada, M., Thurau, C. & Bauckhage, C. (2010). Hierarchical Convex NMF for Clustering Massive Data. Proceedings of 2nd Asian Conference on Machine Learning, in Proceedings of Machine Learning Research 13:253-268. Available from https://proceedings.mlr.press/v13/kersting10a.html.
