Hierarchical Convex NMF for Clustering Massive Data

Kristian Kersting, Mirwaes Wahabzada, Christian Thurau, Christian Bauckhage
Proceedings of 2nd Asian Conference on Machine Learning, PMLR 13:253-268, 2010.

Abstract

We present an extension of convex-hull non-negative matrix factorization (CH-NMF) which was recently proposed as a large-scale variant of convex non-negative matrix factorization or Archetypal Analysis. CH-NMF factorizes a non-negative data matrix V into two non-negative matrix factors V ≈ WH such that the columns of W are convex combinations of certain data points so that they are readily interpretable to data analysts. There is, however, no free lunch: imposing convexity constraints on W typically prevents adaptation to intrinsic, low-dimensional structures in the data. Alas, in cases where the data is distributed in a non-convex manner or consists of mixtures of lower-dimensional convex distributions, the cluster representatives obtained from CH-NMF will be less meaningful. In this paper, we present a hierarchical CH-NMF that automatically adapts to internal structures of a dataset, hence it yields meaningful and interpretable clusters for non-convex datasets. This is also confirmed by our extensive evaluation on DBLP publication records of 760,000 authors, 4,000,000 images harvested from the web, and 150,000,000 votes on World of Warcraft guilds.
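As a rough illustration of the constraint the abstract describes, the sketch below builds basis vectors as convex combinations of data points and then fits non-negative coefficients. It is not the authors' algorithm: the mixing matrix G, the helper names convex_basis and fit_H, and the toy data are all assumptions made for this example, and the coefficient fit uses the generic multiplicative NMF update rather than anything specific to CH-NMF.

```python
import numpy as np


def convex_basis(V, G):
    """Return W = V @ G after checking that every column of G is a convex weight vector.

    G is a hypothetical mixing matrix: column k holds the weights that combine
    the data points (columns of V) into basis vector k.
    """
    if np.any(G < 0) or not np.allclose(G.sum(axis=0), 1.0):
        raise ValueError("columns of G must be non-negative and sum to 1")
    return V @ G


def fit_H(V, W, n_iter=200, eps=1e-12, seed=0):
    """Fit H >= 0 for a fixed W with the standard multiplicative update.

    This is the generic Frobenius-norm NMF update for H, used here only as a
    stand-in; it is not taken from the paper.
    """
    rng = np.random.default_rng(seed)
    H = rng.random((W.shape[1], V.shape[1]))
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
    return H


# Toy data (invented): 2 features x 6 points, columns are data points.
V = np.array([[1.0, 5.0, 1.0, 2.0, 3.0, 2.0],
              [1.0, 1.0, 5.0, 2.0, 2.0, 3.0]])

# G selects the first three points (the corners of the point cloud) as basis
# vectors. Each column is a trivial convex combination with one weight of 1.
G = np.zeros((6, 3))
G[0, 0] = G[1, 1] = G[2, 2] = 1.0

W = convex_basis(V, G)             # columns of W lie in the convex hull of the data
H = fit_H(V, W)                    # non-negative coefficients
error = np.linalg.norm(V - W @ H)  # Frobenius reconstruction error
print(W)
print(round(float(error), 4))
```

Because each column of W is an actual data point here, the basis vectors can be read directly as representative samples, which is the interpretability property the abstract refers to.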

Cite this Paper


BibTeX
@InProceedings{pmlr-v13-kersting10a,
  title = {Hierarchical Convex NMF for Clustering Massive Data},
  author = {Kersting, Kristian and Wahabzada, Mirwaes and Thurau, Christian and Bauckhage, Christian},
  booktitle = {Proceedings of 2nd Asian Conference on Machine Learning},
  pages = {253--268},
  year = {2010},
  editor = {Sugiyama, Masashi and Yang, Qiang},
  volume = {13},
  series = {Proceedings of Machine Learning Research},
  address = {Tokyo, Japan},
  month = {08--10 Nov},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v13/kersting10a/kersting10a.pdf},
  url = {https://proceedings.mlr.press/v13/kersting10a.html},
  abstract = {We present an extension of convex-hull non-negative matrix factorization (CH-NMF) which was recently proposed as a large scale variant of convex non-negative matrix factorization or Archetypal Analysis. CHNMF factorizes a non-negative data matrix $V$ into two non-negative matrix factors $V \approx WH$ such that the columns of $W$ are convex combinations of certain data points so that they are readily interpretable to data analysts. There is, however, no free lunch: imposing convexity constraints on W typically prevents adaptation to intrinsic, low dimensional structures in the data. Alas, in cases where the data is distributed in a non-convex manner or consists of mixtures of lower dimensional convex distributions, the cluster representatives obtained from CH-NMF will be less meaningful. In this paper, we present a hierarchical CH-NMF that automatically adapts to internal structures of a dataset, hence it yields meaningful and interpretable clusters for non-convex datasets. This is also confirmed by our extensive evaluation on DBLP publication records of $760,000$ authors, $4,000,000$ images harvested from the web, and $150,000,000$ votes on World of Warcraft guilds.}
}
Endnote
%0 Conference Paper
%T Hierarchical Convex NMF for Clustering Massive Data
%A Kristian Kersting
%A Mirwaes Wahabzada
%A Christian Thurau
%A Christian Bauckhage
%B Proceedings of 2nd Asian Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2010
%E Masashi Sugiyama
%E Qiang Yang
%F pmlr-v13-kersting10a
%I PMLR
%P 253--268
%U https://proceedings.mlr.press/v13/kersting10a.html
%V 13
%X We present an extension of convex-hull non-negative matrix factorization (CH-NMF) which was recently proposed as a large scale variant of convex non-negative matrix factorization or Archetypal Analysis. CHNMF factorizes a non-negative data matrix $V$ into two non-negative matrix factors $V \approx WH$ such that the columns of $W$ are convex combinations of certain data points so that they are readily interpretable to data analysts. There is, however, no free lunch: imposing convexity constraints on W typically prevents adaptation to intrinsic, low dimensional structures in the data. Alas, in cases where the data is distributed in a non-convex manner or consists of mixtures of lower dimensional convex distributions, the cluster representatives obtained from CH-NMF will be less meaningful. In this paper, we present a hierarchical CH-NMF that automatically adapts to internal structures of a dataset, hence it yields meaningful and interpretable clusters for non-convex datasets. This is also confirmed by our extensive evaluation on DBLP publication records of $760,000$ authors, $4,000,000$ images harvested from the web, and $150,000,000$ votes on World of Warcraft guilds.
RIS
TY  - CPAPER
TI  - Hierarchical Convex NMF for Clustering Massive Data
AU  - Kristian Kersting
AU  - Mirwaes Wahabzada
AU  - Christian Thurau
AU  - Christian Bauckhage
BT  - Proceedings of 2nd Asian Conference on Machine Learning
DA  - 2010/10/31
ED  - Masashi Sugiyama
ED  - Qiang Yang
ID  - pmlr-v13-kersting10a
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 13
SP  - 253
EP  - 268
L1  - http://proceedings.mlr.press/v13/kersting10a/kersting10a.pdf
UR  - https://proceedings.mlr.press/v13/kersting10a.html
AB  - We present an extension of convex-hull non-negative matrix factorization (CH-NMF) which was recently proposed as a large scale variant of convex non-negative matrix factorization or Archetypal Analysis. CHNMF factorizes a non-negative data matrix $V$ into two non-negative matrix factors $V \approx WH$ such that the columns of $W$ are convex combinations of certain data points so that they are readily interpretable to data analysts. There is, however, no free lunch: imposing convexity constraints on W typically prevents adaptation to intrinsic, low dimensional structures in the data. Alas, in cases where the data is distributed in a non-convex manner or consists of mixtures of lower dimensional convex distributions, the cluster representatives obtained from CH-NMF will be less meaningful. In this paper, we present a hierarchical CH-NMF that automatically adapts to internal structures of a dataset, hence it yields meaningful and interpretable clusters for non-convex datasets. This is also confirmed by our extensive evaluation on DBLP publication records of $760,000$ authors, $4,000,000$ images harvested from the web, and $150,000,000$ votes on World of Warcraft guilds.
ER  - 
APA
Kersting, K., Wahabzada, M., Thurau, C., & Bauckhage, C. (2010). Hierarchical Convex NMF for Clustering Massive Data. Proceedings of 2nd Asian Conference on Machine Learning, in Proceedings of Machine Learning Research 13:253-268. Available from https://proceedings.mlr.press/v13/kersting10a.html.
