Is Multinomial PCA Multi-faceted Clustering or Dimensionality Reduction?

Wray L. Buntine, Sami Perttu
Proceedings of the Ninth International Workshop on Artificial Intelligence and Statistics, PMLR R4:57-64, 2003.

Abstract

Discrete analogues to Principal Components Analysis (PCA) are intended to handle discrete or positive-only data, for instance sets of documents. The class of methods is appropriately called multinomial PCA because it replaces the Gaussian in the probabilistic formulation of PCA with a multinomial. Experiments to date, however, have been on small data sets, for instance, from early information retrieval collections. This paper demonstrates the method on two large data sets and considers two extremes of behaviour: (1) dimensionality reduction where the feature set (i.e., bag of words) is considerably reduced, and (2) multi-faceted clustering (or aspect modelling) where clustering is done but items can now belong in several clusters at once.

Cite this Paper


BibTeX
@InProceedings{pmlr-vR4-buntine03a,
  title     = {Is Multinomial {PCA} Multi-faceted Clustering or Dimensionality Reduction?},
  author    = {Buntine, Wray L. and Perttu, Sami},
  booktitle = {Proceedings of the Ninth International Workshop on Artificial Intelligence and Statistics},
  pages     = {57--64},
  year      = {2003},
  editor    = {Bishop, Christopher M. and Frey, Brendan J.},
  volume    = {R4},
  series    = {Proceedings of Machine Learning Research},
  month     = {03--06 Jan},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/r4/buntine03a/buntine03a.pdf},
  url       = {https://proceedings.mlr.press/r4/buntine03a.html},
  abstract  = {Discrete analogues to Principal Components Analysis (PCA) are intended to handle discrete or positive-only data, for instance sets of documents. The class of methods is appropriately called multinomial PCA because it replaces the Gaussian in the probabilistic formulation of PCA with a multinomial. Experiments to date, however, have been on small data sets, for instance, from early information retrieval collections. This paper demonstrates the method on two large data sets and considers two extremes of behaviour: (1) dimensionality reduction where the feature set (i.e., bag of words) is considerably reduced, and (2) multi-faceted clustering (or aspect modelling) where clustering is done but items can now belong in several clusters at once.},
  note      = {Reissued by PMLR on 01 April 2021.}
}
Endnote
%0 Conference Paper
%T Is Multinomial PCA Multi-faceted Clustering or Dimensionality Reduction?
%A Wray L. Buntine
%A Sami Perttu
%B Proceedings of the Ninth International Workshop on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2003
%E Christopher M. Bishop
%E Brendan J. Frey
%F pmlr-vR4-buntine03a
%I PMLR
%P 57--64
%U https://proceedings.mlr.press/r4/buntine03a.html
%V R4
%X Discrete analogues to Principal Components Analysis (PCA) are intended to handle discrete or positive-only data, for instance sets of documents. The class of methods is appropriately called multinomial PCA because it replaces the Gaussian in the probabilistic formulation of PCA with a multinomial. Experiments to date, however, have been on small data sets, for instance, from early information retrieval collections. This paper demonstrates the method on two large data sets and considers two extremes of behaviour: (1) dimensionality reduction where the feature set (i.e., bag of words) is considerably reduced, and (2) multi-faceted clustering (or aspect modelling) where clustering is done but items can now belong in several clusters at once.
%Z Reissued by PMLR on 01 April 2021.
APA
Buntine, W.L. & Perttu, S. (2003). Is Multinomial PCA Multi-faceted Clustering or Dimensionality Reduction? Proceedings of the Ninth International Workshop on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research R4:57-64. Available from https://proceedings.mlr.press/r4/buntine03a.html. Reissued by PMLR on 01 April 2021.

Related Material