Topic Discovery through Data Dependent and Random Projections

Weicong Ding; Mohammad Hossein Rohban; Prakash Ishwar; Venkatesh Saligrama

Proceedings of the 30th International Conference on Machine Learning, PMLR 28(3):1202-1210, 2013.

Abstract

We present algorithms for topic modeling based on the geometry of cross-document word-frequency patterns. This perspective gains significance under the so-called separability condition, which requires the existence of novel words that are unique to each topic. We present a suite of highly efficient algorithms with provable guarantees based on data-dependent and random projections to identify novel words and associated topics. Our key insight is that the maximum and minimum values of cross-document frequency patterns projected along any direction are associated with novel words. While our sample complexity bounds for topic recovery are similar to the state of the art, the computational complexity of our random projection scheme scales linearly with the number of documents and the number of words per document. We present several experiments on synthetic and real-world datasets to demonstrate the qualitative and quantitative merits of our scheme.
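
The projection idea in the abstract can be illustrated with a small sketch. This is not the paper's algorithm: it assumes an idealized, noise-free word-frequency matrix, one novel word per topic, and row normalization of the frequency patterns, and every name and size below (`extreme_words`, `beta`, `theta`, `K`, `W`, `M`) is hypothetical. Under those assumptions each word's normalized pattern is a convex combination of the novel words' patterns, so the largest and smallest projections along a random direction should land on novel words.

```python
import numpy as np


def extreme_words(X_bar, num_directions, rng):
    """Collect the row indices that attain the max or min along random directions."""
    found = set()
    for _ in range(num_directions):
        d = rng.standard_normal(X_bar.shape[1])  # random projection direction
        proj = X_bar @ d                         # one scalar per word
        found.add(int(np.argmax(proj)))
        found.add(int(np.argmin(proj)))
    return found


rng = np.random.default_rng(0)
K, W, M = 3, 20, 200  # topics, vocabulary size, documents (hypothetical sizes)

# Separable topic matrix (assumption): word k, for k < K, occurs only in topic k.
beta = rng.uniform(0.1, 1.0, size=(W, K))
beta[:K, :] = np.eye(K)
beta /= beta.sum(axis=0, keepdims=True)  # each topic is a distribution over words

theta = rng.dirichlet(np.ones(K), size=M).T  # K x M topic weights, one column per document
A = beta @ theta                             # idealized word frequencies, W x M
X_bar = A / A.sum(axis=1, keepdims=True)     # row-normalized frequency pattern per word

found = extreme_words(X_bar, num_directions=50, rng=rng)
print(sorted(found))  # indices of words that were extreme along some direction
```

In this toy setup the novel words are indices 0 to K-1, so the printed indices should fall in that range. With finite documents and sampling noise the extremes are only approximately novel words, which is where the paper's guarantees come in.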

Cite this Paper

BibTeX


@InProceedings{pmlr-v28-ding13,
  title = 	 {Topic Discovery through Data Dependent and Random Projections},
  author = 	 {Ding, Weicong and Rohban, Mohammad Hossein and Ishwar, Prakash and Saligrama, Venkatesh},
  booktitle = 	 {Proceedings of the 30th International Conference on Machine Learning},
  pages = 	 {1202--1210},
  year = 	 {2013},
  editor = 	 {Dasgupta, Sanjoy and McAllester, David},
  volume = 	 {28},
  number =       {3},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {Atlanta, Georgia, USA},
  month = 	 {17--19 Jun},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v28/ding13.pdf},
  url = 	 {https://proceedings.mlr.press/v28/ding13.html},
  abstract = 	 {We present algorithms for topic modeling based on the geometry of cross-document word-frequency patterns. This perspective gains significance under the so-called separability condition, which requires the existence of novel words that are unique to each topic. We present a suite of highly efficient algorithms with provable guarantees based on data-dependent and random projections to identify novel words and associated topics. Our key insight is that the maximum and minimum values of cross-document frequency patterns projected along any direction are associated with novel words. While our sample complexity bounds for topic recovery are similar to the state of the art, the computational complexity of our random projection scheme scales linearly with the number of documents and the number of words per document. We present several experiments on synthetic and real-world datasets to demonstrate the qualitative and quantitative merits of our scheme.}
}

Endnote

%0 Conference Paper
%T Topic Discovery through Data Dependent and Random Projections
%A Weicong Ding
%A Mohammad Hossein Rohban
%A Prakash Ishwar
%A Venkatesh Saligrama
%B Proceedings of the 30th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2013
%E Sanjoy Dasgupta
%E David McAllester	
%F pmlr-v28-ding13
%I PMLR
%P 1202--1210
%U https://proceedings.mlr.press/v28/ding13.html
%V 28
%N 3
%X We present algorithms for topic modeling based on the geometry of cross-document word-frequency patterns. This perspective gains significance under the so-called separability condition, which requires the existence of novel words that are unique to each topic. We present a suite of highly efficient algorithms with provable guarantees based on data-dependent and random projections to identify novel words and associated topics. Our key insight is that the maximum and minimum values of cross-document frequency patterns projected along any direction are associated with novel words. While our sample complexity bounds for topic recovery are similar to the state of the art, the computational complexity of our random projection scheme scales linearly with the number of documents and the number of words per document. We present several experiments on synthetic and real-world datasets to demonstrate the qualitative and quantitative merits of our scheme.

RIS


TY  - CPAPER
TI  - Topic Discovery through Data Dependent and Random Projections
AU  - Weicong Ding
AU  - Mohammad Hossein Rohban
AU  - Prakash Ishwar
AU  - Venkatesh Saligrama
BT  - Proceedings of the 30th International Conference on Machine Learning
DA  - 2013/05/26
ED  - Sanjoy Dasgupta
ED  - David McAllester	
ID  - pmlr-v28-ding13
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 28
IS  - 3
SP  - 1202
EP  - 1210
L1  - http://proceedings.mlr.press/v28/ding13.pdf
UR  - https://proceedings.mlr.press/v28/ding13.html
AB  - We present algorithms for topic modeling based on the geometry of cross-document word-frequency patterns. This perspective gains significance under the so-called separability condition, which requires the existence of novel words that are unique to each topic. We present a suite of highly efficient algorithms with provable guarantees based on data-dependent and random projections to identify novel words and associated topics. Our key insight is that the maximum and minimum values of cross-document frequency patterns projected along any direction are associated with novel words. While our sample complexity bounds for topic recovery are similar to the state of the art, the computational complexity of our random projection scheme scales linearly with the number of documents and the number of words per document. We present several experiments on synthetic and real-world datasets to demonstrate the qualitative and quantitative merits of our scheme.
ER  -

APA


Ding, W., Rohban, M. H., Ishwar, P. & Saligrama, V. (2013). Topic Discovery through Data Dependent and Random Projections. Proceedings of the 30th International Conference on Machine Learning, in Proceedings of Machine Learning Research 28(3):1202-1210. Available from https://proceedings.mlr.press/v28/ding13.html.
