Topic Discovery through Data Dependent and Random Projections

Weicong Ding, Mohammad Hossein Rohban, Prakash Ishwar, Venkatesh Saligrama
Proceedings of the 30th International Conference on Machine Learning, PMLR 28(3):1202-1210, 2013.

Abstract

We present algorithms for topic modeling based on the geometry of cross-document word-frequency patterns. This perspective gains significance under the so-called separability condition, a condition on the existence of novel words that are unique to each topic. We present a suite of highly efficient algorithms with provable guarantees, based on data-dependent and random projections, to identify novel words and associated topics. Our key insight is that the maximum and minimum values of cross-document frequency patterns projected along any direction are associated with novel words. While our sample complexity bounds for topic recovery are similar to the state of the art, the computational complexity of our random projection scheme scales linearly with the number of documents and the number of words per document. We present several experiments on synthetic and real-world datasets to demonstrate qualitative and quantitative merits of our scheme.
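The key insight above (extremes of projected frequency patterns point to novel words) can be illustrated with a small sketch. It rests on assumptions not stated in the abstract: we assume an ideal, noise-free word-by-document count matrix in which each word's normalized cross-document pattern is a convex combination of per-topic patterns, with novel words sitting at the vertices. A linear projection over such a set attains its maximum and minimum at vertices, so counting how often each word is extreme over random directions flags novel-word candidates. The function names `row_normalize` and `novel_word_candidates` and the toy data are hypothetical; this is not the authors' exact procedure, which must also handle finite-sample noise and comes with the guarantees described above.

```python
import random
from collections import Counter


def row_normalize(counts):
    """Turn each word's cross-document counts into a frequency pattern summing to 1."""
    patterns = []
    for row in counts:
        total = sum(row)
        if total == 0:
            patterns.append([0.0] * len(row))
        else:
            patterns.append([c / total for c in row])
    return patterns


def novel_word_candidates(counts, num_projections=50, seed=0):
    """Count how often each word attains the max or min of a random projection.

    Hypothetical helper for illustration only.
    counts: word-by-document count matrix (list of rows, one row per word).
    Returns a Counter mapping word index -> number of times it was extreme.
    """
    rng = random.Random(seed)
    patterns = row_normalize(counts)
    num_docs = len(counts[0])
    words = range(len(patterns))
    hits = Counter()
    for _ in range(num_projections):
        # Draw a random Gaussian direction in document space.
        direction = [rng.gauss(0.0, 1.0) for _ in range(num_docs)]
        # Project every word's normalized pattern onto that direction.
        scores = [sum(p * d for p, d in zip(pattern, direction)) for pattern in patterns]
        # Words attaining the extreme projected values are novel-word candidates.
        hits[max(words, key=scores.__getitem__)] += 1
        hits[min(words, key=scores.__getitem__)] += 1
    return hits


if __name__ == "__main__":
    # Hypothetical noise-free toy corpus: 4 words x 4 documents, 2 topics.
    toy_counts = [
        [8, 6, 4, 2],  # word 0: novel to topic A, pattern [0.4, 0.3, 0.2, 0.1]
        [1, 2, 3, 4],  # word 1: novel to topic B, pattern [0.1, 0.2, 0.3, 0.4]
        [5, 5, 5, 5],  # word 2: equal mixture of A and B
        [9, 8, 7, 6],  # word 3: 2/3 A + 1/3 B
    ]
    print(novel_word_candidates(toy_counts))
```

In this toy matrix, words 2 and 3 are strict convex combinations of words 0 and 1, so (barring exact ties) only words 0 and 1 should ever be extreme, each once per projection. Each projection costs one pass over the word patterns, which is in the spirit of the linear scaling claimed above, though this sketch does not exploit sparsity.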

Cite this Paper


BibTeX
@InProceedings{pmlr-v28-ding13,
  title = {Topic Discovery through Data Dependent and Random Projections},
  author = {Ding, Weicong and Hossein Rohban, Mohammad and Ishwar, Prakash and Saligrama, Venkatesh},
  booktitle = {Proceedings of the 30th International Conference on Machine Learning},
  pages = {1202--1210},
  year = {2013},
  editor = {Dasgupta, Sanjoy and McAllester, David},
  volume = {28},
  number = {3},
  series = {Proceedings of Machine Learning Research},
  address = {Atlanta, Georgia, USA},
  month = {17--19 Jun},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v28/ding13.pdf},
  url = {https://proceedings.mlr.press/v28/ding13.html},
  abstract = {We present algorithms for topic modeling based on the geometry of cross-document word-frequency patterns. This perspective gains significance under the so called separability condition. This is a condition on existence of novel-words that are unique to each topic. We present a suite of highly efficient algorithms with provable guarantees based on data-dependent and random projections to identify novel words and associated topics. Our key insight here is that the maximum and minimum values of cross-document frequency patterns projected along any direction are associated with novel words. While our sample complexity bounds for topic recovery are similar to the state-of-art, the computational complexity of our random projection scheme scales linearly with the number of documents and the number of words per document. We present several experiments on synthetic and realworld datasets to demonstrate qualitative and quantitative merits of our scheme.}
}
Endnote
%0 Conference Paper
%T Topic Discovery through Data Dependent and Random Projections
%A Weicong Ding
%A Mohammad Hossein Rohban
%A Prakash Ishwar
%A Venkatesh Saligrama
%B Proceedings of the 30th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2013
%E Sanjoy Dasgupta
%E David McAllester
%F pmlr-v28-ding13
%I PMLR
%P 1202--1210
%U https://proceedings.mlr.press/v28/ding13.html
%V 28
%N 3
%X We present algorithms for topic modeling based on the geometry of cross-document word-frequency patterns. This perspective gains significance under the so called separability condition. This is a condition on existence of novel-words that are unique to each topic. We present a suite of highly efficient algorithms with provable guarantees based on data-dependent and random projections to identify novel words and associated topics. Our key insight here is that the maximum and minimum values of cross-document frequency patterns projected along any direction are associated with novel words. While our sample complexity bounds for topic recovery are similar to the state-of-art, the computational complexity of our random projection scheme scales linearly with the number of documents and the number of words per document. We present several experiments on synthetic and realworld datasets to demonstrate qualitative and quantitative merits of our scheme.
RIS
TY  - CPAPER
TI  - Topic Discovery through Data Dependent and Random Projections
AU  - Weicong Ding
AU  - Mohammad Hossein Rohban
AU  - Prakash Ishwar
AU  - Venkatesh Saligrama
BT  - Proceedings of the 30th International Conference on Machine Learning
DA  - 2013/05/26
ED  - Sanjoy Dasgupta
ED  - David McAllester
ID  - pmlr-v28-ding13
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 28
IS  - 3
SP  - 1202
EP  - 1210
L1  - http://proceedings.mlr.press/v28/ding13.pdf
UR  - https://proceedings.mlr.press/v28/ding13.html
AB  - We present algorithms for topic modeling based on the geometry of cross-document word-frequency patterns. This perspective gains significance under the so called separability condition. This is a condition on existence of novel-words that are unique to each topic. We present a suite of highly efficient algorithms with provable guarantees based on data-dependent and random projections to identify novel words and associated topics. Our key insight here is that the maximum and minimum values of cross-document frequency patterns projected along any direction are associated with novel words. While our sample complexity bounds for topic recovery are similar to the state-of-art, the computational complexity of our random projection scheme scales linearly with the number of documents and the number of words per document. We present several experiments on synthetic and realworld datasets to demonstrate qualitative and quantitative merits of our scheme.
ER  -
APA
Ding, W., Hossein Rohban, M., Ishwar, P. & Saligrama, V. (2013). Topic Discovery through Data Dependent and Random Projections. Proceedings of the 30th International Conference on Machine Learning, in Proceedings of Machine Learning Research 28(3):1202-1210. Available from https://proceedings.mlr.press/v28/ding13.html.
