Efficient Distributed Topic Modeling with Provable Guarantees

Weicong Ding; Mohammad Rohban; Prakash Ishwar; Venkatesh Saligrama

Proceedings of the Seventeenth International Conference on Artificial Intelligence and Statistics, PMLR 33:167-175, 2014.

Abstract

Topic modeling for large-scale distributed web-collections requires distributed techniques that account for both computational and communication costs. We consider topic modeling under the separability assumption and develop novel computationally efficient methods that provably achieve the statistical performance of the state-of-the-art centralized approaches while requiring insignificant communication between the distributed document collections. We achieve tradeoffs between communication and computation without actually transmitting the documents. Our scheme is based on exploiting the geometry of the normalized word-word co-occurrence matrix and viewing each row of this matrix as a vector in a high-dimensional space. We relate the solid angle subtended by extreme points of the convex hull of these vectors to topic identities and construct distributed schemes to identify topics.

Cite this Paper

BibTeX


@InProceedings{pmlr-v33-ding14a,
  title = 	 {{Efficient Distributed Topic Modeling with Provable Guarantees}},
  author = 	 {Ding, Weicong and Rohban, Mohammad and Ishwar, Prakash and Saligrama, Venkatesh},
  booktitle = 	 {Proceedings of the Seventeenth International Conference on Artificial Intelligence and Statistics},
  pages = 	 {167--175},
  year = 	 {2014},
  editor = 	 {Kaski, Samuel and Corander, Jukka},
  volume = 	 {33},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {Reykjavik, Iceland},
  month = 	 {22--25 Apr},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v33/ding14a.pdf},
  url = 	 {https://proceedings.mlr.press/v33/ding14a.html},
  abstract = 	 {Topic modeling for large-scale distributed web-collections requires distributed techniques that account for both computational and communication costs. We consider topic modeling under the separability assumption and develop novel computationally efficient methods that provably achieve the statistical performance of the state-of-the-art centralized approaches while requiring insignificant communication between the distributed document collections. We achieve tradeoffs between communication and computation without actually transmitting the documents. Our scheme is based on exploiting the geometry of normalized word-word co-occurrence matrix and viewing each row of this matrix as a vector in a high-dimensional space. We relate the solid angle subtended by extreme points of the convex hull of these vectors to topic identities and construct distributed schemes to identify topics.}
}

Endnote

%0 Conference Paper
%T Efficient Distributed Topic Modeling with Provable Guarantees
%A Weicong Ding
%A Mohammad Rohban
%A Prakash Ishwar
%A Venkatesh Saligrama
%B Proceedings of the Seventeenth International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2014
%E Samuel Kaski
%E Jukka Corander
%F pmlr-v33-ding14a
%I PMLR
%P 167--175
%U https://proceedings.mlr.press/v33/ding14a.html
%V 33
%X Topic modeling for large-scale distributed web-collections requires distributed techniques that account for both computational and communication costs. We consider topic modeling under the separability assumption and develop novel computationally efficient methods that provably achieve the statistical performance of the state-of-the-art centralized approaches while requiring insignificant communication between the distributed document collections. We achieve tradeoffs between communication and computation without actually transmitting the documents. Our scheme is based on exploiting the geometry of normalized word-word co-occurrence matrix and viewing each row of this matrix as a vector in a high-dimensional space. We relate the solid angle subtended by extreme points of the convex hull of these vectors to topic identities and construct distributed schemes to identify topics.

RIS


TY  - CPAPER
TI  - Efficient Distributed Topic Modeling with Provable Guarantees
AU  - Weicong Ding
AU  - Mohammad Rohban
AU  - Prakash Ishwar
AU  - Venkatesh Saligrama
BT  - Proceedings of the Seventeenth International Conference on Artificial Intelligence and Statistics
DA  - 2014/04/02
ED  - Samuel Kaski
ED  - Jukka Corander
ID  - pmlr-v33-ding14a
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 33
SP  - 167
EP  - 175
L1  - http://proceedings.mlr.press/v33/ding14a.pdf
UR  - https://proceedings.mlr.press/v33/ding14a.html
AB  - Topic modeling for large-scale distributed web-collections requires distributed techniques that account for both computational and communication costs. We consider topic modeling under the separability assumption and develop novel computationally efficient methods that provably achieve the statistical performance of the state-of-the-art centralized approaches while requiring insignificant communication between the distributed document collections. We achieve tradeoffs between communication and computation without actually transmitting the documents. Our scheme is based on exploiting the geometry of normalized word-word co-occurrence matrix and viewing each row of this matrix as a vector in a high-dimensional space. We relate the solid angle subtended by extreme points of the convex hull of these vectors to topic identities and construct distributed schemes to identify topics.
ER  -

APA


Ding, W., Rohban, M., Ishwar, P. & Saligrama, V. (2014). Efficient Distributed Topic Modeling with Provable Guarantees. Proceedings of the Seventeenth International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 33:167-175. Available from https://proceedings.mlr.press/v33/ding14a.html.
