Efficient Distributed Topic Modeling with Provable Guarantees

Weicong Ding, Mohammad Rohban, Prakash Ishwar, Venkatesh Saligrama
Proceedings of the Seventeenth International Conference on Artificial Intelligence and Statistics, PMLR 33:167-175, 2014.

Abstract

Topic modeling for large-scale distributed web-collections requires distributed techniques that account for both computational and communication costs. We consider topic modeling under the separability assumption and develop novel computationally efficient methods that provably achieve the statistical performance of the state-of-the-art centralized approaches while requiring insignificant communication between the distributed document collections. We achieve tradeoffs between communication and computation without actually transmitting the documents. Our scheme is based on exploiting the geometry of the normalized word-word co-occurrence matrix and viewing each row of this matrix as a vector in a high-dimensional space. We relate the solid angle subtended by extreme points of the convex hull of these vectors to topic identities and construct distributed schemes to identify topics.
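
The geometric idea in the abstract, that the rows which are extreme points of the convex hull subtend a positive solid angle, can be illustrated with a small sketch. It is not taken from the paper: the function name `solid_angle_frequencies`, its parameters, and the toy data are all hypothetical, and it uses one generic way to probe solid angles (counting how often each row maximizes the projection onto a random isotropic direction), which may differ from the authors' algorithm. A strict interior point of the hull almost never wins such a maximization, so its frequency stays at zero.

```python
import numpy as np


def solid_angle_frequencies(rows, num_directions=2000, seed=0):
    """Fraction of random directions along which each row has the largest projection.

    Hypothetical helper for illustration only; not the paper's algorithm.
    """
    rows = np.asarray(rows, dtype=float)
    rng = np.random.default_rng(seed)
    # Isotropic Gaussian directions, one per column.
    directions = rng.standard_normal((rows.shape[1], num_directions))
    # For each direction, the index of the row with the largest projection.
    winners = np.argmax(rows @ directions, axis=0)
    counts = np.bincount(winners, minlength=rows.shape[0])
    return counts / num_directions


# Toy data (assumed, not from the paper): three extreme points
# followed by two convex mixtures that lie inside their hull.
vertices = np.eye(3)
mixtures = np.array([[0.5, 0.5, 0.0], [0.2, 0.3, 0.5]])
freqs = solid_angle_frequencies(np.vstack([vertices, mixtures]))
```

In this toy setup the first three entries of `freqs` are positive and the last two are zero, which is the separation between extreme and non-extreme rows that the abstract relates to topic identities.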

Cite this Paper


BibTeX
@InProceedings{pmlr-v33-ding14a,
  title = {{Efficient Distributed Topic Modeling with Provable Guarantees}},
  author = {Weicong Ding and Mohammad Rohban and Prakash Ishwar and Venkatesh Saligrama},
  booktitle = {Proceedings of the Seventeenth International Conference on Artificial Intelligence and Statistics},
  pages = {167--175},
  year = {2014},
  editor = {Samuel Kaski and Jukka Corander},
  volume = {33},
  series = {Proceedings of Machine Learning Research},
  address = {Reykjavik, Iceland},
  month = {22--25 Apr},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v33/ding14a.pdf},
  url = {http://proceedings.mlr.press/v33/ding14a.html},
  abstract = {Topic modeling for large-scale distributed web-collections requires distributed techniques that account for both computational and communication costs. We consider topic modeling under the separability assumption and develop novel computationally efficient methods that provably achieve the statistical performance of the state-of-the-art centralized approaches while requiring insignificant communication between the distributed document collections. We achieve tradeoffs between communication and computation without actually transmitting the documents. Our scheme is based on exploiting the geometry of normalized word-word co-occurrence matrix and viewing each row of this matrix as a vector in a high-dimensional space. We relate the solid angle subtended by extreme points of the convex hull of these vectors to topic identities and construct distributed schemes to identify topics.}
}
Endnote
%0 Conference Paper
%T Efficient Distributed Topic Modeling with Provable Guarantees
%A Weicong Ding
%A Mohammad Rohban
%A Prakash Ishwar
%A Venkatesh Saligrama
%B Proceedings of the Seventeenth International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2014
%E Samuel Kaski
%E Jukka Corander
%F pmlr-v33-ding14a
%I PMLR
%P 167--175
%U http://proceedings.mlr.press/v33/ding14a.html
%V 33
%X Topic modeling for large-scale distributed web-collections requires distributed techniques that account for both computational and communication costs. We consider topic modeling under the separability assumption and develop novel computationally efficient methods that provably achieve the statistical performance of the state-of-the-art centralized approaches while requiring insignificant communication between the distributed document collections. We achieve tradeoffs between communication and computation without actually transmitting the documents. Our scheme is based on exploiting the geometry of normalized word-word co-occurrence matrix and viewing each row of this matrix as a vector in a high-dimensional space. We relate the solid angle subtended by extreme points of the convex hull of these vectors to topic identities and construct distributed schemes to identify topics.
RIS
TY - CPAPER
TI - Efficient Distributed Topic Modeling with Provable Guarantees
AU - Weicong Ding
AU - Mohammad Rohban
AU - Prakash Ishwar
AU - Venkatesh Saligrama
BT - Proceedings of the Seventeenth International Conference on Artificial Intelligence and Statistics
DA - 2014/04/02
ED - Samuel Kaski
ED - Jukka Corander
ID - pmlr-v33-ding14a
PB - PMLR
DP - Proceedings of Machine Learning Research
VL - 33
SP - 167
EP - 175
L1 - http://proceedings.mlr.press/v33/ding14a.pdf
UR - http://proceedings.mlr.press/v33/ding14a.html
AB - Topic modeling for large-scale distributed web-collections requires distributed techniques that account for both computational and communication costs. We consider topic modeling under the separability assumption and develop novel computationally efficient methods that provably achieve the statistical performance of the state-of-the-art centralized approaches while requiring insignificant communication between the distributed document collections. We achieve tradeoffs between communication and computation without actually transmitting the documents. Our scheme is based on exploiting the geometry of normalized word-word co-occurrence matrix and viewing each row of this matrix as a vector in a high-dimensional space. We relate the solid angle subtended by extreme points of the convex hull of these vectors to topic identities and construct distributed schemes to identify topics.
ER -
APA
Ding, W., Rohban, M., Ishwar, P. & Saligrama, V. (2014). Efficient Distributed Topic Modeling with Provable Guarantees. Proceedings of the Seventeenth International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 33:167-175. Available from http://proceedings.mlr.press/v33/ding14a.html.
