Contextual Embedding for Distributed Representations of Entities in a Text Corpus

Md Abdul Kader, Arnold P. Boedihardjo, Sheikh Motahar Naim, M. Shahriar Hossain
Proceedings of the 5th International Workshop on Big Data, Streams and Heterogeneous Source Mining: Algorithms, Systems, Programming Models and Applications at KDD 2016, PMLR 53:35-50, 2016.

Abstract

Distributed representations of textual elements in low-dimensional vector space that capture context have recently gained great attention. Current state-of-the-art word embedding techniques compute distributed representations using co-occurrences of words within a contextual window, discounting the flexibility to incorporate other contextual phenomena such as temporal, geographical, and topical contexts. In this paper, we present a flexible framework that can leverage the temporal, geographical, and topical information of documents, along with their textual content, to produce more effective vector representations of entities or words within a document collection. The framework first captures contextual relationships between entities collected from different relevant documents and then leverages these relationships either to produce inputs to a graph or to train a neural network that produces vectors for the entities. Through a set of rigorous experiments we test the performance of our approach, and the results show that our proposed solution produces more meaningful vectors than state-of-the-art methods.

Cite this Paper


BibTeX
@InProceedings{pmlr-v53-kader16, title = {Contextual Embedding for Distributed Representations of Entities in a Text Corpus}, author = {Kader, Md Abdul and Boedihardjo, Arnold P. and Naim, Sheikh Motahar and Hossain, M. Shahriar}, booktitle = {Proceedings of the 5th International Workshop on Big Data, Streams and Heterogeneous Source Mining: Algorithms, Systems, Programming Models and Applications at KDD 2016}, pages = {35--50}, year = {2016}, editor = {Fan, Wei and Bifet, Albert and Read, Jesse and Yang, Qiang and Yu, Philip S.}, volume = {53}, series = {Proceedings of Machine Learning Research}, address = {San Francisco, California, USA}, month = {14 Aug}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/v53/kader16.pdf}, url = {https://proceedings.mlr.press/v53/kader16.html}, abstract = {Distributed representations of textual elements in low-dimensional vector space that capture context have recently gained great attention. Current state-of-the-art word embedding techniques compute distributed representations using co-occurrences of words within a contextual window, discounting the flexibility to incorporate other contextual phenomena such as temporal, geographical, and topical contexts. In this paper, we present a flexible framework that can leverage the temporal, geographical, and topical information of documents, along with their textual content, to produce more effective vector representations of entities or words within a document collection. The framework first captures contextual relationships between entities collected from different relevant documents and then leverages these relationships either to produce inputs to a graph or to train a neural network that produces vectors for the entities. Through a set of rigorous experiments we test the performance of our approach, and the results show that our proposed solution produces more meaningful vectors than state-of-the-art methods.} }
Endnote
%0 Conference Paper %T Contextual Embedding for Distributed Representations of Entities in a Text Corpus %A Md Abdul Kader %A Arnold P. Boedihardjo %A Sheikh Motahar Naim %A M. Shahriar Hossain %B Proceedings of the 5th International Workshop on Big Data, Streams and Heterogeneous Source Mining: Algorithms, Systems, Programming Models and Applications at KDD 2016 %C Proceedings of Machine Learning Research %D 2016 %E Wei Fan %E Albert Bifet %E Jesse Read %E Qiang Yang %E Philip S. Yu %F pmlr-v53-kader16 %I PMLR %P 35--50 %U https://proceedings.mlr.press/v53/kader16.html %V 53 %X Distributed representations of textual elements in low-dimensional vector space that capture context have recently gained great attention. Current state-of-the-art word embedding techniques compute distributed representations using co-occurrences of words within a contextual window, discounting the flexibility to incorporate other contextual phenomena such as temporal, geographical, and topical contexts. In this paper, we present a flexible framework that can leverage the temporal, geographical, and topical information of documents, along with their textual content, to produce more effective vector representations of entities or words within a document collection. The framework first captures contextual relationships between entities collected from different relevant documents and then leverages these relationships either to produce inputs to a graph or to train a neural network that produces vectors for the entities. Through a set of rigorous experiments we test the performance of our approach, and the results show that our proposed solution produces more meaningful vectors than state-of-the-art methods.
RIS
TY - CPAPER TI - Contextual Embedding for Distributed Representations of Entities in a Text Corpus AU - Md Abdul Kader AU - Arnold P. Boedihardjo AU - Sheikh Motahar Naim AU - M. Shahriar Hossain BT - Proceedings of the 5th International Workshop on Big Data, Streams and Heterogeneous Source Mining: Algorithms, Systems, Programming Models and Applications at KDD 2016 DA - 2016/12/06 ED - Wei Fan ED - Albert Bifet ED - Jesse Read ED - Qiang Yang ED - Philip S. Yu ID - pmlr-v53-kader16 PB - PMLR DP - Proceedings of Machine Learning Research VL - 53 SP - 35 EP - 50 L1 - http://proceedings.mlr.press/v53/kader16.pdf UR - https://proceedings.mlr.press/v53/kader16.html AB - Distributed representations of textual elements in low-dimensional vector space that capture context have recently gained great attention. Current state-of-the-art word embedding techniques compute distributed representations using co-occurrences of words within a contextual window, discounting the flexibility to incorporate other contextual phenomena such as temporal, geographical, and topical contexts. In this paper, we present a flexible framework that can leverage the temporal, geographical, and topical information of documents, along with their textual content, to produce more effective vector representations of entities or words within a document collection. The framework first captures contextual relationships between entities collected from different relevant documents and then leverages these relationships either to produce inputs to a graph or to train a neural network that produces vectors for the entities. Through a set of rigorous experiments we test the performance of our approach, and the results show that our proposed solution produces more meaningful vectors than state-of-the-art methods. ER -
APA
Kader, M.A., Boedihardjo, A.P., Naim, S.M. & Hossain, M.S. (2016). Contextual Embedding for Distributed Representations of Entities in a Text Corpus. Proceedings of the 5th International Workshop on Big Data, Streams and Heterogeneous Source Mining: Algorithms, Systems, Programming Models and Applications at KDD 2016, in Proceedings of Machine Learning Research 53:35-50. Available from https://proceedings.mlr.press/v53/kader16.html.