Contextual Embedding for Distributed Representations of Entities in a Text Corpus

Md Abdul Kader; Arnold P. Boedihardjo; Sheikh Motahar Naim; M. Shahriar Hossain

Proceedings of the 5th International Workshop on Big Data, Streams and Heterogeneous Source Mining: Algorithms, Systems, Programming Models and Applications at KDD 2016, PMLR 53:35-50, 2016.

Abstract

Distributed representations of textual elements in a low-dimensional vector space that capture context have gained great attention recently. Current state-of-the-art word embedding techniques compute distributed representations from co-occurrences of words within a contextual window, and they lack the flexibility to incorporate other contextual phenomena such as temporal, geographical, and topical contexts. In this paper, we present a flexible framework that can leverage the temporal, geographical, and topical information of documents, along with their textual content, to produce more effective vector representations of entities or words within a document collection. The framework first captures contextual relationships between entities collected from different relevant documents and then leverages these relationships either to construct a graph or to train a neural network, from which vectors for the entities are produced. We evaluate our approach through a set of rigorous experiments, and the results show that our proposed solution can produce more meaningful vectors than state-of-the-art methods.
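The abstract contrasts the proposed framework with embedding techniques that rely only on word co-occurrences within a contextual window. As a rough illustration of that baseline counting step only (not the paper's framework, which additionally draws on temporal, geographical, and topical context), here is a minimal sketch assuming whitespace-tokenized text and a symmetric window of 2 tokens; the function name `window_cooccurrences` and the sample sentence are hypothetical.

```python
from collections import Counter


def window_cooccurrences(tokens, window=2):
    """Count unordered word pairs occurring within `window` positions of each other.

    Hypothetical helper illustrating window-based co-occurrence counting,
    the baseline signal used by standard word embedding methods.
    """
    counts = Counter()
    for i, center in enumerate(tokens):
        # Look only to the right so each pair of positions is counted once.
        for j in range(i + 1, min(i + window + 1, len(tokens))):
            pair = tuple(sorted((center, tokens[j])))
            counts[pair] += 1
    return counts


# Hypothetical sample document, tokenized on whitespace.
doc = "flood warning issued for coastal towns".split()
pairs = window_cooccurrences(doc, window=2)
print(pairs[("flood", "warning")])  # 1: adjacent tokens
print(pairs[("flood", "for")])      # 0: three positions apart, outside the window
```

Any relationship that never falls inside the window (for example, two entities mentioned in different documents from the same place and time) contributes nothing to these counts, which is the limitation the abstract points to.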

Cite this Paper

BibTeX


@InProceedings{pmlr-v53-kader16,
  title = 	 {Contextual Embedding for Distributed Representations of Entities in a Text Corpus},
  author = 	 {Kader, Md Abdul and Boedihardjo, Arnold P. and Naim, Sheikh Motahar and Hossain, M. Shahriar},
  booktitle = 	 {Proceedings of the 5th International Workshop on Big Data, Streams and Heterogeneous Source Mining: Algorithms, Systems, Programming Models and Applications at KDD 2016},
  pages = 	 {35--50},
  year = 	 {2016},
  editor = 	 {Fan, Wei and Bifet, Albert and Read, Jesse and Yang, Qiang and Yu, Philip S.},
  volume = 	 {53},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {San Francisco, California, USA},
  month = 	 {14 Aug},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v53/kader16.pdf},
  url = 	 {https://proceedings.mlr.press/v53/kader16.html},
  abstract = 	 {Distributed representations of textual elements in a low-dimensional vector space that capture context have gained great attention recently. Current state-of-the-art word embedding techniques compute distributed representations from co-occurrences of words within a contextual window, and they lack the flexibility to incorporate other contextual phenomena such as temporal, geographical, and topical contexts. In this paper, we present a flexible framework that can leverage the temporal, geographical, and topical information of documents, along with their textual content, to produce more effective vector representations of entities or words within a document collection. The framework first captures contextual relationships between entities collected from different relevant documents and then leverages these relationships either to construct a graph or to train a neural network, from which vectors for the entities are produced. We evaluate our approach through a set of rigorous experiments, and the results show that our proposed solution can produce more meaningful vectors than state-of-the-art methods.}
}

Endnote

%0 Conference Paper
%T Contextual Embedding for Distributed Representations of Entities in a Text Corpus
%A Md Abdul Kader
%A Arnold P. Boedihardjo
%A Sheikh Motahar Naim
%A M. Shahriar Hossain
%B Proceedings of the 5th International Workshop on Big Data, Streams and Heterogeneous Source Mining: Algorithms, Systems, Programming Models and Applications at KDD 2016
%C Proceedings of Machine Learning Research
%D 2016
%E Wei Fan
%E Albert Bifet
%E Jesse Read
%E Qiang Yang
%E Philip S. Yu
%F pmlr-v53-kader16
%I PMLR
%P 35-50
%U https://proceedings.mlr.press/v53/kader16.html
%V 53
%X Distributed representations of textual elements in a low-dimensional vector space that capture context have gained great attention recently. Current state-of-the-art word embedding techniques compute distributed representations from co-occurrences of words within a contextual window, and they lack the flexibility to incorporate other contextual phenomena such as temporal, geographical, and topical contexts. In this paper, we present a flexible framework that can leverage the temporal, geographical, and topical information of documents, along with their textual content, to produce more effective vector representations of entities or words within a document collection. The framework first captures contextual relationships between entities collected from different relevant documents and then leverages these relationships either to construct a graph or to train a neural network, from which vectors for the entities are produced. We evaluate our approach through a set of rigorous experiments, and the results show that our proposed solution can produce more meaningful vectors than state-of-the-art methods.

RIS


TY  - CPAPER
TI  - Contextual Embedding for Distributed Representations of Entities in a Text Corpus
AU  - Md Abdul Kader
AU  - Arnold P. Boedihardjo
AU  - Sheikh Motahar Naim
AU  - M. Shahriar Hossain
BT  - Proceedings of the 5th International Workshop on Big Data, Streams and Heterogeneous Source Mining: Algorithms, Systems, Programming Models and Applications at KDD 2016
DA  - 2016/12/06
ED  - Wei Fan
ED  - Albert Bifet
ED  - Jesse Read
ED  - Qiang Yang
ED  - Philip S. Yu
ID  - pmlr-v53-kader16
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 53
SP  - 35
EP  - 50
L1  - http://proceedings.mlr.press/v53/kader16.pdf
UR  - https://proceedings.mlr.press/v53/kader16.html
AB  - Distributed representations of textual elements in a low-dimensional vector space that capture context have gained great attention recently. Current state-of-the-art word embedding techniques compute distributed representations from co-occurrences of words within a contextual window, and they lack the flexibility to incorporate other contextual phenomena such as temporal, geographical, and topical contexts. In this paper, we present a flexible framework that can leverage the temporal, geographical, and topical information of documents, along with their textual content, to produce more effective vector representations of entities or words within a document collection. The framework first captures contextual relationships between entities collected from different relevant documents and then leverages these relationships either to construct a graph or to train a neural network, from which vectors for the entities are produced. We evaluate our approach through a set of rigorous experiments, and the results show that our proposed solution can produce more meaningful vectors than state-of-the-art methods.
ER  -

APA


Kader, M.A., Boedihardjo, A.P., Naim, S.M. & Hossain, M.S. (2016). Contextual Embedding for Distributed Representations of Entities in a Text Corpus. Proceedings of the 5th International Workshop on Big Data, Streams and Heterogeneous Source Mining: Algorithms, Systems, Programming Models and Applications at KDD 2016, in Proceedings of Machine Learning Research 53:35-50. Available from https://proceedings.mlr.press/v53/kader16.html.
