A Latent Space Approach to Dynamic Embedding of Co-occurrence Data

Purnamrita Sarkar; Sajid M. Siddiqi; Geoffrey J. Gordon

Proceedings of the Eleventh International Conference on Artificial Intelligence and Statistics, PMLR 2:420-427, 2007.

Abstract

We consider dynamic co-occurrence data, such as author-word links in papers published in successive years of the same conference. For static co-occurrence data, researchers often seek an embedding of the entities (authors and words) into a low-dimensional Euclidean space. We generalize a recent static co-occurrence model, the CODE model of Globerson et al. (2004), to the dynamic setting: we seek coordinates for each entity at each time step. The coordinates can change with time to explain new observations, but since large changes are improbable, we can exploit data at previous and subsequent steps to find a better explanation for current observations. To make inference tractable, we show how to approximate our observation model with a Gaussian distribution, which allows us to use a Kalman filter. The result is the first algorithm for dynamic embedding of co-occurrence data that provides distributional information for its coordinate estimates. We demonstrate our model both on synthetic data and on author-word data from the NIPS corpus, showing that it produces intuitively reasonable embeddings. We also provide evidence for the usefulness of our model by its performance on an author-prediction task.
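The abstract names two ingredients. The first is a CODE-style observation model, in which an author-word pair is more probable the closer the two embeddings are. The second is Kalman-filter inference over coordinates that drift slowly over time. The sketch below illustrates both under stated assumptions. It is not the authors' implementation.

- `code_probs` computes a simplified CODE joint distribution, without marginal terms.
- `kalman_smooth` runs a forward Kalman filter and then a Rauch-Tung-Striebel backward pass on one scalar coordinate under a Gaussian random-walk prior.

Both function names are hypothetical. Coordinates are treated as independent scalars. The Gaussian approximation of the observation model is taken as given, in the form of per-step pseudo-observation means `obs_means` and variances `obs_vars`, rather than derived as in the paper.

```python
import numpy as np


def code_probs(phi, psi):
    """Simplified CODE-style joint over (author, word) pairs (assumption:
    no marginal terms): p(a, w) proportional to exp(-||phi[a] - psi[w]||^2).
    phi: (n_authors, d) author coordinates; psi: (n_words, d) word coordinates."""
    phi, psi = np.asarray(phi, float), np.asarray(psi, float)
    d2 = ((phi[:, None, :] - psi[None, :, :]) ** 2).sum(axis=-1)
    logits = -d2
    logits -= logits.max()  # shift for numerical stability
    p = np.exp(logits)
    return p / p.sum()


def kalman_smooth(obs_means, obs_vars, q, prior_mean, prior_var):
    """Kalman filter + RTS smoother for ONE scalar coordinate.

    Assumed model (illustrative, not the paper's exact formulation):
      x_0 ~ N(prior_mean, prior_var)
      x_t = x_{t-1} + N(0, q)        # large moves between steps are improbable
      y_t ~ N(x_t, obs_vars[t])      # Gaussian pseudo-observation, assumed to
                                     # approximate the CODE likelihood at step t
    Returns smoothed means and variances, which use past AND future steps.
    """
    y = np.asarray(obs_means, float)
    r = np.asarray(obs_vars, float)
    T = len(y)
    m_pred, v_pred = np.zeros(T), np.zeros(T)
    m_filt, v_filt = np.zeros(T), np.zeros(T)

    m, v = prior_mean, prior_var
    for t in range(T):
        if t > 0:
            v = v + q  # predict: random walk keeps the mean, adds variance q
        m_pred[t], v_pred[t] = m, v
        k = v / (v + r[t])  # Kalman gain
        m = m + k * (y[t] - m)
        v = (1.0 - k) * v
        m_filt[t], v_filt[t] = m, v

    # Backward (Rauch-Tung-Striebel) pass: fold in later observations.
    m_s, v_s = m_filt.copy(), v_filt.copy()
    for t in range(T - 2, -1, -1):
        g = v_filt[t] / v_pred[t + 1]
        m_s[t] = m_filt[t] + g * (m_s[t + 1] - m_pred[t + 1])
        v_s[t] = v_filt[t] + g * g * (v_s[t + 1] - v_pred[t + 1])
    return m_s, v_s
```

For example, consider `kalman_smooth(np.array([0.0, 10.0]), np.array([1.0, 1.0]), q=1.0, prior_mean=0.0, prior_var=100.0)`. The filtered mean at the first step is 0, but the smoothed mean there is positive, because the later observation is propagated backward. This is the "exploit data at subsequent steps" effect described above. The smoothed variances are the kind of distributional information a point-estimate embedding would not provide.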

Cite this Paper

BibTeX

@InProceedings{pmlr-v2-sarkar07a,
  title = 	 {A Latent Space Approach to Dynamic Embedding of Co-occurrence Data},
  author = 	 {Sarkar, Purnamrita and Siddiqi, Sajid M. and Gordon, Geoffrey J.},
  booktitle = 	 {Proceedings of the Eleventh International Conference on Artificial Intelligence and Statistics},
  pages = 	 {420--427},
  year = 	 {2007},
  editor = 	 {Meila, Marina and Shen, Xiaotong},
  volume = 	 {2},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {San Juan, Puerto Rico},
  month = 	 {21--24 Mar},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v2/sarkar07a/sarkar07a.pdf},
  url = 	 {https://proceedings.mlr.press/v2/sarkar07a.html},
  abstract = 	 {We consider dynamic co-occurrence data, such as author-word links in papers published in successive years of the same conference. For static co-occurrence data, researchers often seek an embedding of the entities (authors and words) into a low-dimensional Euclidean space. We generalize a recent static co-occurrence model, the CODE model of Globerson et al. (2004), to the dynamic setting: we seek coordinates for each entity at each time step. The coordinates can change with time to explain new observations, but since large changes are improbable, we can exploit data at previous and subsequent steps to find a better explanation for current observations. To make inference tractable, we show how to approximate our observation model with a Gaussian distribution, allowing the use of a Kalman filter for tractable inference. The result is the first algorithm for dynamic embedding of co-occurrence data which provides distributional information for its coordinate estimates. We demonstrate our model both on synthetic data and on author-word data from the NIPS corpus, showing that it produces intuitively reasonable embeddings. We also provide evidence for the usefulness of our model by its performance on an author-prediction task.}
}

Endnote

%0 Conference Paper
%T A Latent Space Approach to Dynamic Embedding of Co-occurrence Data
%A Purnamrita Sarkar
%A Sajid M. Siddiqi
%A Geoffrey J. Gordon
%B Proceedings of the Eleventh International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2007
%E Marina Meila
%E Xiaotong Shen
%F pmlr-v2-sarkar07a
%I PMLR
%P 420--427
%U https://proceedings.mlr.press/v2/sarkar07a.html
%V 2
%X We consider dynamic co-occurrence data, such as author-word links in papers published in successive years of the same conference. For static co-occurrence data, researchers often seek an embedding of the entities (authors and words) into a low-dimensional Euclidean space. We generalize a recent static co-occurrence model, the CODE model of Globerson et al. (2004), to the dynamic setting: we seek coordinates for each entity at each time step. The coordinates can change with time to explain new observations, but since large changes are improbable, we can exploit data at previous and subsequent steps to find a better explanation for current observations. To make inference tractable, we show how to approximate our observation model with a Gaussian distribution, allowing the use of a Kalman filter for tractable inference. The result is the first algorithm for dynamic embedding of co-occurrence data which provides distributional information for its coordinate estimates. We demonstrate our model both on synthetic data and on author-word data from the NIPS corpus, showing that it produces intuitively reasonable embeddings. We also provide evidence for the usefulness of our model by its performance on an author-prediction task.

RIS

TY  - CPAPER
TI  - A Latent Space Approach to Dynamic Embedding of Co-occurrence Data
AU  - Purnamrita Sarkar
AU  - Sajid M. Siddiqi
AU  - Geoffrey J. Gordon
BT  - Proceedings of the Eleventh International Conference on Artificial Intelligence and Statistics
DA  - 2007/03/11
ED  - Marina Meila
ED  - Xiaotong Shen
ID  - pmlr-v2-sarkar07a
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 2
SP  - 420
EP  - 427
L1  - http://proceedings.mlr.press/v2/sarkar07a/sarkar07a.pdf
UR  - https://proceedings.mlr.press/v2/sarkar07a.html
AB  - We consider dynamic co-occurrence data, such as author-word links in papers published in successive years of the same conference. For static co-occurrence data, researchers often seek an embedding of the entities (authors and words) into a low-dimensional Euclidean space. We generalize a recent static co-occurrence model, the CODE model of Globerson et al. (2004), to the dynamic setting: we seek coordinates for each entity at each time step. The coordinates can change with time to explain new observations, but since large changes are improbable, we can exploit data at previous and subsequent steps to find a better explanation for current observations. To make inference tractable, we show how to approximate our observation model with a Gaussian distribution, allowing the use of a Kalman filter for tractable inference. The result is the first algorithm for dynamic embedding of co-occurrence data which provides distributional information for its coordinate estimates. We demonstrate our model both on synthetic data and on author-word data from the NIPS corpus, showing that it produces intuitively reasonable embeddings. We also provide evidence for the usefulness of our model by its performance on an author-prediction task.
ER  -

APA

Sarkar, P., Siddiqi, S.M. & Gordon, G.J. (2007). A Latent Space Approach to Dynamic Embedding of Co-occurrence Data. Proceedings of the Eleventh International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 2:420-427. Available from https://proceedings.mlr.press/v2/sarkar07a.html.
