LDA Topic Model with Soft Assignment of Descriptors to Words

Daphna Weinshall; Gal Levi; Dmitri Hanukaev

Proceedings of the 30th International Conference on Machine Learning, PMLR 28(3):711-719, 2013.

Abstract

The LDA topic model is used to model corpora of documents that can be represented by bags of words. Here we extend the LDA model to deal with documents that are represented more naturally by bags of continuous descriptors. Given a finite dictionary of words which are generative models of descriptors, our extended LDA model allows for the soft assignment of descriptors to (many) dictionary words. We derive variational inference and parameter estimation procedures for the extended model, which closely resemble those obtained for the original model, with two important differences: First, the histogram of word counts is replaced by a histogram of pseudo word counts, or sums of responsibilities over all descriptors. Second, parameter estimation now depends on the average covariance matrix between these pseudo-counts, reflecting the fact that with soft assignment words are not independent. We use this approach to address novelty detection, where we seek to identify video events with low posterior probability. Video events are described by a generative dynamic texture model, from which we naturally derive a dictionary of generative words. Using a benchmark dataset for novelty detection, we show a very significant improvement in the detection of novel events when using our extended LDA model with soft assignment to words compared with hard assignment (the original model), achieving state-of-the-art novelty detection results.
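The difference between hard word counts and the pseudo word counts mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the dictionary words here are assumed to be isotropic Gaussians with a uniform prior (the paper uses dynamic texture models), and the names `responsibilities`, `pseudo_counts`, `hard_counts`, and `sigma` are hypothetical. The sketch covers only the first of the two differences (the histogram of pseudo-counts), not the covariance term.

```python
import numpy as np

def responsibilities(descriptors, means, sigma=1.0):
    """Posterior probability of each dictionary word given each descriptor.

    Assumption: every word is an isotropic Gaussian N(mean, sigma^2 I)
    and all words have equal prior probability.
    """
    # Squared distance from every descriptor to every word mean: shape (N, V).
    d2 = ((descriptors[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
    log_lik = -0.5 * d2 / sigma**2
    # Subtract the row maximum before exponentiating for numerical stability.
    log_lik -= log_lik.max(axis=1, keepdims=True)
    r = np.exp(log_lik)
    return r / r.sum(axis=1, keepdims=True)

def pseudo_counts(resp):
    """Soft histogram: sum of responsibilities over all descriptors."""
    return resp.sum(axis=0)

def hard_counts(resp):
    """Hard histogram: each descriptor counts once, for its most likely word."""
    return np.bincount(resp.argmax(axis=1), minlength=resp.shape[1])

# Toy document: 5 two-dimensional descriptors, dictionary of 3 words.
rng = np.random.default_rng(0)
means = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 3.0]])
descriptors = means[rng.integers(0, 3, size=5)] + rng.normal(size=(5, 2))

resp = responsibilities(descriptors, means)
soft = pseudo_counts(resp)
hard = hard_counts(resp)
print("pseudo word counts:", soft)
print("hard word counts:  ", hard)
```

Both histograms sum to the number of descriptors, but the soft one spreads each descriptor's unit mass across several words, which is why the pseudo-counts of different words are no longer independent.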

Cite this Paper

BibTeX


@InProceedings{pmlr-v28-weinshall13,
  title     = {LDA Topic Model with Soft Assignment of Descriptors to Words},
  author    = {Weinshall, Daphna and Levi, Gal and Hanukaev, Dmitri},
  booktitle = {Proceedings of the 30th International Conference on Machine Learning},
  pages     = {711--719},
  year      = {2013},
  editor    = {Dasgupta, Sanjoy and McAllester, David},
  volume    = {28},
  number    = {3},
  series    = {Proceedings of Machine Learning Research},
  address   = {Atlanta, Georgia, USA},
  month     = {17--19 Jun},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v28/weinshall13.pdf},
  url       = {https://proceedings.mlr.press/v28/weinshall13.html},
  abstract  = {The LDA topic model is being used to model corpora of documents that can be represented by bags of words. Here we extend the LDA model to deal with documents that are represented more naturally by bags of continuous descriptors. Given a finite dictionary of words which are generative models of descriptors, our extended LDA model allows for the soft assignment of descriptors to (many) dictionary words. We derive variational inference and parameter estimation procedures for the extended model, which closely resemble those obtained for the original model, with two important differences: First, the histogram of word counts is replaced by a histogram of pseudo word counts, or sums of responsibilities over all descriptors. Second, parameter estimation now depends on the average covariance matrix between these pseudo-counts, reflecting the fact that with soft assignment words are not independent. We use this approach to address novelty detection, where we seek to identify video events with low posterior probability. Video events are described by a generative dynamic texture model, from which we naturally derive a dictionary of generative words. Using a benchmark dataset for novelty detection, we show a very significant improvement in the detection of novel events when using our extended LDA model with soft assignment to words as against hard assignment (the original model), achieving state of the art novelty detection results.}
}

Endnote

%0 Conference Paper
%T LDA Topic Model with Soft Assignment of Descriptors to Words
%A Daphna Weinshall
%A Gal Levi
%A Dmitri Hanukaev
%B Proceedings of the 30th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2013
%E Sanjoy Dasgupta
%E David McAllester
%F pmlr-v28-weinshall13
%I PMLR
%P 711--719
%U https://proceedings.mlr.press/v28/weinshall13.html
%V 28
%N 3
%X The LDA topic model is being used to model corpora of documents that can be represented by bags of words. Here we extend the LDA model to deal with documents that are represented more naturally by bags of continuous descriptors. Given a finite dictionary of words which are generative models of descriptors, our extended LDA model allows for the soft assignment of descriptors to (many) dictionary words. We derive variational inference and parameter estimation procedures for the extended model, which closely resemble those obtained for the original model, with two important differences: First, the histogram of word counts is replaced by a histogram of pseudo word counts, or sums of responsibilities over all descriptors. Second, parameter estimation now depends on the average covariance matrix between these pseudo-counts, reflecting the fact that with soft assignment words are not independent. We use this approach to address novelty detection, where we seek to identify video events with low posterior probability. Video events are described by a generative dynamic texture model, from which we naturally derive a dictionary of generative words. Using a benchmark dataset for novelty detection, we show a very significant improvement in the detection of novel events when using our extended LDA model with soft assignment to words as against hard assignment (the original model), achieving state of the art novelty detection results.

RIS


TY  - CPAPER
TI  - LDA Topic Model with Soft Assignment of Descriptors to Words
AU  - Daphna Weinshall
AU  - Gal Levi
AU  - Dmitri Hanukaev
BT  - Proceedings of the 30th International Conference on Machine Learning
DA  - 2013/05/26
ED  - Sanjoy Dasgupta
ED  - David McAllester
ID  - pmlr-v28-weinshall13
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 28
IS  - 3
SP  - 711
EP  - 719
L1  - http://proceedings.mlr.press/v28/weinshall13.pdf
UR  - https://proceedings.mlr.press/v28/weinshall13.html
AB  - The LDA topic model is being used to model corpora of documents that can be represented by bags of words. Here we extend the LDA model to deal with documents that are represented more naturally by bags of continuous descriptors. Given a finite dictionary of words which are generative models of descriptors, our extended LDA model allows for the soft assignment of descriptors to (many) dictionary words. We derive variational inference and parameter estimation procedures for the extended model, which closely resemble those obtained for the original model, with two important differences: First, the histogram of word counts is replaced by a histogram of pseudo word counts, or sums of responsibilities over all descriptors. Second, parameter estimation now depends on the average covariance matrix between these pseudo-counts, reflecting the fact that with soft assignment words are not independent. We use this approach to address novelty detection, where we seek to identify video events with low posterior probability. Video events are described by a generative dynamic texture model, from which we naturally derive a dictionary of generative words. Using a benchmark dataset for novelty detection, we show a very significant improvement in the detection of novel events when using our extended LDA model with soft assignment to words as against hard assignment (the original model), achieving state of the art novelty detection results.
ER  -

APA


Weinshall, D., Levi, G. & Hanukaev, D. (2013). LDA Topic Model with Soft Assignment of Descriptors to Words. Proceedings of the 30th International Conference on Machine Learning, in Proceedings of Machine Learning Research 28(3):711-719. Available from https://proceedings.mlr.press/v28/weinshall13.html.
