Zero-Inflated Exponential Family Embeddings
Proceedings of the 34th International Conference on Machine Learning, PMLR 70:2140-2148, 2017.
Abstract
Word embeddings are a widely-used tool to analyze language, and exponential family embeddings (Rudolph et al., 2016) generalize the technique to other types of data. One challenge to fitting embedding methods is sparse data, such as a document/term matrix that contains many zeros. To address this issue, practitioners typically downweight or subsample the zeros, thus focusing learning on the non-zero entries. In this paper, we develop zero-inflated embeddings, a new embedding method that is designed to learn from sparse observations. In a zero-inflated embedding (ZIE), a zero in the data can come from an interaction with other data (i.e., an embedding) or from a separate process by which many observations are equal to zero (i.e., a probability mass at zero). Fitting a ZIE naturally downweights the zeros and dampens their influence on the model. Across many types of data—language, movie ratings, shopping histories, and bird watching logs—we find that zero-inflated embeddings provide improved predictive performance over standard approaches and find better vector representations of items.
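The abstract's core idea—that a zero may arise either from the embedding interaction or from a separate point mass at zero—can be sketched as a zero-inflated likelihood. The sketch below uses a Poisson observation model whose rate comes from an inner product of embedding and context vectors; the function name `zip_log_prob`, the vectors `rho_i` and `alpha_ctx`, and the fixed mixing weight `pi` are illustrative assumptions, not the paper's exact parameterization.

```python
import numpy as np
from math import lgamma

def zip_log_prob(x, lam, pi):
    """Log-probability under a zero-inflated Poisson (illustrative sketch).

    With probability pi the observation is a "structural" zero from the
    separate zero-generating process; otherwise x ~ Poisson(lam), where
    lam would be driven by the embedding interaction.
    """
    # Poisson log-pmf: x*log(lam) - lam - log(x!)
    pois = x * np.log(lam) - lam - lgamma(x + 1)
    if x == 0:
        # Zero can come from either mixture component.
        return np.log(pi + (1 - pi) * np.exp(pois))
    # Non-zero counts can only come from the Poisson component.
    return np.log(1 - pi) + pois

# Hypothetical embedding-based rate: lam = exp(rho_i . alpha_ctx),
# where rho_i is an item embedding and alpha_ctx sums the context vectors.
rho_i = np.array([0.1, -0.2])
alpha_ctx = np.array([0.3, 0.5])
lam = np.exp(rho_i @ alpha_ctx)

ll_zero = zip_log_prob(0, lam, pi=0.8)  # a zero: likely under either component
ll_two = zip_log_prob(2, lam, pi=0.8)   # a count: must come from the embedding
```

Because the zero term mixes in the point mass, the gradient with respect to the embedding parameters is damped at zero entries, which is the intuition behind "fitting a ZIE naturally downweights the zeros."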