Variational Autoencoders for Sparse and Overdispersed Discrete Data

He Zhao, Piyush Rai, Lan Du, Wray Buntine, Dinh Phung, Mingyuan Zhou
Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, PMLR 108:1684-1694, 2020.

Abstract

Many applications, such as text modelling, high-throughput sequencing, and recommender systems, require analysing sparse, high-dimensional, and overdispersed discrete (count or binary) data. Recent deep probabilistic models based on variational autoencoders (VAE) have shown promising results on discrete data but may have inferior modelling performance, due to their insufficient capability to model overdispersion and to model misspecification. To address these issues, we develop a VAE-based framework using the negative binomial distribution as the data distribution. We also provide an analysis of its properties vis-à-vis other models. We conduct extensive experiments on three problems from discrete data analysis: text analysis/topic modelling, collaborative filtering, and multi-label learning. Our models outperform state-of-the-art approaches on these problems, while also capturing the phenomenon of overdispersion more effectively.
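The paper's central modelling choice is the negative binomial (NB) likelihood, whose variance can exceed its mean (overdispersion), unlike the Poisson, whose variance equals its mean. A minimal stand-alone sketch of this property (not the authors' code; the parameterisation and names are illustrative):

```python
from math import lgamma, log, exp

def nb_log_pmf(k, r, p):
    """Log-pmf of a negative binomial: count k, dispersion r, success prob p.
    P(K=k) = Gamma(k+r) / (Gamma(r) * k!) * (1-p)^r * p^k
    """
    return (lgamma(k + r) - lgamma(r) - lgamma(k + 1)
            + r * log(1 - p) + k * log(p))

def nb_mean_var(mu, r):
    """Mean-dispersion parameterisation: variance = mu + mu^2 / r.
    As r -> infinity the NB collapses to a Poisson (variance = mu);
    small r allows heavy overdispersion, as in sparse count data."""
    return mu, mu + mu * mu / r

mean, var = nb_mean_var(mu=5.0, r=2.0)
# variance 17.5 exceeds the mean 5.0, whereas Poisson forces them equal
```

In the VAE framework described in the abstract, a decoder network would output such NB parameters per dimension instead of Poisson rates, letting the model absorb the extra variance in sparse, overdispersed counts.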

Cite this Paper


BibTeX
@InProceedings{pmlr-v108-zhao20c,
  title     = {Variational Autoencoders for Sparse and Overdispersed Discrete Data},
  author    = {Zhao, He and Rai, Piyush and Du, Lan and Buntine, Wray and Phung, Dinh and Zhou, Mingyuan},
  booktitle = {Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics},
  pages     = {1684--1694},
  year      = {2020},
  editor    = {Silvia Chiappa and Roberto Calandra},
  volume    = {108},
  series    = {Proceedings of Machine Learning Research},
  month     = {26--28 Aug},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v108/zhao20c/zhao20c.pdf},
  url       = {http://proceedings.mlr.press/v108/zhao20c.html},
  abstract  = {Many applications, such as text modelling, high-throughput sequencing, and recommender systems, require analysing sparse, high-dimensional, and overdispersed discrete (count or binary) data. Recent deep probabilistic models based on variational autoencoders (VAE) have shown promising results on discrete data but may have inferior modelling performance due to the insufficient capability in modelling overdispersion and model misspecification. To address these issues, we develop a VAE-based framework using the negative binomial distribution as the data distribution. We also provide an analysis of its properties vis-à-vis other models. We conduct extensive experiments on three problems from discrete data analysis: text analysis/topic modelling, collaborative filtering, and multi-label learning. Our models outperform state-of-the-art approaches on these problems, while also capturing the phenomenon of overdispersion more effectively.}
}
Endnote
%0 Conference Paper
%T Variational Autoencoders for Sparse and Overdispersed Discrete Data
%A He Zhao
%A Piyush Rai
%A Lan Du
%A Wray Buntine
%A Dinh Phung
%A Mingyuan Zhou
%B Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2020
%E Silvia Chiappa
%E Roberto Calandra
%F pmlr-v108-zhao20c
%I PMLR
%P 1684--1694
%U http://proceedings.mlr.press/v108/zhao20c.html
%V 108
%X Many applications, such as text modelling, high-throughput sequencing, and recommender systems, require analysing sparse, high-dimensional, and overdispersed discrete (count or binary) data. Recent deep probabilistic models based on variational autoencoders (VAE) have shown promising results on discrete data but may have inferior modelling performance due to the insufficient capability in modelling overdispersion and model misspecification. To address these issues, we develop a VAE-based framework using the negative binomial distribution as the data distribution. We also provide an analysis of its properties vis-à-vis other models. We conduct extensive experiments on three problems from discrete data analysis: text analysis/topic modelling, collaborative filtering, and multi-label learning. Our models outperform state-of-the-art approaches on these problems, while also capturing the phenomenon of overdispersion more effectively.
APA
Zhao, H., Rai, P., Du, L., Buntine, W., Phung, D., &amp; Zhou, M. (2020). Variational Autoencoders for Sparse and Overdispersed Discrete Data. Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 108:1684-1694. Available from http://proceedings.mlr.press/v108/zhao20c.html.