Learning Sigmoid Belief Networks via Monte Carlo Expectation Maximization

Zhao Song, Ricardo Henao, David Carlson, Lawrence Carin
Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, PMLR 51:1347-1355, 2016.

Abstract

Belief networks are commonly used generative models of data, but require expensive posterior estimation to train and test the model. Learning typically proceeds by posterior sampling, variational approximations, or recognition networks, combined with stochastic optimization. We propose using an online Monte Carlo expectation-maximization (MCEM) algorithm to learn the maximum a posteriori (MAP) estimator of the generative model or optimize the variational lower bound of a recognition network. The E-step in this algorithm requires posterior samples, which are already generated in current learning schema. For the M-step, we augment with Polya-Gamma (PG) random variables to give an analytic updating scheme. We show relationships to standard learning approaches by deriving stochastic gradient ascent in the MCEM framework. We apply the proposed methods to both binary and count data. Experimental results show that MCEM improves the convergence speed and often improves hold-out performance over existing learning methods. Our approach is readily generalized to other recognition networks.
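As background (not reproduced from the paper itself), the Pólya-Gamma augmentation mentioned in the abstract follows the integral identity of Polson, Scott & Windle (2013), which renders sigmoid likelihoods conditionally Gaussian in their linear argument; a minimal sketch of that identity, with notation chosen here for illustration, is

\[
\frac{\left(e^{\psi}\right)^{a}}{\left(1 + e^{\psi}\right)^{b}}
= 2^{-b}\, e^{\kappa \psi} \int_{0}^{\infty} e^{-\omega \psi^{2}/2}\; p_{\mathrm{PG}}(\omega \mid b, 0)\, d\omega,
\qquad \kappa = a - \tfrac{b}{2},
\]

where \(\omega \sim \mathrm{PG}(b, 0)\) is a Pólya-Gamma random variable. For a Bernoulli observation with success probability \(\sigma(\psi)\) one has \(a = y\) and \(b = 1\), so conditional on \(\omega\) the complete-data log-likelihood is quadratic in \(\psi\), which is what makes closed-form (analytic) M-step updates of the weights possible after augmentation.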

Cite this Paper


BibTeX
@InProceedings{pmlr-v51-song16,
  title     = {Learning Sigmoid Belief Networks via Monte Carlo Expectation Maximization},
  author    = {Song, Zhao and Henao, Ricardo and Carlson, David and Carin, Lawrence},
  booktitle = {Proceedings of the 19th International Conference on Artificial Intelligence and Statistics},
  pages     = {1347--1355},
  year      = {2016},
  editor    = {Gretton, Arthur and Robert, Christian C.},
  volume    = {51},
  series    = {Proceedings of Machine Learning Research},
  address   = {Cadiz, Spain},
  month     = {09--11 May},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v51/song16.pdf},
  url       = {https://proceedings.mlr.press/v51/song16.html},
  abstract  = {Belief networks are commonly used generative models of data, but require expensive posterior estimation to train and test the model. Learning typically proceeds by posterior sampling, variational approximations, or recognition networks, combined with stochastic optimization. We propose using an online Monte Carlo expectation-maximization (MCEM) algorithm to learn the maximum a posteriori (MAP) estimator of the generative model or optimize the variational lower bound of a recognition network. The E-step in this algorithm requires posterior samples, which are already generated in current learning schema. For the M-step, we augment with Polya-Gamma (PG) random variables to give an analytic updating scheme. We show relationships to standard learning approaches by deriving stochastic gradient ascent in the MCEM framework. We apply the proposed methods to both binary and count data. Experimental results show that MCEM improves the convergence speed and often improves hold-out performance over existing learning methods. Our approach is readily generalized to other recognition networks.}
}
Endnote
%0 Conference Paper
%T Learning Sigmoid Belief Networks via Monte Carlo Expectation Maximization
%A Zhao Song
%A Ricardo Henao
%A David Carlson
%A Lawrence Carin
%B Proceedings of the 19th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2016
%E Arthur Gretton
%E Christian C. Robert
%F pmlr-v51-song16
%I PMLR
%P 1347--1355
%U https://proceedings.mlr.press/v51/song16.html
%V 51
%X Belief networks are commonly used generative models of data, but require expensive posterior estimation to train and test the model. Learning typically proceeds by posterior sampling, variational approximations, or recognition networks, combined with stochastic optimization. We propose using an online Monte Carlo expectation-maximization (MCEM) algorithm to learn the maximum a posteriori (MAP) estimator of the generative model or optimize the variational lower bound of a recognition network. The E-step in this algorithm requires posterior samples, which are already generated in current learning schema. For the M-step, we augment with Polya-Gamma (PG) random variables to give an analytic updating scheme. We show relationships to standard learning approaches by deriving stochastic gradient ascent in the MCEM framework. We apply the proposed methods to both binary and count data. Experimental results show that MCEM improves the convergence speed and often improves hold-out performance over existing learning methods. Our approach is readily generalized to other recognition networks.
RIS
TY - CPAPER
TI - Learning Sigmoid Belief Networks via Monte Carlo Expectation Maximization
AU - Zhao Song
AU - Ricardo Henao
AU - David Carlson
AU - Lawrence Carin
BT - Proceedings of the 19th International Conference on Artificial Intelligence and Statistics
DA - 2016/05/02
ED - Arthur Gretton
ED - Christian C. Robert
ID - pmlr-v51-song16
PB - PMLR
DP - Proceedings of Machine Learning Research
VL - 51
SP - 1347
EP - 1355
L1 - http://proceedings.mlr.press/v51/song16.pdf
UR - https://proceedings.mlr.press/v51/song16.html
AB - Belief networks are commonly used generative models of data, but require expensive posterior estimation to train and test the model. Learning typically proceeds by posterior sampling, variational approximations, or recognition networks, combined with stochastic optimization. We propose using an online Monte Carlo expectation-maximization (MCEM) algorithm to learn the maximum a posteriori (MAP) estimator of the generative model or optimize the variational lower bound of a recognition network. The E-step in this algorithm requires posterior samples, which are already generated in current learning schema. For the M-step, we augment with Polya-Gamma (PG) random variables to give an analytic updating scheme. We show relationships to standard learning approaches by deriving stochastic gradient ascent in the MCEM framework. We apply the proposed methods to both binary and count data. Experimental results show that MCEM improves the convergence speed and often improves hold-out performance over existing learning methods. Our approach is readily generalized to other recognition networks.
ER -
APA
Song, Z., Henao, R., Carlson, D. & Carin, L. (2016). Learning Sigmoid Belief Networks via Monte Carlo Expectation Maximization. Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 51:1347-1355. Available from https://proceedings.mlr.press/v51/song16.html.
