Deep Generative Stochastic Networks Trainable by Backprop

Yoshua Bengio, Eric Laufer, Guillaume Alain, Jason Yosinski
Proceedings of the 31st International Conference on Machine Learning, PMLR 32(2):226-234, 2014.

Abstract

We introduce a novel training principle for probabilistic models that is an alternative to maximum likelihood. The proposed Generative Stochastic Networks (GSN) framework is based on learning the transition operator of a Markov chain whose stationary distribution estimates the data distribution. Because the transition distribution is a conditional distribution generally involving a small move, it has fewer dominant modes, being unimodal in the limit of small moves. It is thus easier to learn, much like supervised function approximation, with gradients that can be obtained by backprop. The theorems provided here generalize recent work on the probabilistic interpretation of denoising autoencoders and provide an interesting justification for dependency networks and generalized pseudolikelihood (along with defining an appropriate joint distribution and sampling mechanism, even when the conditionals are not consistent). GSNs can be used with missing inputs and can be used to sample subsets of variables given the rest. Successful experiments validating these theoretical results are conducted on two image datasets, using a particular architecture that mimics the Deep Boltzmann Machine Gibbs sampler but allows training to proceed with backprop, without the need for layerwise pretraining.
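To make the training principle concrete, the following is a minimal numpy sketch of the idea in its simplest form: a tied-weight, single-hidden-layer denoising autoencoder is trained by backprop on a reconstruction loss, and its corrupt-then-reconstruct step is then iterated as the transition operator of a Markov chain. This is an illustrative sketch only, not the paper's architecture; the toy data, salt-and-pepper corruption, network sizes, and hyperparameters are all assumptions made for the example.

import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy binary data with two modes in 8 dimensions (a stand-in for image data).
X = np.vstack([np.tile([1., 1., 1., 1., 0., 0., 0., 0.], (200, 1)),
               np.tile([0., 0., 0., 0., 1., 1., 1., 1.], (200, 1))])

n_vis, n_hid, lr, noise = X.shape[1], 16, 0.5, 0.3
W = rng.normal(0.0, 0.1, (n_vis, n_hid))   # tied weights: encoder W, decoder W.T
b_h, b_v = np.zeros(n_hid), np.zeros(n_vis)

def corrupt(x):
    # Corruption C(x_tilde | x): flip each unit to a random bit with prob `noise`.
    flip = rng.random(x.shape) < noise
    return np.where(flip, rng.integers(0, 2, x.shape).astype(float), x)

def reconstruct(x_tilde):
    # One application of the learned operator: hidden code, then P(X | x_tilde).
    h = sigmoid(x_tilde @ W + b_h)
    p = sigmoid(h @ W.T + b_v)
    return p, h

# Training: plain backprop on the denoising cross-entropy, i.e. fit the
# conditional P(X | corrupt(X)) as a supervised prediction problem.
for epoch in range(500):
    x_tilde = corrupt(X)
    p, h = reconstruct(x_tilde)
    g_out = (p - X) / len(X)                    # grad w.r.t. decoder pre-activation
    g_h = (g_out @ W) * h * (1.0 - h)           # grad w.r.t. encoder pre-activation
    W -= lr * (g_out.T @ h + x_tilde.T @ g_h)   # both paths through the tied W
    b_v -= lr * g_out.sum(axis=0)
    b_h -= lr * g_h.sum(axis=0)

# Sampling: run the Markov chain whose transition is corrupt -> reconstruct -> resample.
x = rng.integers(0, 2, n_vis).astype(float)
for step in range(100):
    p, _ = reconstruct(corrupt(x[None, :]))
    x = (rng.random(n_vis) < p[0]).astype(float)
print("sample after 100 steps:", x)

Under the theorems referenced in the abstract, the stationary distribution of such a chain estimates the data-generating distribution; the GSN framework generalizes this single-layer example to deeper architectures with latent state, such as the one mimicking the Deep Boltzmann Machine Gibbs sampler used in the experiments.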

Cite this Paper


BibTeX
@InProceedings{pmlr-v32-bengio14,
  title     = {Deep Generative Stochastic Networks Trainable by Backprop},
  author    = {Yoshua Bengio and Eric Laufer and Guillaume Alain and Jason Yosinski},
  booktitle = {Proceedings of the 31st International Conference on Machine Learning},
  pages     = {226--234},
  year      = {2014},
  editor    = {Eric P. Xing and Tony Jebara},
  volume    = {32},
  number    = {2},
  series    = {Proceedings of Machine Learning Research},
  address   = {Beijing, China},
  month     = {22--24 Jun},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v32/bengio14.pdf},
  url       = {http://proceedings.mlr.press/v32/bengio14.html}
}
APA
Bengio, Y., Laufer, E., Alain, G. & Yosinski, J. (2014). Deep Generative Stochastic Networks Trainable by Backprop. Proceedings of the 31st International Conference on Machine Learning, PMLR 32(2):226-234.
