Deep Generative Stochastic Networks Trainable by Backprop

Yoshua Bengio; Eric Laufer; Guillaume Alain; Jason Yosinski

Proceedings of the 31st International Conference on Machine Learning, PMLR 32(2):226-234, 2014.

Abstract

We introduce a novel training principle for probabilistic models that is an alternative to maximum likelihood. The proposed Generative Stochastic Networks (GSN) framework is based on learning the transition operator of a Markov chain whose stationary distribution estimates the data distribution. Because the transition distribution is a conditional distribution generally involving a small move, it has fewer dominant modes, being unimodal in the limit of small moves. Thus, it is easier to learn, more like learning to perform supervised function approximation, with gradients that can be obtained by backprop. The theorems provided here generalize recent work on the probabilistic interpretation of denoising autoencoders and provide an interesting justification for dependency networks and generalized pseudolikelihood (along with defining an appropriate joint distribution and sampling mechanism, even when the conditionals are not consistent). GSNs can be used with missing inputs and can be used to sample subsets of variables given the rest. Successful experiments are conducted, validating these theoretical results, on two image datasets and with a particular architecture that mimics the Deep Boltzmann Machine Gibbs sampler but allows training to proceed with backprop, without the need for layerwise pretraining.
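The abstract builds on the probabilistic interpretation of denoising autoencoders. The sketch below is a rough illustration of that special case, which the GSN theorems generalize. It runs a Markov chain that alternates a fixed corruption C(X̃ | X) with a learned reconstruction P(X | X̃), and checks that the chain's stationary distribution lands near the data distribution. The reason this should work: if the reconstruction equals the exact posterior of the joint P(X) C(X̃ | X), the chain is a Gibbs sampler on that joint, so its stationary marginal over X is P(X).

This is not the paper's architecture. The reconstruction is fit by counting (x, x̃) pairs, which is the finite-state maximum-likelihood estimate, rather than by a network trained with backprop. The four-state toy distribution, the corruption scheme, and the function names (`corruption_matrix`, `fit_reconstruction`, `transition_matrix`, `stationary`) are all hypothetical choices made for this example.

```python
import random


def corruption_matrix(k, noise):
    """Hypothetical corruption C[x][xt]: keep x with prob 1 - noise, else jump uniformly to another state."""
    return [[(1.0 - noise) if xt == x else noise / (k - 1) for xt in range(k)]
            for x in range(k)]


def fit_reconstruction(samples, k, noise, rng):
    """Estimate P(x | xt) by counting (x, corrupted xt) pairs.

    Stand-in for a reconstruction network trained by backprop: on a finite
    state space, normalized counts are the maximum-likelihood conditional.
    """
    counts = [[0] * k for _ in range(k)]  # counts[xt][x]
    for x in samples:
        if rng.random() < noise:
            xt = rng.choice([s for s in range(k) if s != x])
        else:
            xt = x
        counts[xt][x] += 1
    recon = []
    for row in counts:
        total = sum(row)
        # Fall back to uniform if a corrupted state was never observed.
        recon.append([c / total if total else 1.0 / k for c in row])
    return recon


def transition_matrix(corrupt, recon):
    """T[x][x'] = sum over xt of C(xt | x) * P(x' | xt): one corrupt-then-reconstruct step."""
    k = len(corrupt)
    return [[sum(corrupt[x][xt] * recon[xt][xp] for xt in range(k)) for xp in range(k)]
            for x in range(k)]


def stationary(trans, iters=2000):
    """Approximate the stationary distribution by power iteration from uniform."""
    k = len(trans)
    pi = [1.0 / k] * k
    for _ in range(iters):
        pi = [sum(pi[x] * trans[x][xp] for x in range(k)) for xp in range(k)]
    return pi


# Usage on a toy four-state "data distribution" (hypothetical numbers).
rng = random.Random(0)
p_data = [0.5, 0.3, 0.15, 0.05]
samples = rng.choices(range(4), weights=p_data, k=200_000)
C = corruption_matrix(4, noise=0.3)
R = fit_reconstruction(samples, 4, 0.3, rng)
T = transition_matrix(C, R)
pi = stationary(T)
print([round(v, 3) for v in pi])  # expected to be close to p_data
```

The printed stationary distribution should sit close to `p_data`. The gap comes from sampling error in the counted reconstruction. The corruption here is a single small random jump, so each reconstruction P(X | X̃) is a simple, mostly unimodal conditional. That is the property the abstract credits with making the transition operator easier to learn than the full data distribution.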

Cite this Paper

BibTeX


@InProceedings{pmlr-v32-bengio14,
  title = 	 {Deep Generative Stochastic Networks Trainable by Backprop},
  author = 	 {Bengio, Yoshua and Laufer, Eric and Alain, Guillaume and Yosinski, Jason},
  booktitle = 	 {Proceedings of the 31st International Conference on Machine Learning},
  pages = 	 {226--234},
  year = 	 {2014},
  editor = 	 {Xing, Eric P. and Jebara, Tony},
  volume = 	 {32},
  number =       {2},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {Beijing, China},
  month = 	 {22--24 Jun},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v32/bengio14.pdf},
  url = 	 {https://proceedings.mlr.press/v32/bengio14.html},
  abstract = 	 {We introduce a novel training principle for probabilistic models that is an alternative to maximum likelihood. The proposed Generative Stochastic Networks (GSN) framework is based on learning the transition operator of a Markov chain whose stationary distribution estimates the data distribution.  Because the transition distribution is a conditional distribution generally involving a small move, it has fewer dominant modes, being unimodal in the limit of small moves. Thus, it is easier to learn, more like learning to perform supervised function approximation, with gradients that can be obtained by backprop. The theorems provided here generalize recent work on the probabilistic interpretation of denoising autoencoders and provide an interesting justification for dependency networks and generalized pseudolikelihood (along with defining an appropriate joint distribution and sampling mechanism, even when the conditionals are not consistent). GSNs can be used with missing inputs and can be used to sample subsets of variables given the rest.  Successful experiments are conducted, validating these theoretical results, on two image datasets and with a particular architecture that mimics the Deep Boltzmann Machine Gibbs sampler but allows training to proceed with backprop, without the need for layerwise pretraining.}
}

Endnote

%0 Conference Paper
%T Deep Generative Stochastic Networks Trainable by Backprop
%A Yoshua Bengio
%A Eric Laufer
%A Guillaume Alain
%A Jason Yosinski
%B Proceedings of the 31st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2014
%E Eric P. Xing
%E Tony Jebara
%F pmlr-v32-bengio14
%I PMLR
%P 226--234
%U https://proceedings.mlr.press/v32/bengio14.html
%V 32
%N 2
%X We introduce a novel training principle for probabilistic models that is an alternative to maximum likelihood. The proposed Generative Stochastic Networks (GSN) framework is based on learning the transition operator of a Markov chain whose stationary distribution estimates the data distribution.  Because the transition distribution is a conditional distribution generally involving a small move, it has fewer dominant modes, being unimodal in the limit of small moves. Thus, it is easier to learn, more like learning to perform supervised function approximation, with gradients that can be obtained by backprop. The theorems provided here generalize recent work on the probabilistic interpretation of denoising autoencoders and provide an interesting justification for dependency networks and generalized pseudolikelihood (along with defining an appropriate joint distribution and sampling mechanism, even when the conditionals are not consistent). GSNs can be used with missing inputs and can be used to sample subsets of variables given the rest.  Successful experiments are conducted, validating these theoretical results, on two image datasets and with a particular architecture that mimics the Deep Boltzmann Machine Gibbs sampler but allows training to proceed with backprop, without the need for layerwise pretraining.

RIS


TY  - CPAPER
TI  - Deep Generative Stochastic Networks Trainable by Backprop
AU  - Yoshua Bengio
AU  - Eric Laufer
AU  - Guillaume Alain
AU  - Jason Yosinski
BT  - Proceedings of the 31st International Conference on Machine Learning
DA  - 2014/06/18
ED  - Eric P. Xing
ED  - Tony Jebara
ID  - pmlr-v32-bengio14
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 32
IS  - 2
SP  - 226
EP  - 234
L1  - http://proceedings.mlr.press/v32/bengio14.pdf
UR  - https://proceedings.mlr.press/v32/bengio14.html
AB  - We introduce a novel training principle for probabilistic models that is an alternative to maximum likelihood. The proposed Generative Stochastic Networks (GSN) framework is based on learning the transition operator of a Markov chain whose stationary distribution estimates the data distribution.  Because the transition distribution is a conditional distribution generally involving a small move, it has fewer dominant modes, being unimodal in the limit of small moves. Thus, it is easier to learn, more like learning to perform supervised function approximation, with gradients that can be obtained by backprop. The theorems provided here generalize recent work on the probabilistic interpretation of denoising autoencoders and provide an interesting justification for dependency networks and generalized pseudolikelihood (along with defining an appropriate joint distribution and sampling mechanism, even when the conditionals are not consistent). GSNs can be used with missing inputs and can be used to sample subsets of variables given the rest.  Successful experiments are conducted, validating these theoretical results, on two image datasets and with a particular architecture that mimics the Deep Boltzmann Machine Gibbs sampler but allows training to proceed with backprop, without the need for layerwise pretraining.
ER  -

APA


Bengio, Y., Laufer, E., Alain, G., & Yosinski, J. (2014). Deep Generative Stochastic Networks Trainable by Backprop. Proceedings of the 31st International Conference on Machine Learning, in Proceedings of Machine Learning Research 32(2):226-234. Available from https://proceedings.mlr.press/v32/bengio14.html.
