On the Convergence Properties of Contrastive Divergence

Ilya Sutskever; Tijmen Tieleman

Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, PMLR 9:789-795, 2010.

Abstract

Contrastive Divergence (CD) is a popular method for estimating the parameters of Markov Random Fields (MRFs) by rapidly approximating an intractable term in the gradient of the log probability. Despite CD’s empirical success, little is known about its theoretical convergence properties. In this paper, we analyze the CD$_1$ update rule for Restricted Boltzmann Machines (RBMs) with binary variables. We show that this update is not the gradient of any function, and construct a counterintuitive “regularization function” that causes CD learning to cycle indefinitely. Nonetheless, we show that the regularized CD update has a fixed point for a large class of regularization functions using Brouwer’s fixed point theorem.

Cite this Paper

BibTeX


@InProceedings{pmlr-v9-sutskever10a,
  title = 	 {On the Convergence Properties of Contrastive Divergence},
  author = 	 {Sutskever, Ilya and Tieleman, Tijmen},
  booktitle = 	 {Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics},
  pages = 	 {789--795},
  year = 	 {2010},
  editor = 	 {Teh, Yee Whye and Titterington, Mike},
  volume = 	 {9},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {Chia Laguna Resort, Sardinia, Italy},
  month = 	 {13--15 May},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v9/sutskever10a/sutskever10a.pdf},
  url = 	 {https://proceedings.mlr.press/v9/sutskever10a.html},
  abstract = 	 {Contrastive Divergence (CD) is a popular method for estimating the parameters of Markov Random Fields (MRFs) by rapidly approximating an intractable term in the gradient of the log probability. Despite CD’s empirical success, little is known about its theoretical convergence properties. In this paper, we analyze the CD$_1$ update rule for Restricted Boltzmann Machines (RBMs) with binary variables. We show that this update is not the gradient of any function, and construct a counterintuitive “regularization function” that causes CD learning to cycle indefinitely. Nonetheless, we show that the regularized CD update has a fixed point for a large class of regularization functions using Brouwer’s fixed point theorem.}
}

Endnote

%0 Conference Paper
%T On the Convergence Properties of Contrastive Divergence
%A Ilya Sutskever
%A Tijmen Tieleman
%B Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2010
%E Yee Whye Teh
%E Mike Titterington
%F pmlr-v9-sutskever10a
%I PMLR
%P 789--795
%U https://proceedings.mlr.press/v9/sutskever10a.html
%V 9
%X Contrastive Divergence (CD) is a popular method for estimating the parameters of Markov Random Fields (MRFs) by rapidly approximating an intractable term in the gradient of the log probability. Despite CD’s empirical success, little is known about its theoretical convergence properties. In this paper, we analyze the CD$_1$ update rule for Restricted Boltzmann Machines (RBMs) with binary variables. We show that this update is not the gradient of any function, and construct a counterintuitive “regularization function” that causes CD learning to cycle indefinitely. Nonetheless, we show that the regularized CD update has a fixed point for a large class of regularization functions using Brouwer’s fixed point theorem.

RIS


TY  - CPAPER
TI  - On the Convergence Properties of Contrastive Divergence
AU  - Ilya Sutskever
AU  - Tijmen Tieleman
BT  - Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics
DA  - 2010/03/31
ED  - Yee Whye Teh
ED  - Mike Titterington
ID  - pmlr-v9-sutskever10a
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 9
SP  - 789
EP  - 795
L1  - http://proceedings.mlr.press/v9/sutskever10a/sutskever10a.pdf
UR  - https://proceedings.mlr.press/v9/sutskever10a.html
AB  - Contrastive Divergence (CD) is a popular method for estimating the parameters of Markov Random Fields (MRFs) by rapidly approximating an intractable term in the gradient of the log probability. Despite CD’s empirical success, little is known about its theoretical convergence properties. In this paper, we analyze the CD$_1$ update rule for Restricted Boltzmann Machines (RBMs) with binary variables. We show that this update is not the gradient of any function, and construct a counterintuitive “regularization function” that causes CD learning to cycle indefinitely. Nonetheless, we show that the regularized CD update has a fixed point for a large class of regularization functions using Brouwer’s fixed point theorem.
ER  -

APA


Sutskever, I. & Tieleman, T. (2010). On the Convergence Properties of Contrastive Divergence. Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 9:789-795. Available from https://proceedings.mlr.press/v9/sutskever10a.html.
