Bridging the Gap between Stochastic Gradient MCMC and Stochastic Optimization

Changyou Chen; David Carlson; Zhe Gan; Chunyuan Li; Lawrence Carin

Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, PMLR 51:1051-1060, 2016.

Abstract

Stochastic gradient Markov chain Monte Carlo (SG-MCMC) methods are Bayesian analogs to popular stochastic optimization methods; however, this connection is not well studied. We explore this relationship by applying simulated annealing to an SG-MCMC algorithm. Furthermore, we extend recent SG-MCMC methods with two key components: i) adaptive preconditioners (as in ADAgrad or RMSprop), and ii) adaptive element-wise momentum weights. The zero-temperature limit gives a novel stochastic optimization method with adaptive element-wise momentum weights, while conventional optimization methods only have a shared, static momentum weight. Under certain assumptions, our theoretical analysis suggests the proposed simulated annealing approach converges close to the global optima. Experiments on several deep neural network models show state-of-the-art results compared to related stochastic optimization algorithms.
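
As a rough illustration of the annealing idea described in the abstract, the Python sketch below runs a plain stochastic gradient Langevin update on a toy one-dimensional objective while the temperature decays toward zero. It is not the algorithm from the paper: it omits the adaptive preconditioners and the element-wise momentum weights, and the objective, step size, cooling schedule, and function names (`noisy_grad`, `annealed_sgld`) are all assumptions chosen for illustration.

```python
import math
import random


def noisy_grad(theta, rng, noise_scale=0.1):
    """Stochastic gradient of the toy objective U(theta) = 0.5 * (theta - 3)^2.

    The Gaussian perturbation stands in for minibatch noise (an assumption).
    """
    return (theta - 3.0) + noise_scale * rng.gauss(0.0, 1.0)


def annealed_sgld(theta0, steps=5000, step_size=0.01, t0=1.0, decay=0.999, seed=0):
    """Langevin updates whose injected noise shrinks as the temperature decays.

    Update: theta <- theta - h * grad + sqrt(2 * h * T_k) * xi,  xi ~ N(0, 1).
    With T_k = 1 this is plain SGLD (sampling); as T_k -> 0 the noise term
    vanishes and the update reduces to stochastic gradient descent.
    """
    rng = random.Random(seed)
    theta = theta0
    temperature = t0
    for _ in range(steps):
        grad = noisy_grad(theta, rng)
        noise = math.sqrt(2.0 * step_size * temperature) * rng.gauss(0.0, 1.0)
        theta = theta - step_size * grad + noise
        temperature *= decay  # geometric cooling schedule (hypothetical choice)
    return theta


if __name__ == "__main__":
    # Start far from the minimizer at theta = 3 and let the chain cool.
    print(annealed_sgld(theta0=-5.0))
```

Setting `t0=0.0` removes the injected noise entirely, which leaves ordinary stochastic gradient descent on the same toy objective.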

Cite this Paper

BibTeX


@InProceedings{pmlr-v51-chen16c,
  title = 	 {Bridging the Gap between Stochastic Gradient MCMC and Stochastic Optimization},
  author = 	 {Chen, Changyou and Carlson, David and Gan, Zhe and Li, Chunyuan and Carin, Lawrence},
  booktitle = 	 {Proceedings of the 19th International Conference on Artificial Intelligence and Statistics},
  pages = 	 {1051--1060},
  year = 	 {2016},
  editor = 	 {Gretton, Arthur and Robert, Christian C.},
  volume = 	 {51},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {Cadiz, Spain},
  month = 	 {09--11 May},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v51/chen16c.pdf},
  url = 	 {https://proceedings.mlr.press/v51/chen16c.html},
  abstract = 	 {Stochastic gradient Markov chain Monte Carlo (SG-MCMC) methods are Bayesian analogs to popular stochastic optimization methods; however, this connection is not well studied. We explore this relationship by applying simulated annealing to an SG-MCMC algorithm. Furthermore, we extend recent SG-MCMC methods with two key components: i) adaptive preconditioners (as in ADAgrad or RMSprop), and ii) adaptive element-wise momentum weights. The zero-temperature limit gives a novel stochastic optimization method with adaptive element-wise momentum weights, while conventional optimization methods only have a shared, static momentum weight. Under certain assumptions, our theoretical analysis suggests the proposed simulated annealing approach converges close to the global optima. Experiments on several deep neural network models show state-of-the-art results compared to related stochastic optimization algorithms.}
}

Endnote

%0 Conference Paper
%T Bridging the Gap between Stochastic Gradient MCMC and Stochastic Optimization
%A Changyou Chen
%A David Carlson
%A Zhe Gan
%A Chunyuan Li
%A Lawrence Carin
%B Proceedings of the 19th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2016
%E Arthur Gretton
%E Christian C. Robert
%F pmlr-v51-chen16c
%I PMLR
%P 1051--1060
%U https://proceedings.mlr.press/v51/chen16c.html
%V 51
%X Stochastic gradient Markov chain Monte Carlo (SG-MCMC) methods are Bayesian analogs to popular stochastic optimization methods; however, this connection is not well studied. We explore this relationship by applying simulated annealing to an SG-MCMC algorithm. Furthermore, we extend recent SG-MCMC methods with two key components: i) adaptive preconditioners (as in ADAgrad or RMSprop), and ii) adaptive element-wise momentum weights. The zero-temperature limit gives a novel stochastic optimization method with adaptive element-wise momentum weights, while conventional optimization methods only have a shared, static momentum weight. Under certain assumptions, our theoretical analysis suggests the proposed simulated annealing approach converges close to the global optima. Experiments on several deep neural network models show state-of-the-art results compared to related stochastic optimization algorithms.

RIS


TY  - CPAPER
TI  - Bridging the Gap between Stochastic Gradient MCMC and Stochastic Optimization
AU  - Changyou Chen
AU  - David Carlson
AU  - Zhe Gan
AU  - Chunyuan Li
AU  - Lawrence Carin
BT  - Proceedings of the 19th International Conference on Artificial Intelligence and Statistics
DA  - 2016/05/02
ED  - Arthur Gretton
ED  - Christian C. Robert
ID  - pmlr-v51-chen16c
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 51
SP  - 1051
EP  - 1060
L1  - http://proceedings.mlr.press/v51/chen16c.pdf
UR  - https://proceedings.mlr.press/v51/chen16c.html
AB  - Stochastic gradient Markov chain Monte Carlo (SG-MCMC) methods are Bayesian analogs to popular stochastic optimization methods; however, this connection is not well studied. We explore this relationship by applying simulated annealing to an SG-MCMC algorithm. Furthermore, we extend recent SG-MCMC methods with two key components: i) adaptive preconditioners (as in ADAgrad or RMSprop), and ii) adaptive element-wise momentum weights. The zero-temperature limit gives a novel stochastic optimization method with adaptive element-wise momentum weights, while conventional optimization methods only have a shared, static momentum weight. Under certain assumptions, our theoretical analysis suggests the proposed simulated annealing approach converges close to the global optima. Experiments on several deep neural network models show state-of-the-art results compared to related stochastic optimization algorithms.
ER  -

APA


Chen, C., Carlson, D., Gan, Z., Li, C., & Carin, L. (2016). Bridging the Gap between Stochastic Gradient MCMC and Stochastic Optimization. Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 51:1051-1060. Available from https://proceedings.mlr.press/v51/chen16c.html.
