AMAGOLD: Amortized Metropolis Adjustment for Efficient Stochastic Gradient MCMC

Ruqi Zhang, A. Feder Cooper, Christopher De Sa
Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, PMLR 108:2142-2152, 2020.

Abstract

Stochastic gradient Hamiltonian Monte Carlo (SGHMC) is an efficient method for sampling from continuous distributions. It is a faster alternative to HMC: instead of using the whole dataset at each iteration, SGHMC uses only a subsample. This improves performance, but introduces bias that can cause SGHMC to converge to the wrong distribution. One can prevent this using a step size that decays to zero, but such a step size schedule can drastically slow down convergence. To address this tension, we propose a novel second-order SG-MCMC algorithm—AMAGOLD—that infrequently uses Metropolis-Hastings (M-H) corrections to remove bias. The infrequency of corrections amortizes their cost. We prove AMAGOLD converges to the target distribution with a fixed, rather than a diminishing, step size, and that its convergence rate is at most a constant factor slower than a full-batch baseline. We empirically demonstrate AMAGOLD’s effectiveness on synthetic distributions, Bayesian logistic regression, and Bayesian neural networks.
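
The subsampling idea in the abstract can be illustrated with a toy sketch. The model below (a Gaussian mean with a standard normal prior), the function names, and the data are assumptions made for illustration and are not taken from the paper. The sketch shows only that a rescaled minibatch gradient is a noisy but unbiased estimate of the full-data gradient; it is this gradient noise that, per the abstract, can bias SGHMC when the step size is fixed. It does not implement AMAGOLD.

```python
import random

def full_grad(theta, data):
    """Gradient of the toy potential U(theta) = sum_i (theta - x_i)^2 / 2 + theta^2 / 2."""
    return sum(theta - x for x in data) + theta

def minibatch_grad(theta, data, batch_idx):
    """Estimate full_grad from a subsample, rescaling the data term by N / n."""
    scale = len(data) / len(batch_idx)
    return scale * sum(theta - data[i] for i in batch_idx) + theta

rng = random.Random(0)                      # fixed seed, hypothetical data
data = [rng.gauss(1.0, 1.0) for _ in range(100)]
theta = 0.3

exact = full_grad(theta, data)              # touches all 100 points
batch = rng.sample(range(len(data)), 10)    # touches only 10 points
noisy = minibatch_grad(theta, data, batch)

# Averaging the estimate over every size-1 batch recovers the exact gradient,
# which is what "unbiased" means here.
mean_est = sum(minibatch_grad(theta, data, [i]) for i in range(len(data))) / len(data)

print("full gradient:     ", exact)
print("minibatch estimate:", noisy)
print("mean over batches: ", mean_est)
```

A single minibatch estimate generally differs from the full gradient, while the average over batches matches it up to floating-point error.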

Cite this Paper


BibTeX
@InProceedings{pmlr-v108-zhang20e,
  title = {AMAGOLD: Amortized Metropolis Adjustment for Efficient Stochastic Gradient MCMC},
  author = {Zhang, Ruqi and Cooper, A. Feder and De Sa, Christopher},
  booktitle = {Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics},
  pages = {2142--2152},
  year = {2020},
  editor = {Silvia Chiappa and Roberto Calandra},
  volume = {108},
  series = {Proceedings of Machine Learning Research},
  address = {Online},
  month = {26--28 Aug},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v108/zhang20e/zhang20e.pdf},
  url = {http://proceedings.mlr.press/v108/zhang20e.html},
  abstract = {Stochastic gradient Hamiltonian Monte Carlo (SGHMC) is an efficient method for sampling from continuous distributions. It is a faster alternative to HMC: instead of using the whole dataset at each iteration, SGHMC uses only a subsample. This improves performance, but introduces bias that can cause SGHMC to converge to the wrong distribution. One can prevent this using a step size that decays to zero, but such a step size schedule can drastically slow down convergence. To address this tension, we propose a novel second-order SG-MCMC algorithm—AMAGOLD—that infrequently uses Metropolis-Hastings (M-H) corrections to remove bias. The infrequency of corrections amortizes their cost. We prove AMAGOLD converges to the target distribution with a fixed, rather than a diminishing, step size, and that its convergence rate is at most a constant factor slower than a full-batch baseline. We empirically demonstrate AMAGOLD’s effectiveness on synthetic distributions, Bayesian logistic regression, and Bayesian neural networks.}
}
Endnote
%0 Conference Paper
%T AMAGOLD: Amortized Metropolis Adjustment for Efficient Stochastic Gradient MCMC
%A Ruqi Zhang
%A A. Feder Cooper
%A Christopher De Sa
%B Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2020
%E Silvia Chiappa
%E Roberto Calandra
%F pmlr-v108-zhang20e
%I PMLR
%J Proceedings of Machine Learning Research
%P 2142--2152
%U http://proceedings.mlr.press
%V 108
%W PMLR
%X Stochastic gradient Hamiltonian Monte Carlo (SGHMC) is an efficient method for sampling from continuous distributions. It is a faster alternative to HMC: instead of using the whole dataset at each iteration, SGHMC uses only a subsample. This improves performance, but introduces bias that can cause SGHMC to converge to the wrong distribution. One can prevent this using a step size that decays to zero, but such a step size schedule can drastically slow down convergence. To address this tension, we propose a novel second-order SG-MCMC algorithm—AMAGOLD—that infrequently uses Metropolis-Hastings (M-H) corrections to remove bias. The infrequency of corrections amortizes their cost. We prove AMAGOLD converges to the target distribution with a fixed, rather than a diminishing, step size, and that its convergence rate is at most a constant factor slower than a full-batch baseline. We empirically demonstrate AMAGOLD’s effectiveness on synthetic distributions, Bayesian logistic regression, and Bayesian neural networks.
APA
Zhang, R., Cooper, A.F. & De Sa, C. (2020). AMAGOLD: Amortized Metropolis Adjustment for Efficient Stochastic Gradient MCMC. Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, in PMLR 108:2142-2152.
