On the Theory of Variance Reduction for Stochastic Gradient Monte Carlo

Niladri Chatterji, Nicolas Flammarion, Yian Ma, Peter Bartlett, Michael Jordan
Proceedings of the 35th International Conference on Machine Learning, PMLR 80:764-773, 2018.

Abstract

We provide convergence guarantees in Wasserstein distance for a variety of variance-reduction methods: SAGA Langevin diffusion, SVRG Langevin diffusion and control-variate underdamped Langevin diffusion. We analyze these methods under a uniform set of assumptions on the log-posterior distribution, assuming it to be smooth, strongly convex and Hessian Lipschitz. This is achieved by a new proof technique combining ideas from finite-sum optimization and the analysis of sampling methods. Our sharp theoretical bounds allow us to identify regimes of interest where each method performs better than the others. Our theory is verified with experiments on real-world and synthetic datasets.
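To make the flavor of these methods concrete, below is a minimal, hypothetical Python sketch of an SVRG-style overdamped Langevin update for a finite-sum potential f(x) = sum_i f_i(x) (the negative log-posterior). It is an illustration under stated assumptions, not the paper's exact algorithm or step-size choices; the paper also analyzes SAGA and control-variate underdamped variants. The names grad_fi, step_size, epoch_length and batch_size are illustrative.

import numpy as np

def svrg_langevin(grad_fi, x0, n, step_size, n_epochs, epoch_length, batch_size, rng=None):
    # grad_fi(x, i): gradient of the i-th summand f_i at x, where the
    # potential (negative log-posterior) is f(x) = sum_{i=1}^{n} f_i(x).
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x0, dtype=float).copy()
    d = x.size
    samples = []
    for _ in range(n_epochs):
        # Anchor point: the full gradient is recomputed once per epoch.
        anchor = x.copy()
        full_grad = sum(grad_fi(anchor, i) for i in range(n))
        for _ in range(epoch_length):
            idx = rng.integers(0, n, size=batch_size)
            # Variance-reduced (SVRG-style) stochastic gradient estimate.
            correction = sum(grad_fi(x, i) - grad_fi(anchor, i) for i in idx)
            grad_est = full_grad + (n / batch_size) * correction
            # Overdamped Langevin (Euler-Maruyama) step with Gaussian noise.
            x = x - step_size * grad_est + np.sqrt(2.0 * step_size) * rng.standard_normal(d)
            samples.append(x.copy())
    return np.asarray(samples)

# Toy check: with f_i(x) = ||x||^2 / 2 for each of n summands, the target is
# Gaussian with covariance I / n, so the empirical variance should be near 1 / n.
if __name__ == "__main__":
    n, d = 100, 2
    draws = svrg_langevin(lambda x, i: x, np.zeros(d), n, step_size=1e-3,
                          n_epochs=50, epoch_length=50, batch_size=10)
    print(draws[500:].var(axis=0))  # roughly 0.01

The key design point the sketch illustrates is that the anchor gradient keeps the stochastic gradient estimate unbiased while shrinking its variance as the iterate approaches the anchor, which is what drives the sharper Wasserstein bounds discussed in the abstract.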

Cite this Paper


BibTeX
@InProceedings{pmlr-v80-chatterji18a,
  title     = {On the Theory of Variance Reduction for Stochastic Gradient {M}onte {C}arlo},
  author    = {Chatterji, Niladri and Flammarion, Nicolas and Ma, Yian and Bartlett, Peter and Jordan, Michael},
  booktitle = {Proceedings of the 35th International Conference on Machine Learning},
  pages     = {764--773},
  year      = {2018},
  editor    = {Dy, Jennifer and Krause, Andreas},
  volume    = {80},
  series    = {Proceedings of Machine Learning Research},
  month     = {10--15 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v80/chatterji18a/chatterji18a.pdf},
  url       = {https://proceedings.mlr.press/v80/chatterji18a.html},
  abstract  = {We provide convergence guarantees in Wasserstein distance for a variety of variance-reduction methods: SAGA Langevin diffusion, SVRG Langevin diffusion and control-variate underdamped Langevin diffusion. We analyze these methods under a uniform set of assumptions on the log-posterior distribution, assuming it to be smooth, strongly convex and Hessian Lipschitz. This is achieved by a new proof technique combining ideas from finite-sum optimization and the analysis of sampling methods. Our sharp theoretical bounds allow us to identify regimes of interest where each method performs better than the others. Our theory is verified with experiments on real-world and synthetic datasets.}
}
Endnote
%0 Conference Paper
%T On the Theory of Variance Reduction for Stochastic Gradient Monte Carlo
%A Niladri Chatterji
%A Nicolas Flammarion
%A Yian Ma
%A Peter Bartlett
%A Michael Jordan
%B Proceedings of the 35th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2018
%E Jennifer Dy
%E Andreas Krause
%F pmlr-v80-chatterji18a
%I PMLR
%P 764--773
%U https://proceedings.mlr.press/v80/chatterji18a.html
%V 80
%X We provide convergence guarantees in Wasserstein distance for a variety of variance-reduction methods: SAGA Langevin diffusion, SVRG Langevin diffusion and control-variate underdamped Langevin diffusion. We analyze these methods under a uniform set of assumptions on the log-posterior distribution, assuming it to be smooth, strongly convex and Hessian Lipschitz. This is achieved by a new proof technique combining ideas from finite-sum optimization and the analysis of sampling methods. Our sharp theoretical bounds allow us to identify regimes of interest where each method performs better than the others. Our theory is verified with experiments on real-world and synthetic datasets.
APA
Chatterji, N., Flammarion, N., Ma, Y., Bartlett, P., & Jordan, M. (2018). On the Theory of Variance Reduction for Stochastic Gradient Monte Carlo. Proceedings of the 35th International Conference on Machine Learning, in Proceedings of Machine Learning Research 80:764-773. Available from https://proceedings.mlr.press/v80/chatterji18a.html.

Related Material

Download PDF: http://proceedings.mlr.press/v80/chatterji18a/chatterji18a.pdf