Stochastic Gradient and Langevin Processes

Xiang Cheng, Dong Yin, Peter Bartlett, Michael Jordan
Proceedings of the 37th International Conference on Machine Learning, PMLR 119:1810-1819, 2020.

Abstract

We prove quantitative convergence rates at which discrete Langevin-like processes converge to the invariant distribution of a related stochastic differential equation. We study the setup where the additive noise can be non-Gaussian and state-dependent and the potential function can be non-convex. We show that the key properties of these processes depend on the potential function and the second moment of the additive noise. We apply our theoretical findings to studying the convergence of Stochastic Gradient Descent (SGD) for non-convex problems and corroborate them with experiments using SGD to train deep neural networks on the CIFAR-10 dataset.
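As a rough illustration of the kind of process the abstract describes, the sketch below iterates a discrete Langevin-like update of the assumed form x_{k+1} = x_k - step * grad U(x_k) + sqrt(step) * noise(x_k), with a non-convex potential and zero-mean noise that is both non-Gaussian and state-dependent. The double-well potential, the sign-based noise, and all function names are hypothetical choices made for this example; they are not taken from the paper.

```python
import math
import random


def grad_potential(x):
    """Gradient of a hypothetical non-convex double-well potential U(x) = (x^2 - 1)^2 / 4."""
    return x * (x * x - 1.0)


def noise(x, rng):
    """Zero-mean, non-Gaussian, state-dependent noise (illustrative assumption).

    A random sign (+1 or -1 with equal probability) scaled by an amplitude
    that depends on the current state x.
    """
    amplitude = 1.0 + 0.5 * math.tanh(x)
    sign = 1.0 if rng.random() < 0.5 else -1.0
    return amplitude * sign


def simulate(x0, step, n_steps, seed=0):
    """Iterate x_{k+1} = x_k - step * grad U(x_k) + sqrt(step) * noise(x_k)."""
    rng = random.Random(seed)
    x = x0
    path = [x]
    for _ in range(n_steps):
        x = x - step * grad_potential(x) + math.sqrt(step) * noise(x, rng)
        path.append(x)
    return path
```

Calling `simulate(0.0, 0.01, 1000)` returns the full trajectory as a list, so the empirical distribution of the later iterates can be inspected, for example with a histogram.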

Cite this Paper


BibTeX
@InProceedings{pmlr-v119-cheng20e,
  title = {Stochastic Gradient and {L}angevin Processes},
  author = {Cheng, Xiang and Yin, Dong and Bartlett, Peter and Jordan, Michael},
  booktitle = {Proceedings of the 37th International Conference on Machine Learning},
  pages = {1810--1819},
  year = {2020},
  editor = {Daumé III, Hal and Singh, Aarti},
  volume = {119},
  series = {Proceedings of Machine Learning Research},
  month = {13--18 Jul},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v119/cheng20e/cheng20e.pdf},
  url = {http://proceedings.mlr.press/v119/cheng20e.html},
  abstract = {We prove quantitative convergence rates at which discrete Langevin-like processes converge to the invariant distribution of a related stochastic differential equation. We study the setup where the additive noise can be non-Gaussian and state-dependent and the potential function can be non-convex. We show that the key properties of these processes depend on the potential function and the second moment of the additive noise. We apply our theoretical findings to studying the convergence of Stochastic Gradient Descent (SGD) for non-convex problems and corroborate them with experiments using SGD to train deep neural networks on the CIFAR-10 dataset.}
}
Endnote
%0 Conference Paper
%T Stochastic Gradient and Langevin Processes
%A Xiang Cheng
%A Dong Yin
%A Peter Bartlett
%A Michael Jordan
%B Proceedings of the 37th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2020
%E Hal Daumé III
%E Aarti Singh
%F pmlr-v119-cheng20e
%I PMLR
%P 1810--1819
%U http://proceedings.mlr.press/v119/cheng20e.html
%V 119
%X We prove quantitative convergence rates at which discrete Langevin-like processes converge to the invariant distribution of a related stochastic differential equation. We study the setup where the additive noise can be non-Gaussian and state-dependent and the potential function can be non-convex. We show that the key properties of these processes depend on the potential function and the second moment of the additive noise. We apply our theoretical findings to studying the convergence of Stochastic Gradient Descent (SGD) for non-convex problems and corroborate them with experiments using SGD to train deep neural networks on the CIFAR-10 dataset.
APA
Cheng, X., Yin, D., Bartlett, P., & Jordan, M. (2020). Stochastic Gradient and Langevin Processes. Proceedings of the 37th International Conference on Machine Learning, in Proceedings of Machine Learning Research 119:1810-1819. Available from http://proceedings.mlr.press/v119/cheng20e.html.
