Accelerated Linear Convergence of Stochastic Momentum Methods in Wasserstein Distances

Bugra Can; Mert Gurbuzbalaban; Lingjiong Zhu

Proceedings of the 36th International Conference on Machine Learning, PMLR 97:891-901, 2019.

Abstract

Momentum methods such as Polyak’s heavy ball (HB) method, Nesterov’s accelerated gradient (AG) method, and the accelerated projected gradient (APG) method are commonly used in machine learning practice, but their performance is quite sensitive to noise in the gradients. We study these methods under a first-order stochastic oracle model in which noisy estimates of the gradients are available. For strongly convex problems, we show that, as long as the noise variance is smaller than an explicit upper bound that we provide, the distribution of the iterates of AG converges at an accelerated linear rate, reaching a ball of radius $\varepsilon$ centered at a unique invariant distribution in the 1-Wasserstein metric within $O(\sqrt{\kappa}\log(1/\varepsilon))$ iterations, where $\kappa$ is the condition number. Our analysis also certifies linear convergence rates as a function of the stepsize, momentum parameter, and noise variance, recovering the accelerated rates in the noiseless case and quantifying the level of noise that can be tolerated to achieve a given performance. To the best of our knowledge, these are the first linear convergence results for stochastic momentum methods under the stochastic oracle model. We also develop finer results for the special case of quadratic objectives and extend our results to the APG method and to weakly convex functions, showing accelerated rates when the noise magnitude is sufficiently small.
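As an illustration of the setting described in the abstract, the sketch below runs AG with a noisy gradient oracle on a two-dimensional strongly convex quadratic $f(x) = \tfrac{1}{2}x^\top A x$. It is a minimal sketch under stated assumptions, not the authors' code: the function name `stochastic_ag`, the eigenvalues $\mu = 1$ and $L = 100$, the additive Gaussian noise with level `sigma = 0.1`, and the textbook parameters $\alpha = 1/L$ and $\beta = (\sqrt{\kappa}-1)/(\sqrt{\kappa}+1)$ are all hypothetical choices (the paper analyzes more general stepsizes and momentum parameters). Two runs from different starting points share the same noise sequence (a synchronous coupling), so $\mathbb{E}\|x_k - x'_k\|$ upper-bounds the 1-Wasserstein distance between the two iterate distributions. On a quadratic the noise cancels in the difference, which then contracts like noiseless AG at roughly the accelerated rate $1 - 1/\sqrt{\kappa}$.

```python
import numpy as np


def stochastic_ag(A, x0, alpha, beta, sigma, noise):
    """Hypothetical helper: Nesterov AG on f(x) = 0.5 * x^T A x with
    additive Gaussian gradient noise (an assumed oracle model).

    `noise` holds one standard-normal draw per iteration (shape (n_iters, d)),
    so two runs can share the same noise sequence (synchronous coupling).
    Returns the iterates x_0, ..., x_n as an array of shape (n_iters + 1, d).
    """
    x_prev = np.asarray(x0, dtype=float).copy()
    x = x_prev.copy()
    iterates = [x.copy()]
    for xi in noise:
        y = x + beta * (x - x_prev)        # momentum (extrapolation) step
        grad = A @ y + sigma * xi          # noisy first-order oracle at y
        x_prev, x = x, y - alpha * grad    # gradient step taken from y
        iterates.append(x.copy())
    return np.array(iterates)


# Hypothetical problem: eigenvalues mu = 1 and L = 100, so kappa = 100.
mu, L = 1.0, 100.0
kappa = L / mu
A = np.diag([mu, L])
alpha = 1.0 / L                                         # textbook stepsize
beta = (np.sqrt(kappa) - 1.0) / (np.sqrt(kappa) + 1.0)  # textbook momentum

rng = np.random.default_rng(0)
n_iters = 200
noise = rng.standard_normal((n_iters, 2))

# Two noisy runs from different starting points, driven by the same noise.
run_a = stochastic_ag(A, np.array([5.0, 5.0]), alpha, beta, 0.1, noise)
run_b = stochastic_ag(A, np.array([-3.0, 2.0]), alpha, beta, 0.1, noise)
gap = np.linalg.norm(run_a - run_b, axis=1)  # coupled distance per iteration

# Noiseless run for comparison: converges to the minimizer x* = 0.
run_clean = stochastic_ag(A, np.array([5.0, 5.0]), alpha, beta, 0.0, noise)

# Mean squared distance to x* = 0 over the last 100 noisy iterates.
noise_floor = np.mean(np.sum(run_a[-100:] ** 2, axis=1))

print("coupled gap at k = 0, 50, 200:", gap[0], gap[50], gap[-1])
print("noiseless ||x_200||:", np.linalg.norm(run_clean[-1]))
print("noisy mean ||x_k||^2 over last 100 iterates:", noise_floor)
```

With `sigma = 0` the iterates converge to the minimizer $x^* = 0$; with `sigma > 0` they keep fluctuating around it (the `noise_floor` value), while the coupled gap still decays linearly. For this particular $A$, up to rounding, the gap equals $(8 + 0.8k)\,0.9^k$ for $k \ge 1$, and the double root $0.9 = 1 - 1/\sqrt{\kappa}$ reflects the accelerated rate.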

Cite this Paper

BibTeX

@InProceedings{pmlr-v97-can19a,
  title     = {Accelerated Linear Convergence of Stochastic Momentum Methods in {W}asserstein Distances},
  author    = {Can, Bugra and Gurbuzbalaban, Mert and Zhu, Lingjiong},
  booktitle = {Proceedings of the 36th International Conference on Machine Learning},
  pages     = {891--901},
  year      = {2019},
  editor    = {Chaudhuri, Kamalika and Salakhutdinov, Ruslan},
  volume    = {97},
  series    = {Proceedings of Machine Learning Research},
  month     = {09--15 Jun},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v97/can19a/can19a.pdf},
  url       = {https://proceedings.mlr.press/v97/can19a.html},
  abstract  = {Momentum methods such as Polyak’s heavy ball (HB) method, Nesterov’s accelerated gradient (AG) as well as accelerated projected gradient (APG) method have been commonly used in machine learning practice, but their performance is quite sensitive to noise in the gradients. We study these methods under a first-order stochastic oracle model where noisy estimates of the gradients are available. For strongly convex problems, we show that the distribution of the iterates of AG converges with the accelerated $O(\sqrt{\kappa}\log(1/\varepsilon))$ linear rate to a ball of radius $\varepsilon$ centered at a unique invariant distribution in the 1-Wasserstein metric where $\kappa$ is the condition number as long as the noise variance is smaller than an explicit upper bound we can provide. Our analysis also certifies linear convergence rates as a function of the stepsize, momentum parameter and the noise variance; recovering the accelerated rates in the noiseless case and quantifying the level of noise that can be tolerated to achieve a given performance. To the best of our knowledge, these are the first linear convergence results for stochastic momentum methods under the stochastic oracle model. We also develop finer results for the special case of quadratic objectives, extend our results to the APG method and weakly convex functions showing accelerated rates when the noise magnitude is sufficiently small.}
}

Endnote

%0 Conference Paper
%T Accelerated Linear Convergence of Stochastic Momentum Methods in Wasserstein Distances
%A Bugra Can
%A Mert Gurbuzbalaban
%A Lingjiong Zhu
%B Proceedings of the 36th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2019
%E Kamalika Chaudhuri
%E Ruslan Salakhutdinov
%F pmlr-v97-can19a
%I PMLR
%P 891--901
%U https://proceedings.mlr.press/v97/can19a.html
%V 97
%X Momentum methods such as Polyak’s heavy ball (HB) method, Nesterov’s accelerated gradient (AG) as well as accelerated projected gradient (APG) method have been commonly used in machine learning practice, but their performance is quite sensitive to noise in the gradients. We study these methods under a first-order stochastic oracle model where noisy estimates of the gradients are available. For strongly convex problems, we show that the distribution of the iterates of AG converges with the accelerated $O(\sqrt{\kappa}\log(1/\varepsilon))$ linear rate to a ball of radius $\varepsilon$ centered at a unique invariant distribution in the 1-Wasserstein metric where $\kappa$ is the condition number as long as the noise variance is smaller than an explicit upper bound we can provide. Our analysis also certifies linear convergence rates as a function of the stepsize, momentum parameter and the noise variance; recovering the accelerated rates in the noiseless case and quantifying the level of noise that can be tolerated to achieve a given performance. To the best of our knowledge, these are the first linear convergence results for stochastic momentum methods under the stochastic oracle model. We also develop finer results for the special case of quadratic objectives, extend our results to the APG method and weakly convex functions showing accelerated rates when the noise magnitude is sufficiently small.

APA

Can, B., Gurbuzbalaban, M. & Zhu, L. (2019). Accelerated Linear Convergence of Stochastic Momentum Methods in Wasserstein Distances. Proceedings of the 36th International Conference on Machine Learning, in Proceedings of Machine Learning Research 97:891-901. Available from https://proceedings.mlr.press/v97/can19a.html.
