On the Convergence of Nesterov’s Accelerated Gradient Method in Stochastic Settings

Mahmoud Assran, Mike Rabbat
Proceedings of the 37th International Conference on Machine Learning, PMLR 119:410-420, 2020.

Abstract

We study Nesterov’s accelerated gradient method with constant step-size and momentum parameters in the stochastic approximation setting (unbiased gradients with bounded variance) and the finite-sum setting (where randomness is due to sampling mini-batches). To build better insight into the behavior of Nesterov’s method in stochastic settings, we focus throughout on objectives that are smooth, strongly-convex, and twice continuously differentiable. In the stochastic approximation setting, Nesterov’s method converges to a neighborhood of the optimal point at the same accelerated rate as in the deterministic setting. Perhaps surprisingly, in the finite-sum setting, we prove that Nesterov’s method may diverge with the usual choice of step-size and momentum, unless additional conditions on the problem related to conditioning and data coherence are satisfied. Our results shed light on why Nesterov’s method may fail to converge or achieve acceleration in the finite-sum setting.
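
For concreteness, the iteration under study is Nesterov's method x_{k+1} = y_k - alpha * grad f(y_k), y_{k+1} = x_{k+1} + beta * (x_{k+1} - x_k), with the usual constant parameters alpha = 1/L and beta = (sqrt(kappa) - 1)/(sqrt(kappa) + 1), where kappa = L/mu. The sketch below is not the authors' code: it runs this update with mini-batch gradients on a synthetic least-squares finite sum, and the problem data, batch size, helper name minibatch_grad, and iteration count are all illustrative assumptions.

# Minimal sketch (illustrative, not from the paper): Nesterov's accelerated
# gradient with the usual constant step-size and momentum, run with
# mini-batch gradients on a least-squares finite sum.
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 10
A = rng.standard_normal((n, d))   # synthetic data (assumption)
b = rng.standard_normal(n)

# f(x) = (1/2n) * ||Ax - b||^2; smoothness L and strong convexity mu
# come from the extreme eigenvalues of the Hessian (1/n) A^T A.
H = A.T @ A / n
eigs = np.linalg.eigvalsh(H)
mu, L = eigs[0], eigs[-1]
kappa = L / mu

alpha = 1.0 / L                                      # usual step-size
beta = (np.sqrt(kappa) - 1) / (np.sqrt(kappa) + 1)   # usual momentum

def minibatch_grad(y, batch_size=10):
    """Unbiased gradient estimate from a sampled mini-batch (hypothetical helper)."""
    idx = rng.choice(n, size=batch_size, replace=False)
    Ai, bi = A[idx], b[idx]
    return Ai.T @ (Ai @ y - bi) / batch_size

x = np.zeros(d)
y = x.copy()
for k in range(500):
    x_next = y - alpha * minibatch_grad(y)   # gradient step from y_k
    y = x_next + beta * (x_next - x)         # momentum extrapolation
    x = x_next

x_star = np.linalg.lstsq(A, b, rcond=None)[0]
print("distance to optimum:", np.linalg.norm(x - x_star))

With exact gradients this recursion contracts at the accelerated rate; with mini-batch gradients the printed distance typically stalls at a noise-dependent neighborhood of the optimum, which is the regime the paper analyzes.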

Cite this Paper


BibTeX
@InProceedings{pmlr-v119-assran20a,
  title     = {On the Convergence of {N}esterov’s Accelerated Gradient Method in Stochastic Settings},
  author    = {Assran, Mahmoud and Rabbat, Mike},
  booktitle = {Proceedings of the 37th International Conference on Machine Learning},
  pages     = {410--420},
  year      = {2020},
  editor    = {III, Hal Daumé and Singh, Aarti},
  volume    = {119},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--18 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v119/assran20a/assran20a.pdf},
  url       = {https://proceedings.mlr.press/v119/assran20a.html}
}
Endnote
%0 Conference Paper
%T On the Convergence of Nesterov’s Accelerated Gradient Method in Stochastic Settings
%A Mahmoud Assran
%A Mike Rabbat
%B Proceedings of the 37th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2020
%E Hal Daumé III
%E Aarti Singh
%F pmlr-v119-assran20a
%I PMLR
%P 410--420
%U https://proceedings.mlr.press/v119/assran20a.html
%V 119
APA
Assran, M. & Rabbat, M. (2020). On the Convergence of Nesterov’s Accelerated Gradient Method in Stochastic Settings. Proceedings of the 37th International Conference on Machine Learning, in Proceedings of Machine Learning Research 119:410-420. Available from https://proceedings.mlr.press/v119/assran20a.html.