Amortized Nesterov’s Momentum: A Robust Momentum and Its Application to Deep Learning

Kaiwen Zhou, Yanghua Jin, Qinghua Ding, James Cheng
Proceedings of the 36th Conference on Uncertainty in Artificial Intelligence (UAI), PMLR 124:211-220, 2020.

Abstract

This work proposes a novel momentum technique, the Amortized Nesterov’s Momentum, for stochastic convex optimization. The proposed method can be regarded as a smooth transition between Nesterov’s method and mirror descent. By tuning only a single parameter, users can trade Nesterov’s acceleration for robustness, that is, the variance control of the stochastic noise. Motivated by the recent success of using momentum in deep learning, we conducted extensive experiments to evaluate this new momentum in deep learning tasks. The results suggest that it can serve as a favorable alternative for Nesterov’s momentum.
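For readers who want a concrete reference point, the classical Nesterov's momentum update that the paper builds on can be sketched as below. This is a minimal deterministic illustration of standard Nesterov momentum, not the amortized variant proposed in the paper; the quadratic objective and step sizes are illustrative choices, not taken from the paper.

```python
import numpy as np

def nesterov_descent(grad, x0, lr=0.005, momentum=0.9, steps=500):
    """Gradient descent with classical Nesterov's momentum.

    The defining feature of Nesterov's scheme is that the gradient is
    evaluated at the 'lookahead' point x + momentum * v rather than at x.
    """
    x = np.asarray(x0, dtype=float)
    v = np.zeros_like(x)
    for _ in range(steps):
        g = grad(x + momentum * v)  # lookahead gradient
        v = momentum * v - lr * g   # velocity update
        x = x + v                   # parameter update
    return x

# Illustrative objective: f(x) = 0.5 * x^T A x with an ill-conditioned A,
# whose unique minimizer is the origin.
A = np.diag([1.0, 100.0])
x_star = nesterov_descent(lambda x: A @ x, x0=[5.0, 5.0])
print(x_star)  # close to the minimizer [0, 0]
```

The momentum coefficient plays the role that the paper's single amortization parameter generalizes: it controls how aggressively past velocity is reused, trading faster progress for sensitivity to gradient noise in the stochastic setting.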

Cite this Paper


BibTeX
@InProceedings{pmlr-v124-zhou20a,
  title     = {Amortized Nesterov’s Momentum: A Robust Momentum and Its Application to Deep Learning},
  author    = {Zhou, Kaiwen and Jin, Yanghua and Ding, Qinghua and Cheng, James},
  booktitle = {Proceedings of the 36th Conference on Uncertainty in Artificial Intelligence (UAI)},
  pages     = {211--220},
  year      = {2020},
  editor    = {Jonas Peters and David Sontag},
  volume    = {124},
  series    = {Proceedings of Machine Learning Research},
  month     = {03--06 Aug},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v124/zhou20a/zhou20a.pdf},
  url       = {http://proceedings.mlr.press/v124/zhou20a.html},
  abstract  = {This work proposes a novel momentum technique, the Amortized Nesterov’s Momentum, for stochastic convex optimization. The proposed method can be regarded as a smooth transition between Nesterov’s method and mirror descent. By tuning only a single parameter, users can trade Nesterov’s acceleration for robustness, that is, the variance control of the stochastic noise. Motivated by the recent success of using momentum in deep learning, we conducted extensive experiments to evaluate this new momentum in deep learning tasks. The results suggest that it can serve as a favorable alternative for Nesterov’s momentum.}
}
Endnote
%0 Conference Paper
%T Amortized Nesterov’s Momentum: A Robust Momentum and Its Application to Deep Learning
%A Kaiwen Zhou
%A Yanghua Jin
%A Qinghua Ding
%A James Cheng
%B Proceedings of the 36th Conference on Uncertainty in Artificial Intelligence (UAI)
%C Proceedings of Machine Learning Research
%D 2020
%E Jonas Peters
%E David Sontag
%F pmlr-v124-zhou20a
%I PMLR
%P 211--220
%U http://proceedings.mlr.press/v124/zhou20a.html
%V 124
%X This work proposes a novel momentum technique, the Amortized Nesterov’s Momentum, for stochastic convex optimization. The proposed method can be regarded as a smooth transition between Nesterov’s method and mirror descent. By tuning only a single parameter, users can trade Nesterov’s acceleration for robustness, that is, the variance control of the stochastic noise. Motivated by the recent success of using momentum in deep learning, we conducted extensive experiments to evaluate this new momentum in deep learning tasks. The results suggest that it can serve as a favorable alternative for Nesterov’s momentum.
APA
Zhou, K., Jin, Y., Ding, Q. & Cheng, J. (2020). Amortized Nesterov’s Momentum: A Robust Momentum and Its Application to Deep Learning. Proceedings of the 36th Conference on Uncertainty in Artificial Intelligence (UAI), in Proceedings of Machine Learning Research 124:211-220. Available from http://proceedings.mlr.press/v124/zhou20a.html.
