An Optimistic Acceleration of AMSGrad for Nonconvex Optimization

Jun-Kun Wang, Xiaoyun Li, Belhal Karimi, Ping Li
Proceedings of The 13th Asian Conference on Machine Learning, PMLR 157:422-437, 2021.

Abstract

We propose a new variant of AMSGrad (Reddi et al., 2018), a popular adaptive gradient-based optimization algorithm widely used for training deep neural networks. Our algorithm incorporates prior knowledge about the sequence of consecutive mini-batch gradients and leverages their underlying structure, which makes the gradients sequentially predictable. By exploiting this predictability together with ideas from optimistic online learning, the proposed algorithm can accelerate convergence and increase sample efficiency. After establishing a tighter upper bound on the regret under some convexity conditions, we offer a complementary view of our algorithm that generalizes to the offline and stochastic nonconvex optimization settings. In the nonconvex case, we establish a non-asymptotic convergence bound that is independent of the initialization. We illustrate, via numerical experiments, the practical speedup on several deep learning models and benchmark datasets.
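
To make the idea concrete, below is a minimal, hypothetical NumPy sketch of what an "optimistic" AMSGrad-style update can look like: after the usual AMSGrad step, the iterate takes an extra step along a guess of the next gradient. The class name OptimisticAMSGradSketch, the hyperparameters, and the choice of predictor (reusing the most recent gradient) are illustrative assumptions, not the paper's exact algorithm, which constructs the prediction from the structure of consecutive mini-batch gradients.

    # Hypothetical sketch of an optimistic AMSGrad-style update (not the authors' exact method).
    import numpy as np

    class OptimisticAMSGradSketch:
        def __init__(self, dim, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
            self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
            self.m = np.zeros(dim)       # first-moment (momentum) estimate
            self.v = np.zeros(dim)       # second-moment estimate
            self.v_hat = np.zeros(dim)   # element-wise max of past v (the AMSGrad correction)

        def step(self, w, grad):
            # Standard AMSGrad moment updates.
            self.m = self.beta1 * self.m + (1 - self.beta1) * grad
            self.v = self.beta2 * self.v + (1 - self.beta2) * grad ** 2
            self.v_hat = np.maximum(self.v_hat, self.v)
            denom = np.sqrt(self.v_hat) + self.eps

            # AMSGrad descent step.
            w_half = w - self.lr * self.m / denom

            # Optimistic extra step: move along a prediction of the *next* gradient.
            # Assumption for illustration: predict it with the current gradient.
            h = grad
            return w_half - self.lr * h / denom

    # Toy usage on the quadratic f(w) = 0.5 * ||w||^2, whose gradient at w is w.
    opt = OptimisticAMSGradSketch(dim=2, lr=0.1)
    w = np.array([1.0, -2.0])
    for _ in range(100):
        w = opt.step(w, grad=w)
    print(w)  # approaches the minimizer (0, 0)
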

Cite this Paper


BibTeX
@InProceedings{pmlr-v157-wang21c,
  title     = {An Optimistic Acceleration of AMSGrad for Nonconvex Optimization},
  author    = {Wang, Jun-Kun and Li, Xiaoyun and Karimi, Belhal and Li, Ping},
  booktitle = {Proceedings of The 13th Asian Conference on Machine Learning},
  pages     = {422--437},
  year      = {2021},
  editor    = {Balasubramanian, Vineeth N. and Tsang, Ivor},
  volume    = {157},
  series    = {Proceedings of Machine Learning Research},
  month     = {17--19 Nov},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v157/wang21c/wang21c.pdf},
  url       = {https://proceedings.mlr.press/v157/wang21c.html},
  abstract  = {We propose a new variant of AMSGrad (Reddi et al., 2018), a popular adaptive gradient-based optimization algorithm widely used for training deep neural networks. Our algorithm incorporates prior knowledge about the sequence of consecutive mini-batch gradients and leverages their underlying structure, which makes the gradients sequentially predictable. By exploiting this predictability together with ideas from optimistic online learning, the proposed algorithm can accelerate convergence and increase sample efficiency. After establishing a tighter upper bound on the regret under some convexity conditions, we offer a complementary view of our algorithm that generalizes to the offline and stochastic nonconvex optimization settings. In the nonconvex case, we establish a non-asymptotic convergence bound that is independent of the initialization. We illustrate, via numerical experiments, the practical speedup on several deep learning models and benchmark datasets.}
}
Endnote
%0 Conference Paper
%T An Optimistic Acceleration of AMSGrad for Nonconvex Optimization
%A Jun-Kun Wang
%A Xiaoyun Li
%A Belhal Karimi
%A Ping Li
%B Proceedings of The 13th Asian Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2021
%E Vineeth N. Balasubramanian
%E Ivor Tsang
%F pmlr-v157-wang21c
%I PMLR
%P 422--437
%U https://proceedings.mlr.press/v157/wang21c.html
%V 157
%X We propose a new variant of AMSGrad (Reddi et al., 2018), a popular adaptive gradient-based optimization algorithm widely used for training deep neural networks. Our algorithm incorporates prior knowledge about the sequence of consecutive mini-batch gradients and leverages their underlying structure, which makes the gradients sequentially predictable. By exploiting this predictability together with ideas from optimistic online learning, the proposed algorithm can accelerate convergence and increase sample efficiency. After establishing a tighter upper bound on the regret under some convexity conditions, we offer a complementary view of our algorithm that generalizes to the offline and stochastic nonconvex optimization settings. In the nonconvex case, we establish a non-asymptotic convergence bound that is independent of the initialization. We illustrate, via numerical experiments, the practical speedup on several deep learning models and benchmark datasets.
APA
Wang, J., Li, X., Karimi, B. & Li, P. (2021). An Optimistic Acceleration of AMSGrad for Nonconvex Optimization. Proceedings of The 13th Asian Conference on Machine Learning, in Proceedings of Machine Learning Research 157:422-437. Available from https://proceedings.mlr.press/v157/wang21c.html.
