A new regret analysis for Adam-type algorithms

Ahmet Alacaoglu, Yura Malitsky, Panayotis Mertikopoulos, Volkan Cevher
Proceedings of the 37th International Conference on Machine Learning, PMLR 119:202-210, 2020.

Abstract

In this paper, we focus on a theory-practice gap for Adam and its variants (AMSGrad, AdamNC, etc.). In practice, these algorithms are used with a constant first-order moment parameter $\beta_{1}$ (typically between $0.9$ and $0.99$). In theory, regret guarantees for online convex optimization require a rapidly decaying $\beta_{1}\to0$ schedule. We show that this is an artifact of the standard analysis, and we propose a novel framework that allows us to derive optimal, data-dependent regret bounds with a constant $\beta_{1}$, without further assumptions. We also demonstrate the flexibility of our analysis on a wide range of different algorithms and settings.
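The gap the abstract describes concerns the first-moment averaging parameter $\beta_1$ in Adam-type updates. As a point of reference, here is a minimal sketch of a scalar Adam step with a *constant* $\beta_1$, the setting used in practice and analyzed in the paper. The function name, hyperparameter defaults, and the toy objective are illustrative assumptions, not taken from the paper.

```python
import math

def adam_step(param, grad, m, v, t, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam step on a scalar parameter; beta1 stays fixed across all steps t."""
    m = beta1 * m + (1 - beta1) * grad          # first-moment (momentum) estimate
    v = beta2 * v + (1 - beta2) * grad * grad   # second-moment estimate
    m_hat = m / (1 - beta1 ** t)                # bias corrections
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v

# Toy usage: minimize f(x) = x^2, whose gradient is 2x.
x, m, v = 5.0, 0.0, 0.0
for t in range(1, 2001):
    x, m, v = adam_step(x, 2 * x, m, v, t)
```

Earlier analyses required replacing the fixed `beta1=0.9` above with a schedule decaying to zero to prove regret bounds; the paper shows the constant choice already admits optimal data-dependent bounds.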

Cite this Paper

BibTeX
@InProceedings{pmlr-v119-alacaoglu20b,
  title     = {A new regret analysis for {A}dam-type algorithms},
  author    = {Alacaoglu, Ahmet and Malitsky, Yura and Mertikopoulos, Panayotis and Cevher, Volkan},
  booktitle = {Proceedings of the 37th International Conference on Machine Learning},
  pages     = {202--210},
  year      = {2020},
  editor    = {III, Hal Daumé and Singh, Aarti},
  volume    = {119},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--18 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v119/alacaoglu20b/alacaoglu20b.pdf},
  url       = {http://proceedings.mlr.press/v119/alacaoglu20b.html},
  abstract  = {In this paper, we focus on a theory-practice gap for Adam and its variants (AMSGrad, AdamNC, etc.). In practice, these algorithms are used with a constant first-order moment parameter $\beta_{1}$ (typically between $0.9$ and $0.99$). In theory, regret guarantees for online convex optimization require a rapidly decaying $\beta_{1}\to0$ schedule. We show that this is an artifact of the standard analysis, and we propose a novel framework that allows us to derive optimal, data-dependent regret bounds with a constant $\beta_{1}$, without further assumptions. We also demonstrate the flexibility of our analysis on a wide range of different algorithms and settings.}
}
Endnote
%0 Conference Paper
%T A new regret analysis for Adam-type algorithms
%A Ahmet Alacaoglu
%A Yura Malitsky
%A Panayotis Mertikopoulos
%A Volkan Cevher
%B Proceedings of the 37th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2020
%E Hal Daumé III
%E Aarti Singh
%F pmlr-v119-alacaoglu20b
%I PMLR
%P 202--210
%U http://proceedings.mlr.press/v119/alacaoglu20b.html
%V 119
%X In this paper, we focus on a theory-practice gap for Adam and its variants (AMSGrad, AdamNC, etc.). In practice, these algorithms are used with a constant first-order moment parameter $\beta_{1}$ (typically between $0.9$ and $0.99$). In theory, regret guarantees for online convex optimization require a rapidly decaying $\beta_{1}\to0$ schedule. We show that this is an artifact of the standard analysis, and we propose a novel framework that allows us to derive optimal, data-dependent regret bounds with a constant $\beta_{1}$, without further assumptions. We also demonstrate the flexibility of our analysis on a wide range of different algorithms and settings.
APA
Alacaoglu, A., Malitsky, Y., Mertikopoulos, P., & Cevher, V. (2020). A new regret analysis for Adam-type algorithms. Proceedings of the 37th International Conference on Machine Learning, in Proceedings of Machine Learning Research 119:202-210. Available from http://proceedings.mlr.press/v119/alacaoglu20b.html.