A new regret analysis for Adam-type algorithms
Proceedings of the 37th International Conference on Machine Learning, PMLR 119:202-210, 2020.
Abstract
In this paper, we focus on the theory-practice gap for Adam and its variants (AMSGrad, AdamNC, etc.). In practice, these algorithms are used with a constant first-moment parameter β1 (typically between 0.9 and 0.99). In theory, regret guarantees for online convex optimization require a rapidly decaying schedule β1,t → 0. We show that this requirement is an artifact of the standard analysis, and we propose a novel framework that allows us to derive optimal, data-dependent regret bounds with a constant β1, without further assumptions. We also demonstrate the flexibility of our analysis on a wide range of algorithms and settings.
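For context, the gap the abstract describes can be made concrete with the standard Adam-type recursion and the usual regret objective; the notation below (gradient g_t, step size α_t, stabilization constant ε) is standard but not defined in this abstract:

\[
m_t = \beta_{1,t}\, m_{t-1} + (1 - \beta_{1,t})\, g_t, \qquad
v_t = \beta_2\, v_{t-1} + (1 - \beta_2)\, g_t^2, \qquad
x_{t+1} = x_t - \alpha_t\, \frac{m_t}{\sqrt{v_t} + \epsilon},
\]

with regret over \(T\) rounds \( R_T = \sum_{t=1}^{T} f_t(x_t) - \min_{x} \sum_{t=1}^{T} f_t(x) \). Earlier analyses (e.g., the AMSGrad proof) require a decaying schedule such as \( \beta_{1,t} = \beta_1 \lambda^{t-1} \) with \( \lambda < 1 \), whereas practitioners set \( \beta_{1,t} \equiv \beta_1 \); the paper's contribution is a data-dependent regret bound of the optimal \( O(\sqrt{T}) \) order that holds in this constant-\( \beta_1 \) regime.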