Understanding Adam Optimizer via Online Learning of Updates: Adam is FTRL in Disguise

Kwangjun Ahn, Zhiyu Zhang, Yunbum Kook, Yan Dai
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:619-640, 2024.

Abstract

Despite the success of the Adam optimizer in practice, the theoretical understanding of its algorithmic components remains limited. In particular, most existing analyses of Adam show convergence rates that can also be achieved by non-adaptive algorithms such as SGD. In this work, we provide a different perspective based on online learning that underscores the importance of Adam’s algorithmic components. Inspired by Cutkosky et al. (2023), we consider a framework called online learning of updates/increments, in which the updates/increments of an optimizer are chosen by an online learner. Within this framework, the design of a good optimizer is reduced to the design of a good online learner. Our main observation is that Adam corresponds to a principled online learning framework called Follow-the-Regularized-Leader (FTRL). Building on this observation, we study the benefits of its algorithmic components from the online learning perspective.
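To make the "Adam is FTRL" observation concrete, below is a minimal numerical sketch (not taken from the paper; simplified to a single scalar parameter and the bias-uncorrected form of Adam). Under these assumptions, the standard Adam increment coincides with the closed-form minimizer of a discounted FTRL objective over increments, where past linearized losses are discounted by beta1 and the quadratic regularizer is scaled by the square root of the beta2-discounted second moment. The function names and the epsilon placement are illustrative choices, not the paper's notation.

```python
import numpy as np

def adam_increment(grads, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """Bias-uncorrected Adam increment after seeing grads[0..t-1] (scalar case)."""
    m, v = 0.0, 0.0
    for g in grads:
        m = beta1 * m + (1 - beta1) * g        # EMA of gradients
        v = beta2 * v + (1 - beta2) * g ** 2   # EMA of squared gradients
    return -lr * m / (np.sqrt(v) + eps)

def ftrl_increment(grads, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """Discounted FTRL over increments:
    argmin_delta  <g_sum, delta> + (sqrt(v) + eps) / (2 * lr) * delta**2,
    where g_sum and v are beta1- and beta2-discounted gradient statistics."""
    t = len(grads)
    g_sum = sum(beta1 ** (t - 1 - s) * (1 - beta1) * g for s, g in enumerate(grads))
    v = sum(beta2 ** (t - 1 - s) * (1 - beta2) * g ** 2 for s, g in enumerate(grads))
    # Closed-form minimizer of the linear-plus-quadratic FTRL objective.
    return -lr * g_sum / (np.sqrt(v) + eps)

grads = np.random.randn(50)
print(np.isclose(adam_increment(grads), ftrl_increment(grads)))  # True
```

The check succeeds because the exponential moving averages m and v are exactly the discounted sums used in the FTRL objective, so Adam's per-step increment is the FTRL minimizer; how this viewpoint explains the roles of beta1, beta2, and the adaptive scaling is the subject of the paper itself.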

Cite this Paper


BibTeX
@InProceedings{pmlr-v235-ahn24b,
  title     = {Understanding {A}dam Optimizer via Online Learning of Updates: {A}dam is {FTRL} in Disguise},
  author    = {Ahn, Kwangjun and Zhang, Zhiyu and Kook, Yunbum and Dai, Yan},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages     = {619--640},
  year      = {2024},
  editor    = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume    = {235},
  series    = {Proceedings of Machine Learning Research},
  month     = {21--27 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/ahn24b/ahn24b.pdf},
  url       = {https://proceedings.mlr.press/v235/ahn24b.html},
  abstract  = {Despite the success of the Adam optimizer in practice, the theoretical understanding of its algorithmic components remains limited. In particular, most existing analyses of Adam show convergence rates that can also be achieved by non-adaptive algorithms such as SGD. In this work, we provide a different perspective based on online learning that underscores the importance of Adam’s algorithmic components. Inspired by Cutkosky et al. (2023), we consider a framework called online learning of updates/increments, in which the updates/increments of an optimizer are chosen by an online learner. Within this framework, the design of a good optimizer is reduced to the design of a good online learner. Our main observation is that Adam corresponds to a principled online learning framework called Follow-the-Regularized-Leader (FTRL). Building on this observation, we study the benefits of its algorithmic components from the online learning perspective.}
}
Endnote
%0 Conference Paper
%T Understanding Adam Optimizer via Online Learning of Updates: Adam is FTRL in Disguise
%A Kwangjun Ahn
%A Zhiyu Zhang
%A Yunbum Kook
%A Yan Dai
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-ahn24b
%I PMLR
%P 619--640
%U https://proceedings.mlr.press/v235/ahn24b.html
%V 235
%X Despite the success of the Adam optimizer in practice, the theoretical understanding of its algorithmic components remains limited. In particular, most existing analyses of Adam show convergence rates that can also be achieved by non-adaptive algorithms such as SGD. In this work, we provide a different perspective based on online learning that underscores the importance of Adam’s algorithmic components. Inspired by Cutkosky et al. (2023), we consider a framework called online learning of updates/increments, in which the updates/increments of an optimizer are chosen by an online learner. Within this framework, the design of a good optimizer is reduced to the design of a good online learner. Our main observation is that Adam corresponds to a principled online learning framework called Follow-the-Regularized-Leader (FTRL). Building on this observation, we study the benefits of its algorithmic components from the online learning perspective.
APA
Ahn, K., Zhang, Z., Kook, Y., & Dai, Y. (2024). Understanding Adam Optimizer via Online Learning of Updates: Adam is FTRL in Disguise. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:619-640. Available from https://proceedings.mlr.press/v235/ahn24b.html.