How to Set $\beta_1, \beta_2$ in Adam: An Online Learning Perspective

Quan M. Nguyen
Proceedings of The 37th International Conference on Algorithmic Learning Theory, PMLR 313:1-16, 2026.

Abstract

While Adam is one of the most effective optimizers for training large-scale machine learning models, a theoretical understanding of how to optimally set its momentum factors, $\beta_1$ and $\beta_2$, remains largely incomplete. Prior works have shown that Adam can be seen as an instance of Follow-the-Regularized-Leader (FTRL), one of the most important classes of algorithms in online learning. The analyses in these works required setting $\beta_1 = \sqrt{\beta_2}$, which does not cover the more practical cases with $\beta_1 \neq \sqrt{\beta_2}$. We derive novel, more general analyses that hold for both $\beta_1 \geq \sqrt{\beta_2}$ and $\beta_1 \leq \sqrt{\beta_2}$. In both cases, our results strictly generalize the existing bounds. Furthermore, we show that our bounds are tight in the worst case. We also prove that setting $\beta_1 = \sqrt{\beta_2}$ is optimal for an oblivious adversary, but sub-optimal for a non-oblivious adversary.
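For context (this recap is the textbook definition, not material from the paper itself), the momentum factors enter through Adam's standard update (Kingma and Ba, 2015):

$m_t = \beta_1 m_{t-1} + (1 - \beta_1) g_t$, $\quad v_t = \beta_2 v_{t-1} + (1 - \beta_2) g_t^2$, $\quad x_{t+1} = x_t - \eta\, \hat{m}_t / (\sqrt{\hat{v}_t} + \epsilon)$,

where $g_t$ is the gradient at step $t$ and $\hat{m}_t = m_t / (1 - \beta_1^t)$, $\hat{v}_t = v_t / (1 - \beta_2^t)$ are the bias-corrected moment estimates. Note that the common default $\beta_1 = 0.9$, $\beta_2 = 0.999$ has $\beta_1 < \sqrt{\beta_2} \approx 0.9995$, i.e., it lies in the $\beta_1 \leq \sqrt{\beta_2}$ regime rather than on the line $\beta_1 = \sqrt{\beta_2}$ assumed by the prior analyses.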

Cite this Paper


BibTeX
@InProceedings{pmlr-v313-nguyen26b,
  title     = {How to Set $\beta_1, \beta_2$ in Adam: An Online Learning Perspective},
  author    = {Nguyen, Quan M.},
  booktitle = {Proceedings of The 37th International Conference on Algorithmic Learning Theory},
  pages     = {1--16},
  year      = {2026},
  editor    = {Telgarsky, Matus and Ullman, Jonathan},
  volume    = {313},
  series    = {Proceedings of Machine Learning Research},
  month     = {23--26 Feb},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v313/main/assets/nguyen26b/nguyen26b.pdf},
  url       = {https://proceedings.mlr.press/v313/nguyen26b.html},
  abstract  = {While Adam is one of the most effective optimizers for training large-scale machine learning models, a theoretical understanding of how to optimally set its momentum factors, $\beta_1$ and $\beta_2$, remains largely incomplete. Prior works have shown that Adam can be seen as an instance of Follow-the-Regularized-Leader (FTRL), one of the most important classes of algorithms in online learning. The analyses in these works required setting $\beta_1 = \sqrt{\beta_2}$, which does not cover the more practical cases with $\beta_1 \neq \sqrt{\beta_2}$. We derive novel, more general analyses that hold for both $\beta_1 \geq \sqrt{\beta_2}$ and $\beta_1 \leq \sqrt{\beta_2}$. In both cases, our results strictly generalize the existing bounds. Furthermore, we show that our bounds are tight in the worst case. We also prove that setting $\beta_1 = \sqrt{\beta_2}$ is optimal for an oblivious adversary, but sub-optimal for a non-oblivious adversary.}
}
Endnote
%0 Conference Paper
%T How to Set $\beta_1, \beta_2$ in Adam: An Online Learning Perspective
%A Quan M. Nguyen
%B Proceedings of The 37th International Conference on Algorithmic Learning Theory
%C Proceedings of Machine Learning Research
%D 2026
%E Matus Telgarsky
%E Jonathan Ullman
%F pmlr-v313-nguyen26b
%I PMLR
%P 1--16
%U https://proceedings.mlr.press/v313/nguyen26b.html
%V 313
%X While Adam is one of the most effective optimizers for training large-scale machine learning models, a theoretical understanding of how to optimally set its momentum factors, $\beta_1$ and $\beta_2$, remains largely incomplete. Prior works have shown that Adam can be seen as an instance of Follow-the-Regularized-Leader (FTRL), one of the most important classes of algorithms in online learning. The analyses in these works required setting $\beta_1 = \sqrt{\beta_2}$, which does not cover the more practical cases with $\beta_1 \neq \sqrt{\beta_2}$. We derive novel, more general analyses that hold for both $\beta_1 \geq \sqrt{\beta_2}$ and $\beta_1 \leq \sqrt{\beta_2}$. In both cases, our results strictly generalize the existing bounds. Furthermore, we show that our bounds are tight in the worst case. We also prove that setting $\beta_1 = \sqrt{\beta_2}$ is optimal for an oblivious adversary, but sub-optimal for a non-oblivious adversary.
APA
Nguyen, Q. M. (2026). How to Set $\beta_1, \beta_2$ in Adam: An Online Learning Perspective. Proceedings of The 37th International Conference on Algorithmic Learning Theory, in Proceedings of Machine Learning Research 313:1-16. Available from https://proceedings.mlr.press/v313/nguyen26b.html.
