Both Asymptotic and Non-Asymptotic Convergence of Quasi-Hyperbolic Momentum using Increasing Batch Size

Kento Imaizumi, Hideaki Iiduka
Proceedings of the 17th Asian Conference on Machine Learning, PMLR 304:161-176, 2025.

Abstract

Momentum methods were originally introduced for their superiority to stochastic gradient descent (SGD) in deterministic settings with convex objective functions. However, despite their widespread application to deep neural networks — a representative case of stochastic nonconvex optimization — the theoretical justification for their effectiveness in such settings remains limited. Quasi-hyperbolic momentum (QHM) is an algorithm that generalizes various momentum methods and has been studied to better understand the class of momentum-based algorithms as a whole. In this paper, we provide both asymptotic and non-asymptotic convergence results for mini-batch QHM with an increasing batch size. We show that achieving asymptotic convergence requires either a decaying learning rate or an increasing batch size. Since a decaying learning rate adversely affects non-asymptotic convergence, we demonstrate that using mini-batch QHM with an increasing batch size — without decaying the learning rate — can be a more effective strategy. Our experiments show that even a finite increase in batch size can provide benefits for training neural networks. The code is available at https://github.com/iiduka-researches/qhm_acml25.
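To make the setting concrete, below is a minimal NumPy sketch of mini-batch QHM run with a constant learning rate and a geometrically increasing batch size, in the spirit of the strategy the abstract advocates. The function name qhm_increasing_batch, the gradient oracle grad_fn, and the schedule parameters (b0, growth, the batch-size cap) are illustrative assumptions rather than the paper's implementation; the update itself is the standard QHM recursion with momentum parameter beta and interpolation parameter nu.

```python
import numpy as np

def qhm_increasing_batch(grad_fn, x0, alpha=0.1, beta=0.9, nu=0.7,
                         b0=8, growth=2.0, steps=100):
    """Mini-batch QHM with a geometrically increasing batch size (sketch).

    grad_fn(x, batch_size) should return a stochastic gradient of the
    objective at x, estimated from `batch_size` samples.  The learning
    rate alpha is kept constant; only the batch size grows.
    """
    x = np.asarray(x0, dtype=float)
    d = np.zeros_like(x)               # momentum buffer
    b = float(b0)                      # current batch size
    for _ in range(steps):
        g = grad_fn(x, int(b))                      # mini-batch gradient
        d = beta * d + (1.0 - beta) * g             # momentum update
        x = x - alpha * ((1.0 - nu) * g + nu * d)   # QHM step
        b = min(b * growth, 2.0**14)                # grow batch, with a cap
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(0)

    def noisy_quad_grad(x, batch_size):
        # Gradient of f(x) = 0.5 * ||x||^2 with per-sample noise;
        # averaging over a larger batch shrinks the gradient variance.
        noise = rng.normal(size=(batch_size,) + x.shape).mean(axis=0)
        return x + noise

    print(qhm_increasing_batch(noisy_quad_grad, x0=[5.0, -3.0]))
```

Setting nu = 0 recovers plain SGD and nu = 1 recovers SGD with an exponentially weighted momentum buffer, which is why convergence results for QHM speak to the momentum family as a whole; the growing batch in the toy example stands in for the variance reduction that, per the abstract, substitutes for a decaying learning rate.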

Cite this Paper


BibTeX
@InProceedings{pmlr-v304-imaizumi25a,
  title     = {Both Asymptotic and Non-Asymptotic Convergence of Quasi-Hyperbolic Momentum using Increasing Batch Size},
  author    = {Imaizumi, Kento and Iiduka, Hideaki},
  booktitle = {Proceedings of the 17th Asian Conference on Machine Learning},
  pages     = {161--176},
  year      = {2025},
  editor    = {Lee, Hung-yi and Liu, Tongliang},
  volume    = {304},
  series    = {Proceedings of Machine Learning Research},
  month     = {09--12 Dec},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v304/main/assets/imaizumi25a/imaizumi25a.pdf},
  url       = {https://proceedings.mlr.press/v304/imaizumi25a.html},
  abstract  = {Momentum methods were originally introduced for their superiority to stochastic gradient descent (SGD) in deterministic settings with convex objective functions. However, despite their widespread application to deep neural networks — a representative case of stochastic nonconvex optimization — the theoretical justification for their effectiveness in such settings remains limited. Quasi-hyperbolic momentum (QHM) is an algorithm that generalizes various momentum methods and has been studied to better understand the class of momentum-based algorithms as a whole. In this paper, we provide both asymptotic and non-asymptotic convergence results for mini-batch QHM with an increasing batch size. We show that achieving asymptotic convergence requires either a decaying learning rate or an increasing batch size. Since a decaying learning rate adversely affects non-asymptotic convergence, we demonstrate that using mini-batch QHM with an increasing batch size — without decaying the learning rate — can be a more effective strategy. Our experiments show that even a finite increase in batch size can provide benefits for training neural networks. The code is available at https://github.com/iiduka-researches/qhm_acml25.}
}
Endnote
%0 Conference Paper
%T Both Asymptotic and Non-Asymptotic Convergence of Quasi-Hyperbolic Momentum using Increasing Batch Size
%A Kento Imaizumi
%A Hideaki Iiduka
%B Proceedings of the 17th Asian Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Hung-yi Lee
%E Tongliang Liu
%F pmlr-v304-imaizumi25a
%I PMLR
%P 161--176
%U https://proceedings.mlr.press/v304/imaizumi25a.html
%V 304
%X Momentum methods were originally introduced for their superiority to stochastic gradient descent (SGD) in deterministic settings with convex objective functions. However, despite their widespread application to deep neural networks — a representative case of stochastic nonconvex optimization — the theoretical justification for their effectiveness in such settings remains limited. Quasi-hyperbolic momentum (QHM) is an algorithm that generalizes various momentum methods and has been studied to better understand the class of momentum-based algorithms as a whole. In this paper, we provide both asymptotic and non-asymptotic convergence results for mini-batch QHM with an increasing batch size. We show that achieving asymptotic convergence requires either a decaying learning rate or an increasing batch size. Since a decaying learning rate adversely affects non-asymptotic convergence, we demonstrate that using mini-batch QHM with an increasing batch size — without decaying the learning rate — can be a more effective strategy. Our experiments show that even a finite increase in batch size can provide benefits for training neural networks. The code is available at https://github.com/iiduka-researches/qhm_acml25.
APA
Imaizumi, K. & Iiduka, H. (2025). Both Asymptotic and Non-Asymptotic Convergence of Quasi-Hyperbolic Momentum using Increasing Batch Size. Proceedings of the 17th Asian Conference on Machine Learning, in Proceedings of Machine Learning Research 304:161-176. Available from https://proceedings.mlr.press/v304/imaizumi25a.html.
