Both Asymptotic and Non-Asymptotic Convergence of Quasi-Hyperbolic Momentum using Increasing Batch Size

Kento Imaizumi, Hideaki Iiduka
Proceedings of the 17th Asian Conference on Machine Learning, PMLR 304:161-176, 2025.

Abstract

Momentum methods were originally introduced for their superiority to stochastic gradient descent (SGD) in deterministic settings with convex objective functions. However, despite their widespread application to deep neural networks — a representative case of stochastic nonconvex optimization — the theoretical justification for their effectiveness in such settings remains limited. Quasi-hyperbolic momentum (QHM) is an algorithm that generalizes various momentum methods and has been studied to better understand the class of momentum-based algorithms as a whole. In this paper, we provide both asymptotic and non-asymptotic convergence results for mini-batch QHM with an increasing batch size. We show that achieving asymptotic convergence requires either a decaying learning rate or an increasing batch size. Since a decaying learning rate adversely affects non-asymptotic convergence, we demonstrate that using mini-batch QHM with an increasing batch size — without decaying the learning rate — can be a more effective strategy. Our experiments show that even a finite increase in batch size can provide benefits for training neural networks. The code is available at https://github.com/iiduka-researches/qhm_acml25.
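To make the setting concrete, below is a minimal NumPy sketch of mini-batch QHM run with a constant learning rate and a geometrically increasing batch size, in the spirit of the strategy the abstract advocates. The function name qhm_increasing_batch, the gradient oracle grad_fn, and the schedule parameters (b0, growth, the batch-size cap) are illustrative assumptions rather than the paper's implementation; the update itself is the standard QHM recursion with momentum parameter beta and interpolation parameter nu.

```python
import numpy as np

def qhm_increasing_batch(grad_fn, x0, alpha=0.1, beta=0.9, nu=0.7,
                         b0=8, growth=2.0, steps=100):
    """Mini-batch QHM with a geometrically increasing batch size (sketch).

    grad_fn(x, batch_size) should return a stochastic gradient of the
    objective at x, estimated from `batch_size` samples.  The learning
    rate alpha is kept constant; only the batch size grows.
    """
    x = np.asarray(x0, dtype=float)
    d = np.zeros_like(x)               # momentum buffer
    b = float(b0)                      # current batch size
    for _ in range(steps):
        g = grad_fn(x, int(b))                      # mini-batch gradient
        d = beta * d + (1.0 - beta) * g             # momentum update
        x = x - alpha * ((1.0 - nu) * g + nu * d)   # QHM step
        b = min(b * growth, 2.0**14)                # grow batch, with a cap
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(0)

    def noisy_quad_grad(x, batch_size):
        # Gradient of f(x) = 0.5 * ||x||^2 with per-sample noise;
        # averaging over a larger batch shrinks the gradient variance.
        noise = rng.normal(size=(batch_size,) + x.shape).mean(axis=0)
        return x + noise

    print(qhm_increasing_batch(noisy_quad_grad, x0=[5.0, -3.0]))
```

Setting nu = 0 recovers plain SGD and nu = 1 recovers SGD with an exponentially weighted momentum buffer, which is why convergence results for QHM speak to the momentum family as a whole; the growing batch in the toy example stands in for the variance reduction that, per the abstract, substitutes for a decaying learning rate.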

Cite this Paper


BibTeX
@InProceedings{pmlr-v304-imaizumi25a,
  title     = {Both Asymptotic and Non-Asymptotic Convergence of Quasi-Hyperbolic Momentum using Increasing Batch Size},
  author    = {Imaizumi, Kento and Iiduka, Hideaki},
  booktitle = {Proceedings of the 17th Asian Conference on Machine Learning},
  pages     = {161--176},
  year      = {2025},
  editor    = {Lee, Hung-yi and Liu, Tongliang},
  volume    = {304},
  series    = {Proceedings of Machine Learning Research},
  month     = {09--12 Dec},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v304/main/assets/imaizumi25a/imaizumi25a.pdf},
  url       = {https://proceedings.mlr.press/v304/imaizumi25a.html},
  abstract  = {Momentum methods were originally introduced for their superiority to stochastic gradient descent (SGD) in deterministic settings with convex objective functions. However, despite their widespread application to deep neural networks — a representative case of stochastic nonconvex optimization — the theoretical justification for their effectiveness in such settings remains limited. Quasi-hyperbolic momentum (QHM) is an algorithm that generalizes various momentum methods and has been studied to better understand the class of momentum-based algorithms as a whole. In this paper, we provide both asymptotic and non-asymptotic convergence results for mini-batch QHM with an increasing batch size. We show that achieving asymptotic convergence requires either a decaying learning rate or an increasing batch size. Since a decaying learning rate adversely affects non-asymptotic convergence, we demonstrate that using mini-batch QHM with an increasing batch size — without decaying the learning rate — can be a more effective strategy. Our experiments show that even a finite increase in batch size can provide benefits for training neural networks. The code is available at https://github.com/iiduka-researches/qhm_acml25.}
}
Endnote
%0 Conference Paper
%T Both Asymptotic and Non-Asymptotic Convergence of Quasi-Hyperbolic Momentum using Increasing Batch Size
%A Kento Imaizumi
%A Hideaki Iiduka
%B Proceedings of the 17th Asian Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Hung-yi Lee
%E Tongliang Liu
%F pmlr-v304-imaizumi25a
%I PMLR
%P 161--176
%U https://proceedings.mlr.press/v304/imaizumi25a.html
%V 304
%X Momentum methods were originally introduced for their superiority to stochastic gradient descent (SGD) in deterministic settings with convex objective functions. However, despite their widespread application to deep neural networks — a representative case of stochastic nonconvex optimization — the theoretical justification for their effectiveness in such settings remains limited. Quasi-hyperbolic momentum (QHM) is an algorithm that generalizes various momentum methods and has been studied to better understand the class of momentum-based algorithms as a whole. In this paper, we provide both asymptotic and non-asymptotic convergence results for mini-batch QHM with an increasing batch size. We show that achieving asymptotic convergence requires either a decaying learning rate or an increasing batch size. Since a decaying learning rate adversely affects non-asymptotic convergence, we demonstrate that using mini-batch QHM with an increasing batch size — without decaying the learning rate — can be a more effective strategy. Our experiments show that even a finite increase in batch size can provide benefits for training neural networks. The code is available at https://github.com/iiduka-researches/qhm_acml25.
APA
Imaizumi, K. & Iiduka, H. (2025). Both Asymptotic and Non-Asymptotic Convergence of Quasi-Hyperbolic Momentum using Increasing Batch Size. Proceedings of the 17th Asian Conference on Machine Learning, in Proceedings of Machine Learning Research 304:161-176. Available from https://proceedings.mlr.press/v304/imaizumi25a.html.
