Memory-Efficient Optimization with Factorized Hamiltonian Descent

Son Nguyen, Lizhang Chen, Bo Liu, Qiang Liu
Proceedings of The 28th International Conference on Artificial Intelligence and Statistics, PMLR 258:2863-2871, 2025.

Abstract

Modern deep learning heavily depends on adaptive optimizers such as Adam and its variants, which are renowned for their capacity to handle model scaling and streamline hyperparameter tuning. However, these algorithms typically experience high memory overhead caused by the accumulation of optimization states, leading to a critical challenge in training large-scale network models. In this study, we introduce a novel adaptive optimizer, H-Fac, which incorporates a memory-efficient factorization approach to address this challenge. By employing a rank-1 parameterization for both momentum and scaling parameter estimators, H-Fac reduces memory costs to a sublinear level while maintaining competitive performance across a wide range of architectures. We develop our algorithms based on principles derived from Hamiltonian dynamics, providing robust theoretical underpinnings in optimization dynamics and convergence guarantees. These optimization algorithms are designed to be both straightforward and adaptable, facilitating easy implementation in diverse settings.
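To make the memory saving concrete, the sketch below (NumPy) illustrates the general idea of a rank-1, row/column factorization of the second-moment statistic for an m-by-n weight matrix, so that the optimizer state costs on the order of m + n floats instead of m * n. This is an illustrative sketch of the factorization principle only, not the authors' H-Fac update rule; the function name, hyperparameters, and the specific reconstruction used here are assumptions made for exposition.

import numpy as np

def factored_rmsprop_step(w, grad, row_acc, col_acc, lr=1e-2, beta2=0.999, eps=1e-30):
    """One update with a rank-1 factored second-moment estimate.

    Illustrative only -- not the H-Fac algorithm from the paper.
    w       : (m, n) parameter matrix
    grad    : (m, n) gradient
    row_acc : (m,) running row means of squared gradients
    col_acc : (n,) running column means of squared gradients
    """
    g2 = grad ** 2
    # Update two factor vectors instead of a full (m, n) accumulator.
    row_acc = beta2 * row_acc + (1 - beta2) * g2.mean(axis=1)
    col_acc = beta2 * col_acc + (1 - beta2) * g2.mean(axis=0)
    # Rank-1 reconstruction of the per-coordinate scaling matrix.
    v_hat = np.outer(row_acc, col_acc) / (row_acc.mean() + eps)
    w = w - lr * grad / (np.sqrt(v_hat) + eps)
    return w, row_acc, col_acc

# Usage: optimizer state for an (m, n) layer is just m + n floats.
m, n = 512, 256
w = np.random.randn(m, n) * 0.01
row_acc, col_acc = np.zeros(m), np.zeros(n)
grad = np.random.randn(m, n)
w, row_acc, col_acc = factored_rmsprop_step(w, grad, row_acc, col_acc)

For a 512 x 256 layer, the scaling-statistic state shrinks from 131,072 floats to 768, which is the kind of sublinear memory footprint the abstract refers to (H-Fac additionally factorizes the momentum estimator).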

Cite this Paper


BibTeX
@InProceedings{pmlr-v258-nguyen25e,
  title     = {Memory-Efficient Optimization with Factorized Hamiltonian Descent},
  author    = {Nguyen, Son and Chen, Lizhang and Liu, Bo and Liu, Qiang},
  booktitle = {Proceedings of The 28th International Conference on Artificial Intelligence and Statistics},
  pages     = {2863--2871},
  year      = {2025},
  editor    = {Li, Yingzhen and Mandt, Stephan and Agrawal, Shipra and Khan, Emtiyaz},
  volume    = {258},
  series    = {Proceedings of Machine Learning Research},
  month     = {03--05 May},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v258/main/assets/nguyen25e/nguyen25e.pdf},
  url       = {https://proceedings.mlr.press/v258/nguyen25e.html},
  abstract  = {Modern deep learning heavily depends on adaptive optimizers such as Adam and its variants, which are renowned for their capacity to handle model scaling and streamline hyperparameter tuning. However, these algorithms typically experience high memory overhead caused by the accumulation of optimization states, leading to a critical challenge in training large-scale network models. In this study, we introduce a novel adaptive optimizer, H-Fac, which incorporates a memory-efficient factorization approach to address this challenge. By employing a rank-1 parameterization for both momentum and scaling parameter estimators, H-Fac reduces memory costs to a sublinear level while maintaining competitive performance across a wide range of architectures. We develop our algorithms based on principles derived from Hamiltonian dynamics, providing robust theoretical underpinnings in optimization dynamics and convergence guarantees. These optimization algorithms are designed to be both straightforward and adaptable, facilitating easy implementation in diverse settings.}
}
Endnote
%0 Conference Paper
%T Memory-Efficient Optimization with Factorized Hamiltonian Descent
%A Son Nguyen
%A Lizhang Chen
%A Bo Liu
%A Qiang Liu
%B Proceedings of The 28th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2025
%E Yingzhen Li
%E Stephan Mandt
%E Shipra Agrawal
%E Emtiyaz Khan
%F pmlr-v258-nguyen25e
%I PMLR
%P 2863--2871
%U https://proceedings.mlr.press/v258/nguyen25e.html
%V 258
%X Modern deep learning heavily depends on adaptive optimizers such as Adam and its variants, which are renowned for their capacity to handle model scaling and streamline hyperparameter tuning. However, these algorithms typically experience high memory overhead caused by the accumulation of optimization states, leading to a critical challenge in training large-scale network models. In this study, we introduce a novel adaptive optimizer, H-Fac, which incorporates a memory-efficient factorization approach to address this challenge. By employing a rank-1 parameterization for both momentum and scaling parameter estimators, H-Fac reduces memory costs to a sublinear level while maintaining competitive performance across a wide range of architectures. We develop our algorithms based on principles derived from Hamiltonian dynamics, providing robust theoretical underpinnings in optimization dynamics and convergence guarantees. These optimization algorithms are designed to be both straightforward and adaptable, facilitating easy implementation in diverse settings.
APA
Nguyen, S., Chen, L., Liu, B. & Liu, Q. (2025). Memory-Efficient Optimization with Factorized Hamiltonian Descent. Proceedings of The 28th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 258:2863-2871. Available from https://proceedings.mlr.press/v258/nguyen25e.html.
