Structured Inverse-Free Natural Gradient Descent: Memory-Efficient & Numerically-Stable KFAC

Wu Lin, Felix Dangel, Runa Eschenhagen, Kirill Neklyudov, Agustinus Kristiadi, Richard E. Turner, Alireza Makhzani
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:29974-29991, 2024.

Abstract

Second-order methods such as KFAC can be useful for neural net training. However, they are often memory-inefficient since their preconditioning Kronecker factors are dense, and numerically unstable in low precision as they require matrix inversion or decomposition. These limitations render such methods unpopular for modern mixed-precision training. We address them by (i) formulating an inverse-free KFAC update and (ii) imposing structures in the Kronecker factors, resulting in structured inverse-free natural gradient descent (SINGD). On modern neural networks, we show that SINGD is memory-efficient and numerically robust, in contrast to KFAC, and often outperforms AdamW even in half precision. Our work closes a gap between first- and second-order methods in modern low-precision training.
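To make the abstract's two ideas concrete, below is a minimal, hypothetical NumPy sketch (not the paper's actual SINGD update): it contrasts standard KFAC-style preconditioning, which needs explicit matrix inverses, with (i) an "inverse-free" variant that uses only matrix multiplications (here a Newton-Schulz iteration stands in for the paper's update rule) and (ii) a "structured" variant where the Kronecker factors are restricted to diagonals, shrinking storage from O(d^2) to O(d). All names and the iteration choice are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out = 4, 3

# Toy symmetric positive-definite Kronecker factors: A for the layer input side,
# B for the output-gradient side (as in KFAC), and a random weight gradient G.
A = np.eye(d_in) + 0.1 * rng.standard_normal((d_in, d_in))
A = A @ A.T
B = np.eye(d_out) + 0.1 * rng.standard_normal((d_out, d_out))
B = B @ B.T
G = rng.standard_normal((d_out, d_in))

# Standard KFAC-style preconditioning: requires explicit matrix inversion,
# the numerically fragile step in low-precision training.
precond_kfac = np.linalg.inv(B) @ G @ np.linalg.inv(A)

def newton_schulz_inverse(M, iters=30):
    """Approximate inv(M) for SPD M using only matrix multiplications."""
    n = M.shape[0]
    X = np.eye(n) / np.linalg.norm(M)  # Frobenius norm upper-bounds the spectrum
    for _ in range(iters):
        X = X @ (2.0 * np.eye(n) - M @ X)
    return X

# Inverse-free flavor: the inverse factors are obtained with matrix
# multiplications only, never an explicit inverse or decomposition.
precond_inv_free = newton_schulz_inverse(B) @ G @ newton_schulz_inverse(A)
print("max deviation from exact KFAC preconditioning:",
      np.max(np.abs(precond_inv_free - precond_kfac)))

# Structured flavor: restricting each factor to, e.g., a diagonal structure
# reduces per-factor memory from O(d^2) to O(d) and makes inversion trivial.
A_diag = np.diag(A)  # d_in numbers instead of d_in * d_in
B_diag = np.diag(B)  # d_out numbers instead of d_out * d_out
precond_structured = G / np.outer(B_diag, A_diag)
```

The actual SINGD update is specified in the paper; this sketch only illustrates why avoiding explicit inversion helps numerical stability in half precision and why structured Kronecker factors reduce memory.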

Cite this Paper


BibTeX
@InProceedings{pmlr-v235-lin24f,
  title     = {Structured Inverse-Free Natural Gradient Descent: Memory-Efficient & Numerically-Stable {KFAC}},
  author    = {Lin, Wu and Dangel, Felix and Eschenhagen, Runa and Neklyudov, Kirill and Kristiadi, Agustinus and Turner, Richard E. and Makhzani, Alireza},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages     = {29974--29991},
  year      = {2024},
  editor    = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume    = {235},
  series    = {Proceedings of Machine Learning Research},
  month     = {21--27 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/lin24f/lin24f.pdf},
  url       = {https://proceedings.mlr.press/v235/lin24f.html},
  abstract  = {Second-order methods such as KFAC can be useful for neural net training. However, they are often memory-inefficient since their preconditioning Kronecker factors are dense, and numerically unstable in low precision as they require matrix inversion or decomposition. These limitations render such methods unpopular for modern mixed-precision training. We address them by (i) formulating an inverse-free KFAC update and (ii) imposing structures in the Kronecker factors, resulting in structured inverse-free natural gradient descent (SINGD). On modern neural networks, we show that SINGD is memory-efficient and numerically robust, in contrast to KFAC, and often outperforms AdamW even in half precision. Our work closes a gap between first- and second-order methods in modern low-precision training.}
}
Endnote
%0 Conference Paper
%T Structured Inverse-Free Natural Gradient Descent: Memory-Efficient & Numerically-Stable KFAC
%A Wu Lin
%A Felix Dangel
%A Runa Eschenhagen
%A Kirill Neklyudov
%A Agustinus Kristiadi
%A Richard E. Turner
%A Alireza Makhzani
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-lin24f
%I PMLR
%P 29974--29991
%U https://proceedings.mlr.press/v235/lin24f.html
%V 235
%X Second-order methods such as KFAC can be useful for neural net training. However, they are often memory-inefficient since their preconditioning Kronecker factors are dense, and numerically unstable in low precision as they require matrix inversion or decomposition. These limitations render such methods unpopular for modern mixed-precision training. We address them by (i) formulating an inverse-free KFAC update and (ii) imposing structures in the Kronecker factors, resulting in structured inverse-free natural gradient descent (SINGD). On modern neural networks, we show that SINGD is memory-efficient and numerically robust, in contrast to KFAC, and often outperforms AdamW even in half precision. Our work closes a gap between first- and second-order methods in modern low-precision training.
APA
Lin, W., Dangel, F., Eschenhagen, R., Neklyudov, K., Kristiadi, A., Turner, R.E. & Makhzani, A. (2024). Structured Inverse-Free Natural Gradient Descent: Memory-Efficient & Numerically-Stable KFAC. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:29974-29991. Available from https://proceedings.mlr.press/v235/lin24f.html.
