Structured Inverse-Free Natural Gradient Descent: Memory-Efficient & Numerically-Stable KFAC

Wu Lin, Felix Dangel, Runa Eschenhagen, Kirill Neklyudov, Agustinus Kristiadi, Richard E. Turner, Alireza Makhzani
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:29974-29991, 2024.

Abstract

Second-order methods such as KFAC can be useful for neural net training. However, they are often memory-inefficient since their preconditioning Kronecker factors are dense, and numerically unstable in low precision as they require matrix inversion or decomposition. These limitations render such methods unpopular for modern mixed-precision training. We address them by (i) formulating an inverse-free KFAC update and (ii) imposing structures in the Kronecker factors, resulting in structured inverse-free natural gradient descent (SINGD). On modern neural networks, we show that SINGD is memory-efficient and numerically robust, in contrast to KFAC, and often outperforms AdamW even in half precision. Our work closes a gap between first- and second-order methods in modern low-precision training.
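To make the abstract's two ideas concrete, below is a minimal, hypothetical NumPy sketch (not the paper's actual SINGD update): it contrasts standard KFAC-style preconditioning, which needs explicit matrix inverses, with (i) an "inverse-free" variant that uses only matrix multiplications (here a Newton-Schulz iteration stands in for the paper's update rule) and (ii) a "structured" variant where the Kronecker factors are restricted to diagonals, shrinking storage from O(d^2) to O(d). All names and the iteration choice are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out = 4, 3

# Toy symmetric positive-definite Kronecker factors: A for the layer input side,
# B for the output-gradient side (as in KFAC), and a random weight gradient G.
A = np.eye(d_in) + 0.1 * rng.standard_normal((d_in, d_in))
A = A @ A.T
B = np.eye(d_out) + 0.1 * rng.standard_normal((d_out, d_out))
B = B @ B.T
G = rng.standard_normal((d_out, d_in))

# Standard KFAC-style preconditioning: requires explicit matrix inversion,
# the numerically fragile step in low-precision training.
precond_kfac = np.linalg.inv(B) @ G @ np.linalg.inv(A)

def newton_schulz_inverse(M, iters=30):
    """Approximate inv(M) for SPD M using only matrix multiplications."""
    n = M.shape[0]
    X = np.eye(n) / np.linalg.norm(M)  # Frobenius norm upper-bounds the spectrum
    for _ in range(iters):
        X = X @ (2.0 * np.eye(n) - M @ X)
    return X

# Inverse-free flavor: the inverse factors are obtained with matrix
# multiplications only, never an explicit inverse or decomposition.
precond_inv_free = newton_schulz_inverse(B) @ G @ newton_schulz_inverse(A)
print("max deviation from exact KFAC preconditioning:",
      np.max(np.abs(precond_inv_free - precond_kfac)))

# Structured flavor: restricting each factor to, e.g., a diagonal structure
# reduces per-factor memory from O(d^2) to O(d) and makes inversion trivial.
A_diag = np.diag(A)  # d_in numbers instead of d_in * d_in
B_diag = np.diag(B)  # d_out numbers instead of d_out * d_out
precond_structured = G / np.outer(B_diag, A_diag)
```

The actual SINGD update is specified in the paper; this sketch only illustrates why avoiding explicit inversion helps numerical stability in half precision and why structured Kronecker factors reduce memory.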

Cite this Paper


BibTeX
@InProceedings{pmlr-v235-lin24f,
  title     = {Structured Inverse-Free Natural Gradient Descent: Memory-Efficient & Numerically-Stable {KFAC}},
  author    = {Lin, Wu and Dangel, Felix and Eschenhagen, Runa and Neklyudov, Kirill and Kristiadi, Agustinus and Turner, Richard E. and Makhzani, Alireza},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages     = {29974--29991},
  year      = {2024},
  editor    = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume    = {235},
  series    = {Proceedings of Machine Learning Research},
  month     = {21--27 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/lin24f/lin24f.pdf},
  url       = {https://proceedings.mlr.press/v235/lin24f.html},
  abstract  = {Second-order methods such as KFAC can be useful for neural net training. However, they are often memory-inefficient since their preconditioning Kronecker factors are dense, and numerically unstable in low precision as they require matrix inversion or decomposition. These limitations render such methods unpopular for modern mixed-precision training. We address them by (i) formulating an inverse-free KFAC update and (ii) imposing structures in the Kronecker factors, resulting in structured inverse-free natural gradient descent (SINGD). On modern neural networks, we show that SINGD is memory-efficient and numerically robust, in contrast to KFAC, and often outperforms AdamW even in half precision. Our work closes a gap between first- and second-order methods in modern low-precision training.}
}
Endnote
%0 Conference Paper
%T Structured Inverse-Free Natural Gradient Descent: Memory-Efficient & Numerically-Stable KFAC
%A Wu Lin
%A Felix Dangel
%A Runa Eschenhagen
%A Kirill Neklyudov
%A Agustinus Kristiadi
%A Richard E. Turner
%A Alireza Makhzani
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-lin24f
%I PMLR
%P 29974--29991
%U https://proceedings.mlr.press/v235/lin24f.html
%V 235
%X Second-order methods such as KFAC can be useful for neural net training. However, they are often memory-inefficient since their preconditioning Kronecker factors are dense, and numerically unstable in low precision as they require matrix inversion or decomposition. These limitations render such methods unpopular for modern mixed-precision training. We address them by (i) formulating an inverse-free KFAC update and (ii) imposing structures in the Kronecker factors, resulting in structured inverse-free natural gradient descent (SINGD). On modern neural networks, we show that SINGD is memory-efficient and numerically robust, in contrast to KFAC, and often outperforms AdamW even in half precision. Our work closes a gap between first- and second-order methods in modern low-precision training.
APA
Lin, W., Dangel, F., Eschenhagen, R., Neklyudov, K., Kristiadi, A., Turner, R.E. & Makhzani, A. (2024). Structured Inverse-Free Natural Gradient Descent: Memory-Efficient & Numerically-Stable KFAC. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:29974-29991. Available from https://proceedings.mlr.press/v235/lin24f.html.
