Global curvature for second-order optimization of neural networks

Alberto Bernacchia
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:3843-3876, 2025.

Abstract

Second-order optimization methods, which leverage the local curvature of the loss function, have the potential to dramatically accelerate the training of machine learning models. However, these methods are often hindered by the computational burden of constructing and inverting large curvature matrices with $\mathcal{O}(p^2)$ elements, where $p$ is the number of parameters. In this work, we present a theory that predicts the exact structure of the global curvature by leveraging the intrinsic symmetries of neural networks, such as invariance under parameter permutations. For Multi-Layer Perceptrons (MLPs), our approach reveals that the global curvature can be expressed in terms of $\mathcal{O}(d^2 + L^2)$ independent factors, where $d$ is the number of input/output dimensions and $L$ is the number of layers, significantly reducing the computational burden compared to the $\mathcal{O}(p^2)$ elements of the full matrix. These factors can be estimated efficiently, enabling precise curvature computations. To evaluate the practical implications of our framework, we apply second-order optimization to synthetic data, achieving markedly faster convergence compared to traditional optimization methods. Our findings pave the way for a better understanding of the loss landscape of neural networks, and for designing more efficient training methodologies in deep learning. Code: https://github.com/mtkresearch/symo_notebooks
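
The abstract contrasts the $\mathcal{O}(p^2)$ entries of a full curvature matrix with the $\mathcal{O}(d^2 + L^2)$ independent factors predicted for MLPs. As a rough illustration only (this is not the paper's method; the helper mlp_weight_count, the network sizes, and the toy least-squares problem are invented for this sketch), the snippet below compares the two counts for a typical MLP and shows what a generic damped Newton step with an explicit curvature matrix looks like on a small problem:

import numpy as np

def mlp_weight_count(d, width, L):
    """Weight count of an MLP with d-dimensional input/output, L weight layers,
    and equal-width hidden layers (biases ignored). Illustrative helper only."""
    sizes = [d] + [width] * (L - 1) + [d]
    return sum(a * b for a, b in zip(sizes[:-1], sizes[1:]))

# Hypothetical network sizes, chosen only to make the scaling gap visible.
d, L, width = 10, 8, 1024
p = mlp_weight_count(d, width, L)
print(f"full curvature entries  ~ p^2        = {p**2:.3e}")
print(f"structured factor count ~ d^2 + L^2  = {d**2 + L**2}")

# Generic damped Newton step on a toy linear least-squares loss
# f(w) = 0.5 * ||X w - y||^2, whose exact Hessian is H = X^T X.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 20))
y = rng.standard_normal(100)
w = np.zeros(20)
lam = 1e-3                              # damping keeps H + lam*I invertible
for _ in range(5):
    grad = X.T @ (X @ w - y)            # gradient of the loss
    H = X.T @ X                         # full curvature matrix (p x p)
    w -= np.linalg.solve(H + lam * np.eye(20), grad)
    print("loss:", 0.5 * np.sum((X @ w - y) ** 2))

The gap between the two printed counts is what makes storing and inverting the full curvature matrix impractical for realistic networks; the structured factors described in the abstract aim to close exactly that gap.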

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-bernacchia25a,
  title     = {Global curvature for second-order optimization of neural networks},
  author    = {Bernacchia, Alberto},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {3843--3876},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/bernacchia25a/bernacchia25a.pdf},
  url       = {https://proceedings.mlr.press/v267/bernacchia25a.html}
}
APA
Bernacchia, A. (2025). Global curvature for second-order optimization of neural networks. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:3843-3876. Available from https://proceedings.mlr.press/v267/bernacchia25a.html.
