Second-order regression models exhibit progressive sharpening to the edge of stability

Atish Agarwala, Fabian Pedregosa, Jeffrey Pennington
Proceedings of the 40th International Conference on Machine Learning, PMLR 202:169-195, 2023.

Abstract

Recent studies of gradient descent with large step sizes have shown that there is often a regime with an initial increase in the largest eigenvalue of the loss Hessian (progressive sharpening), followed by a stabilization of the eigenvalue near the maximum value which allows convergence (edge of stability). These phenomena are intrinsically non-linear and do not happen for models in the constant Neural Tangent Kernel (NTK) regime, for which the predictive function is approximately linear in the parameters. As such, we consider the next simplest class of predictive models, namely those that are quadratic in the parameters, which we call second-order regression models. For quadratic objectives in two dimensions, we prove that this second-order regression model exhibits progressive sharpening of the NTK eigenvalue towards a value that differs slightly from the edge of stability, which we explicitly compute. In higher dimensions, the model generically shows similar behavior, even without the specific structure of a neural network, suggesting that progressive sharpening and edge-of-stability behavior aren’t unique features of neural networks, and could be a more general property of discrete learning algorithms in high-dimensional non-linear models.
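As an informal illustration of the setting described in the abstract (not the paper's construction), the sketch below trains a toy model whose scalar output is quadratic in its parameters, f(θ) = ½ θᵀAθ, by full-batch gradient descent on a squared loss, and tracks the largest eigenvalue of the loss Hessian ("sharpness") against the stability threshold 2/η. The matrix A, dimension, target, and step size are all illustrative assumptions, and whether sharpening to the edge of stability actually occurs depends on these choices.

    # Minimal sketch of a second-order (quadratic-in-parameters) regression model;
    # illustrative only, not the paper's exact setup.
    import numpy as np

    rng = np.random.default_rng(0)
    d = 50                              # parameter dimension (illustrative choice)
    A = rng.standard_normal((d, d))
    A = (A + A.T) / 2                   # symmetric matrix defining the quadratic model
    theta = rng.standard_normal(d) / np.sqrt(d)
    target = 1.0                        # scalar regression target (illustrative)
    eta = 0.02                          # step size; the edge of stability is 2 / eta

    def output(theta):
        return 0.5 * theta @ A @ theta  # model output, quadratic in the parameters

    def loss_hessian(theta):
        # Hessian of 0.5 * (f(theta) - target)^2: outer product of the model
        # gradient plus the residual times the model Hessian A.
        g = A @ theta                   # gradient of the model output
        return np.outer(g, g) + (output(theta) - target) * A

    for step in range(2001):
        residual = output(theta) - target
        theta = theta - eta * residual * (A @ theta)   # full-batch gradient descent step
        if step % 200 == 0:
            sharpness = np.linalg.eigvalsh(loss_hessian(theta))[-1]
            print(f"step {step:4d}  loss {0.5 * residual**2:.4f}  "
                  f"sharpness {sharpness:7.2f}  2/eta {2 / eta:.1f}")
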

Cite this Paper


BibTeX
@InProceedings{pmlr-v202-agarwala23b,
  title = {Second-order regression models exhibit progressive sharpening to the edge of stability},
  author = {Agarwala, Atish and Pedregosa, Fabian and Pennington, Jeffrey},
  booktitle = {Proceedings of the 40th International Conference on Machine Learning},
  pages = {169--195},
  year = {2023},
  editor = {Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan},
  volume = {202},
  series = {Proceedings of Machine Learning Research},
  month = {23--29 Jul},
  publisher = {PMLR},
  pdf = {https://proceedings.mlr.press/v202/agarwala23b/agarwala23b.pdf},
  url = {https://proceedings.mlr.press/v202/agarwala23b.html},
  abstract = {Recent studies of gradient descent with large step sizes have shown that there is often a regime with an initial increase in the largest eigenvalue of the loss Hessian (progressive sharpening), followed by a stabilization of the eigenvalue near the maximum value which allows convergence (edge of stability). These phenomena are intrinsically non-linear and do not happen for models in the constant Neural Tangent Kernel (NTK) regime, for which the predictive function is approximately linear in the parameters. As such, we consider the next simplest class of predictive models, namely those that are quadratic in the parameters, which we call second-order regression models. For quadratic objectives in two dimensions, we prove that this second-order regression model exhibits progressive sharpening of the NTK eigenvalue towards a value that differs slightly from the edge of stability, which we explicitly compute. In higher dimensions, the model generically shows similar behavior, even without the specific structure of a neural network, suggesting that progressive sharpening and edge-of-stability behavior aren’t unique features of neural networks, and could be a more general property of discrete learning algorithms in high-dimensional non-linear models.}
}
Endnote
%0 Conference Paper
%T Second-order regression models exhibit progressive sharpening to the edge of stability
%A Atish Agarwala
%A Fabian Pedregosa
%A Jeffrey Pennington
%B Proceedings of the 40th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Andreas Krause
%E Emma Brunskill
%E Kyunghyun Cho
%E Barbara Engelhardt
%E Sivan Sabato
%E Jonathan Scarlett
%F pmlr-v202-agarwala23b
%I PMLR
%P 169--195
%U https://proceedings.mlr.press/v202/agarwala23b.html
%V 202
%X Recent studies of gradient descent with large step sizes have shown that there is often a regime with an initial increase in the largest eigenvalue of the loss Hessian (progressive sharpening), followed by a stabilization of the eigenvalue near the maximum value which allows convergence (edge of stability). These phenomena are intrinsically non-linear and do not happen for models in the constant Neural Tangent Kernel (NTK) regime, for which the predictive function is approximately linear in the parameters. As such, we consider the next simplest class of predictive models, namely those that are quadratic in the parameters, which we call second-order regression models. For quadratic objectives in two dimensions, we prove that this second-order regression model exhibits progressive sharpening of the NTK eigenvalue towards a value that differs slightly from the edge of stability, which we explicitly compute. In higher dimensions, the model generically shows similar behavior, even without the specific structure of a neural network, suggesting that progressive sharpening and edge-of-stability behavior aren’t unique features of neural networks, and could be a more general property of discrete learning algorithms in high-dimensional non-linear models.
APA
Agarwala, A., Pedregosa, F. & Pennington, J. (2023). Second-order regression models exhibit progressive sharpening to the edge of stability. Proceedings of the 40th International Conference on Machine Learning, in Proceedings of Machine Learning Research 202:169-195. Available from https://proceedings.mlr.press/v202/agarwala23b.html.