On the Stability of Nonlinear Dynamics in GD and SGD: Beyond Quadratic Potentials

Rotem Mulayoff, Sebastian U. Stich
Proceedings of Thirty Ninth Conference on Learning Theory, PMLR 336:5210-5243, 2026.

Abstract

The dynamical stability of the iterates during training plays a key role in determining the minima obtained by optimization algorithms. For example, stable solutions of gradient descent (GD) correspond to flat minima, which have been associated with favorable features. While prior work often relies on linearization to determine stability, it remains unclear whether linearized dynamics faithfully capture the full nonlinear behavior. Recent work has shown that GD may stably oscillate near a linearly unstable minimum and still converge once the step size decays, indicating that linear analysis can be misleading. In this work, we explicitly study the effect of nonlinear terms. Specifically, we derive an exact criterion for stable oscillations of GD near minima in the multivariate setting. Our condition depends on high-order derivatives, generalizing existing results. Extending the analysis to stochastic gradient descent (SGD), we show that nonlinear dynamics can diverge in expectation even if a single batch is unstable. This implies that stability can be dictated by a single batch that oscillates unstably, rather than an average effect, as linear analysis suggests. Finally, we prove that if all batches are linearly stable, the nonlinear dynamics of SGD are stable in expectation.
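The abstract contrasts linearized and nonlinear GD dynamics, and a one-dimensional toy problem makes that contrast concrete. The sketch below is our own illustration, not the authors' stability criterion or code. It assumes the hypothetical loss f(x) = log cosh(x), whose minimum at x = 0 has curvature f''(0) = 1. The linearized (quadratic) analysis therefore predicts divergence for any constant step size above 2. The helper names `gradient_descent`, `grad_f` and `grad_quadratic` are likewise hypothetical.

```python
import math


def gradient_descent(grad, x0, step_sizes):
    """Plain GD, x <- x - eta * grad(x), one step per entry of step_sizes.

    Returns the whole trajectory, including the starting point.
    """
    x = x0
    trajectory = [x]
    for eta in step_sizes:
        x = x - eta * grad(x)
        trajectory.append(x)
    return trajectory


# Hypothetical toy loss (not from the paper): f(x) = log(cosh(x)).
# Minimum at x = 0 with curvature f''(0) = 1, so the linear stability
# threshold for GD there is eta <= 2 / f''(0) = 2.
def grad_f(x):
    return math.tanh(x)          # f'(x) = tanh(x)


def grad_quadratic(x):
    return x                     # gradient of the 2nd-order model x**2 / 2


eta_big, eta_small = 2.5, 1.5    # 2.5 * f''(0) > 2 (unstable); 1.5 < 2 (stable)
x0 = 1e-3                        # start very close to the minimum

# Linearized dynamics: x <- (1 - 2.5) x = -1.5 x, so |x| blows up geometrically.
linear = gradient_descent(grad_quadratic, x0, [eta_big] * 60)

# Full nonlinear dynamics with the same step size: the iterates leave the
# minimum but settle into a bounded period-2 oscillation x -> -x -> x.
nonlinear = gradient_descent(grad_f, x0, [eta_big] * 60)

# Decaying the step size below the linear threshold lets GD converge to 0.
decayed = gradient_descent(grad_f, nonlinear[-1], [eta_small] * 60)

print(f"linearized |x_60|     = {abs(linear[-1]):.3e}")
print(f"nonlinear  x_59, x_60 = {nonlinear[-2]:+.4f}, {nonlinear[-1]:+.4f}")
print(f"after decay |x|       = {abs(decayed[-1]):.3e}")
```

With these settings, the linearized iterate grows to the order of 10^7 in magnitude after 60 steps. The nonlinear iterates instead stay bounded and alternate between roughly +0.89 and -0.89, the nonzero solutions of 2.5·tanh(x) = 2x. Once the step size drops to 1.5, below the linear threshold, the iterates converge to the minimum. This scalar example only mirrors the qualitative phenomenon. It does not reproduce the paper's multivariate criterion or its SGD results.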

Cite this Paper


BibTeX
@InProceedings{pmlr-v336-mulayoff26a,
  title = {On the Stability of Nonlinear Dynamics in GD and SGD: Beyond Quadratic Potentials},
  author = {Mulayoff, Rotem and Stich, Sebastian U.},
  booktitle = {Proceedings of Thirty Ninth Conference on Learning Theory},
  pages = {5210--5243},
  year = {2026},
  editor = {Hanneke, Steve and Lattimore, Tor},
  volume = {336},
  series = {Proceedings of Machine Learning Research},
  month = {29 Jun--03 Jul},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v336/main/assets/mulayoff26a/mulayoff26a.pdf},
  url = {https://proceedings.mlr.press/v336/mulayoff26a.html},
  abstract = {The dynamical stability of the iterates during training plays a key role in determining the minima obtained by optimization algorithms. For example, stable solutions of gradient descent (GD) correspond to flat minima, which have been associated with favorable features. While prior work often relies on linearization to determine stability, it remains unclear whether linearized dynamics faithfully capture the full nonlinear behavior. Recent work has shown that GD may stably oscillate near a linearly unstable minimum and still converge once the step size decays, indicating that linear analysis can be misleading. In this work, we explicitly study the effect of nonlinear terms. Specifically, we derive an exact criterion for stable oscillations of GD near minima in the multivariate setting. Our condition depends on high-order derivatives, generalizing existing results. Extending the analysis to stochastic gradient descent (SGD), we show that nonlinear dynamics can diverge in expectation even if a single batch is unstable. This implies that stability can be dictated by a single batch that oscillates unstably, rather than an average effect, as linear analysis suggests. Finally, we prove that if all batches are linearly stable, the nonlinear dynamics of SGD are stable in expectation.}
}
Endnote
%0 Conference Paper
%T On the Stability of Nonlinear Dynamics in GD and SGD: Beyond Quadratic Potentials
%A Rotem Mulayoff
%A Sebastian U. Stich
%B Proceedings of Thirty Ninth Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2026
%E Steve Hanneke
%E Tor Lattimore
%F pmlr-v336-mulayoff26a
%I PMLR
%P 5210--5243
%U https://proceedings.mlr.press/v336/mulayoff26a.html
%V 336
%X The dynamical stability of the iterates during training plays a key role in determining the minima obtained by optimization algorithms. For example, stable solutions of gradient descent (GD) correspond to flat minima, which have been associated with favorable features. While prior work often relies on linearization to determine stability, it remains unclear whether linearized dynamics faithfully capture the full nonlinear behavior. Recent work has shown that GD may stably oscillate near a linearly unstable minimum and still converge once the step size decays, indicating that linear analysis can be misleading. In this work, we explicitly study the effect of nonlinear terms. Specifically, we derive an exact criterion for stable oscillations of GD near minima in the multivariate setting. Our condition depends on high-order derivatives, generalizing existing results. Extending the analysis to stochastic gradient descent (SGD), we show that nonlinear dynamics can diverge in expectation even if a single batch is unstable. This implies that stability can be dictated by a single batch that oscillates unstably, rather than an average effect, as linear analysis suggests. Finally, we prove that if all batches are linearly stable, the nonlinear dynamics of SGD are stable in expectation.
APA
Mulayoff, R. & Stich, S. U. (2026). On the Stability of Nonlinear Dynamics in GD and SGD: Beyond Quadratic Potentials. Proceedings of Thirty Ninth Conference on Learning Theory, in Proceedings of Machine Learning Research 336:5210-5243. Available from https://proceedings.mlr.press/v336/mulayoff26a.html.
