The Global Convergence Time of Stochastic Gradient Descent in Non-Convex Landscapes: Sharp Estimates via Large Deviations

Waïss Azizian, Franck Iutzeler, Jerome Malick, Panayotis Mertikopoulos
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:1982-2044, 2025.

Abstract

In this paper, we examine the time it takes for stochastic gradient descent (SGD) to reach the global minimum of a general, non-convex loss function. We approach this question through the lens of large deviations theory and randomly perturbed dynamical systems, and we provide a tight characterization of the associated hitting times of SGD with matching upper and lower bounds. Our analysis reveals that the global convergence time of SGD is dominated by the most "costly" set of obstacles that the algorithm may need to overcome in order to reach a global minimizer, coupling in this way the geometry of the underlying loss landscape with the statistics of the noise entering the process. Finally, motivated by applications to the training of deep neural networks, we provide a series of refinements and extensions of our analysis to, among others, loss functions with no spurious local minima or ones with bounded depths.
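To make the quantity studied here concrete, below is a minimal, purely illustrative sketch (not the paper's analysis): it runs SGD with additive Gaussian noise on a hypothetical one-dimensional double-well loss and records the first time the iterate enters a small neighborhood of the global minimizer. The loss, step size, noise level, and tolerance are arbitrary assumptions chosen only to show how such a hitting time can be measured empirically.

import numpy as np

# Illustrative sketch only (not from the paper): SGD on a hypothetical 1D
# double-well loss f(x) = (x^2 - 1)^2 + 0.3 x, whose spurious local minimum
# lies near x ~ 0.96 and whose global minimizer lies near x ~ -1.04.

def loss_grad(x):
    # f'(x) = 4 x (x^2 - 1) + 0.3
    return 4.0 * x * (x * x - 1.0) + 0.3

def hitting_time(lr=0.05, noise_std=2.5, x0=1.0, tol=0.1, max_iter=1_000_000, seed=0):
    # First time the SGD iterate enters a tol-neighborhood of the global minimizer.
    rng = np.random.default_rng(seed)
    x, x_star = x0, -1.04  # start near the spurious minimum; x_star is approximate
    for t in range(1, max_iter + 1):
        g = loss_grad(x) + noise_std * rng.standard_normal()  # noisy gradient
        x -= lr * g
        if abs(x - x_star) < tol:
            return t
    return max_iter  # budget exhausted without hitting the neighborhood

times = [hitting_time(seed=s) for s in range(20)]
print("mean hitting time over 20 runs:", np.mean(times))

In this toy setting, the run is dominated by the time needed to escape the spurious basin, which qualitatively mirrors the abstract's description of the hitting time being governed by the costliest obstacle on the path to a global minimizer.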

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-azizian25a,
  title     = {The Global Convergence Time of Stochastic Gradient Descent in Non-Convex Landscapes: Sharp Estimates via Large Deviations},
  author    = {Azizian, Wa\"{\i}ss and Iutzeler, Franck and Malick, Jerome and Mertikopoulos, Panayotis},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {1982--2044},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/azizian25a/azizian25a.pdf},
  url       = {https://proceedings.mlr.press/v267/azizian25a.html},
  abstract  = {In this paper, we examine the time it takes for stochastic gradient descent (SGD) to reach the global minimum of a general, non-convex loss function. We approach this question through the lens of large deviations theory and randomly perturbed dynamical systems, and we provide a tight characterization of the associated hitting times of SGD with matching upper and lower bounds. Our analysis reveals that the global convergence time of SGD is dominated by the most "costly" set of obstacles that the algorithm may need to overcome in order to reach a global minimizer, coupling in this way the geometry of the underlying loss landscape with the statistics of the noise entering the process. Finally, motivated by applications to the training of deep neural networks, we provide a series of refinements and extensions of our analysis to, among others, loss functions with no spurious local minima or ones with bounded depths.}
}
Endnote
%0 Conference Paper
%T The Global Convergence Time of Stochastic Gradient Descent in Non-Convex Landscapes: Sharp Estimates via Large Deviations
%A Waïss Azizian
%A Franck Iutzeler
%A Jerome Malick
%A Panayotis Mertikopoulos
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-azizian25a
%I PMLR
%P 1982--2044
%U https://proceedings.mlr.press/v267/azizian25a.html
%V 267
%X In this paper, we examine the time it takes for stochastic gradient descent (SGD) to reach the global minimum of a general, non-convex loss function. We approach this question through the lens of large deviations theory and randomly perturbed dynamical systems, and we provide a tight characterization of the associated hitting times of SGD with matching upper and lower bounds. Our analysis reveals that the global convergence time of SGD is dominated by the most "costly" set of obstacles that the algorithm may need to overcome in order to reach a global minimizer, coupling in this way the geometry of the underlying loss landscape with the statistics of the noise entering the process. Finally, motivated by applications to the training of deep neural networks, we provide a series of refinements and extensions of our analysis to, among others, loss functions with no spurious local minima or ones with bounded depths.
APA
Azizian, W., Iutzeler, F., Malick, J. & Mertikopoulos, P. (2025). The Global Convergence Time of Stochastic Gradient Descent in Non-Convex Landscapes: Sharp Estimates via Large Deviations. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:1982-2044. Available from https://proceedings.mlr.press/v267/azizian25a.html.