SGD in the Large: Average-case Analysis, Asymptotics, and Stepsize Criticality

Courtney Paquette, Kiwon Lee, Fabian Pedregosa, Elliot Paquette
Proceedings of Thirty Fourth Conference on Learning Theory, PMLR 134:3548-3626, 2021.

Abstract

We propose a new framework, inspired by random matrix theory, for analyzing the dynamics of stochastic gradient descent (SGD) when both the number of samples and the dimension are large. The framework applies to any fixed stepsize and to the least squares problem with random data (finite-sum). Using this framework, we show that the dynamics of SGD become deterministic in the large sample and dimensional limit, and that the limiting dynamics are governed by a Volterra integral equation. This model predicts that SGD undergoes a phase transition at an explicitly given critical stepsize that ultimately affects its convergence rate, a prediction we also verify experimentally. Finally, when the input data is isotropic, we provide explicit expressions for the dynamics and for the average-case convergence rates (i.e., the complexity of an algorithm averaged over all possible inputs). These rates show significant improvement over classical worst-case complexities.
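
As a quick illustration of the concentration phenomenon described in the abstract, the sketch below (plain Python/NumPy, not the authors' code) runs single-sample, fixed-stepsize SGD on synthetic random least-squares problems of increasing size. The Gaussian data model, the aspect ratio n/d = 2, the stepsize 0.3, and the epoch budget are illustrative assumptions rather than the paper's experimental setup; the point is only that, at a fixed stepsize and epoch count, the spread of the risk across independent runs should shrink as the dimension grows, consistent with the deterministic limiting dynamics the paper establishes.

import numpy as np

def sgd_risk_curve(n, d, stepsize, epochs, seed):
    """Fixed-stepsize SGD on f(x) = 1/(2n) * ||A x - b||^2; records the risk once per epoch."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((n, d)) / np.sqrt(d)   # random data, rows of roughly unit norm
    x_star = rng.standard_normal(d)                # planted signal (illustrative choice)
    b = A @ x_star                                 # noiseless targets, for simplicity
    x = np.zeros(d)
    risks = []
    for t in range(epochs * n):
        if t % n == 0:
            risks.append(0.5 * np.mean((A @ x - b) ** 2))
        i = rng.integers(n)                        # sample one row uniformly at random
        x -= stepsize * (A[i] @ x - b[i]) * A[i]   # stochastic gradient step
    risks.append(0.5 * np.mean((A @ x - b) ** 2))
    return np.array(risks)

for d in (100, 400, 1600):
    n = 2 * d                                      # fixed aspect ratio n/d = 2
    finals = [sgd_risk_curve(n, d, stepsize=0.3, epochs=5, seed=s)[-1] for s in range(5)]
    # As d grows, independent runs should cluster more tightly around a single risk value.
    print(f"d={d:5d}  risk after 5 epochs: mean={np.mean(finals):.4e}  std={np.std(finals):.1e}")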

Cite this Paper


BibTeX
@InProceedings{pmlr-v134-paquette21a,
  title     = {SGD in the Large: Average-case Analysis, Asymptotics, and Stepsize Criticality},
  author    = {Paquette, Courtney and Lee, Kiwon and Pedregosa, Fabian and Paquette, Elliot},
  booktitle = {Proceedings of Thirty Fourth Conference on Learning Theory},
  pages     = {3548--3626},
  year      = {2021},
  editor    = {Belkin, Mikhail and Kpotufe, Samory},
  volume    = {134},
  series    = {Proceedings of Machine Learning Research},
  month     = {15--19 Aug},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v134/paquette21a/paquette21a.pdf},
  url       = {https://proceedings.mlr.press/v134/paquette21a.html},
  abstract  = {We propose a new framework, inspired by random matrix theory, for analyzing the dynamics of stochastic gradient descent (SGD) when both number of samples and dimensions are large. This framework applies to any fixed stepsize and on the least squares problem with random data (finite-sum). Using this new framework, we show that the dynamics of SGD become deterministic in the large sample and dimensional limit. Furthermore, the limiting dynamics are governed by a Volterra integral equation. This model predicts that SGD undergoes a phase transition at an explicitly given critical stepsize that ultimately affects its convergence rate, which is also verified experimentally. Finally, when input data is isotropic, we provide explicit expressions for the dynamics and average-case convergence rates (i.e., the complexity of an algorithm averaged over all possible inputs). These rates show significant improvement over classical worst-case complexities.}
}
Endnote
%0 Conference Paper
%T SGD in the Large: Average-case Analysis, Asymptotics, and Stepsize Criticality
%A Courtney Paquette
%A Kiwon Lee
%A Fabian Pedregosa
%A Elliot Paquette
%B Proceedings of Thirty Fourth Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2021
%E Mikhail Belkin
%E Samory Kpotufe
%F pmlr-v134-paquette21a
%I PMLR
%P 3548--3626
%U https://proceedings.mlr.press/v134/paquette21a.html
%V 134
%X We propose a new framework, inspired by random matrix theory, for analyzing the dynamics of stochastic gradient descent (SGD) when both number of samples and dimensions are large. This framework applies to any fixed stepsize and on the least squares problem with random data (finite-sum). Using this new framework, we show that the dynamics of SGD become deterministic in the large sample and dimensional limit. Furthermore, the limiting dynamics are governed by a Volterra integral equation. This model predicts that SGD undergoes a phase transition at an explicitly given critical stepsize that ultimately affects its convergence rate, which is also verified experimentally. Finally, when input data is isotropic, we provide explicit expressions for the dynamics and average-case convergence rates (i.e., the complexity of an algorithm averaged over all possible inputs). These rates show significant improvement over classical worst-case complexities.
APA
Paquette, C., Lee, K., Pedregosa, F. & Paquette, E. (2021). SGD in the Large: Average-case Analysis, Asymptotics, and Stepsize Criticality. Proceedings of Thirty Fourth Conference on Learning Theory, in Proceedings of Machine Learning Research 134:3548-3626. Available from https://proceedings.mlr.press/v134/paquette21a.html.
