Power-Law Escape Rate of SGD

Takashi Mori; Liu Ziyin; Kangqiao Liu; Masahito Ueda

Proceedings of the 39th International Conference on Machine Learning, PMLR 162:15959-15975, 2022.

Abstract

Stochastic gradient descent (SGD) undergoes complicated multiplicative noise for the mean-square loss. We use this property of SGD noise to derive a stochastic differential equation (SDE) with simpler additive noise by performing a random time change. Using this formalism, we show that the log loss barrier $\Delta\log L=\log[L(\theta^s)/L(\theta^*)]$ between a local minimum $\theta^*$ and a saddle $\theta^s$ determines the escape rate of SGD from the local minimum, contrary to the previous results borrowing from physics that the linear loss barrier $\Delta L=L(\theta^s)-L(\theta^*)$ decides the escape rate. Our escape-rate formula strongly depends on the typical magnitude $h^*$ and the number $n$ of the outlier eigenvalues of the Hessian. This result explains an empirical fact that SGD prefers flat minima with low effective dimensions, giving an insight into implicit biases of SGD.
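To make the two barrier definitions in the abstract concrete, the following minimal sketch evaluates $\Delta L$ and $\Delta\log L$ for two minima. The loss values and the function names are hypothetical and chosen only for illustration; they are not taken from the paper, and the sketch does not reproduce the paper's escape-rate formula.

```python
import math


def linear_barrier(loss_saddle: float, loss_min: float) -> float:
    """Linear loss barrier: Delta L = L(theta^s) - L(theta^*)."""
    return loss_saddle - loss_min


def log_barrier(loss_saddle: float, loss_min: float) -> float:
    """Log loss barrier: Delta log L = log[L(theta^s) / L(theta^*)]."""
    return math.log(loss_saddle / loss_min)


# Hypothetical (L(theta^*), L(theta^s)) pairs, assumed for illustration only.
minima = {
    "A": (0.010, 0.110),  # low-loss minimum
    "B": (0.100, 0.200),  # higher-loss minimum
}

for name, (loss_min, loss_saddle) in minima.items():
    d_lin = linear_barrier(loss_saddle, loss_min)
    d_log = log_barrier(loss_saddle, loss_min)
    print(f"minimum {name}: Delta L = {d_lin:.3f}, Delta log L = {d_log:.3f}")
```

In this toy setting both minima have the same linear barrier, yet their log barriers differ, so a criterion based on $\Delta\log L$ can separate minima that a criterion based on $\Delta L$ treats as equivalent.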

Cite this Paper

BibTeX


@InProceedings{pmlr-v162-mori22a,
  title = 	 {Power-Law Escape Rate of {SGD}},
  author =       {Mori, Takashi and Ziyin, Liu and Liu, Kangqiao and Ueda, Masahito},
  booktitle = 	 {Proceedings of the 39th International Conference on Machine Learning},
  pages = 	 {15959--15975},
  year = 	 {2022},
  editor = 	 {Chaudhuri, Kamalika and Jegelka, Stefanie and Song, Le and Szepesvari, Csaba and Niu, Gang and Sabato, Sivan},
  volume = 	 {162},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {17--23 Jul},
  publisher =    {PMLR},
  pdf = 	 {https://proceedings.mlr.press/v162/mori22a/mori22a.pdf},
  url = 	 {https://proceedings.mlr.press/v162/mori22a.html},
  abstract = 	 {Stochastic gradient descent (SGD) undergoes complicated multiplicative noise for the mean-square loss. We use this property of SGD noise to derive a stochastic differential equation (SDE) with simpler additive noise by performing a random time change. Using this formalism, we show that the log loss barrier $\Delta\log L=\log[L(\theta^s)/L(\theta^*)]$ between a local minimum $\theta^*$ and a saddle $\theta^s$ determines the escape rate of SGD from the local minimum, contrary to the previous results borrowing from physics that the linear loss barrier $\Delta L=L(\theta^s)-L(\theta^*)$ decides the escape rate. Our escape-rate formula strongly depends on the typical magnitude $h^*$ and the number $n$ of the outlier eigenvalues of the Hessian. This result explains an empirical fact that SGD prefers flat minima with low effective dimensions, giving an insight into implicit biases of SGD.}
}

Endnote

%0 Conference Paper
%T Power-Law Escape Rate of SGD
%A Takashi Mori
%A Liu Ziyin
%A Kangqiao Liu
%A Masahito Ueda
%B Proceedings of the 39th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2022
%E Kamalika Chaudhuri
%E Stefanie Jegelka
%E Le Song
%E Csaba Szepesvari
%E Gang Niu
%E Sivan Sabato
%F pmlr-v162-mori22a
%I PMLR
%P 15959--15975
%U https://proceedings.mlr.press/v162/mori22a.html
%V 162
%X Stochastic gradient descent (SGD) undergoes complicated multiplicative noise for the mean-square loss. We use this property of SGD noise to derive a stochastic differential equation (SDE) with simpler additive noise by performing a random time change. Using this formalism, we show that the log loss barrier $\Delta\log L=\log[L(\theta^s)/L(\theta^*)]$ between a local minimum $\theta^*$ and a saddle $\theta^s$ determines the escape rate of SGD from the local minimum, contrary to the previous results borrowing from physics that the linear loss barrier $\Delta L=L(\theta^s)-L(\theta^*)$ decides the escape rate. Our escape-rate formula strongly depends on the typical magnitude $h^*$ and the number $n$ of the outlier eigenvalues of the Hessian. This result explains an empirical fact that SGD prefers flat minima with low effective dimensions, giving an insight into implicit biases of SGD.

APA


Mori, T., Ziyin, L., Liu, K. & Ueda, M. (2022). Power-Law Escape Rate of SGD. Proceedings of the 39th International Conference on Machine Learning, in Proceedings of Machine Learning Research 162:15959-15975. Available from https://proceedings.mlr.press/v162/mori22a.html.
