The Implicit Regularization of Stochastic Gradient Flow for Least Squares

Alnur Ali; Edgar Dobriban; Ryan Tibshirani

The Implicit Regularization of Stochastic Gradient Flow for Least Squares

Alnur Ali, Edgar Dobriban, Ryan Tibshirani

Proceedings of the 37th International Conference on Machine Learning, PMLR 119:233-244, 2020.

Abstract

We study the implicit regularization of mini-batch stochastic gradient descent, when applied to the fundamental problem of least squares regression. We leverage a continuous-time stochastic differential equation having the same moments as stochastic gradient descent, which we call stochastic gradient flow. We give a bound on the excess risk of stochastic gradient flow at time

$t$ , over ridge regression with tuning parameter

$\lambda = 1/t$ . The bound may be computed from explicit constants (e.g., the mini-batch size, step size, number of iterations), revealing precisely how these quantities drive the excess risk. Numerical examples show the bound can be small, indicating a tight relationship between the two estimators. We give a similar result relating the coefficients of stochastic gradient flow and ridge. These results hold under no conditions on the data matrix

$X$ , and across the entire optimization path (not just at convergence).

Cite this Paper

BibTeX


@InProceedings{pmlr-v119-ali20a,
  title = 	 {The Implicit Regularization of Stochastic Gradient Flow for Least Squares},
  author =       {Ali, Alnur and Dobriban, Edgar and Tibshirani, Ryan},
  booktitle = 	 {Proceedings of the 37th International Conference on Machine Learning},
  pages = 	 {233--244},
  year = 	 {2020},
  editor = 	 {III, Hal Daumé and Singh, Aarti},
  volume = 	 {119},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {13--18 Jul},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v119/ali20a/ali20a.pdf},
  url = 	 {https://proceedings.mlr.press/v119/ali20a.html},
  abstract = 	 {We study the implicit regularization of mini-batch stochastic gradient descent, when applied to the fundamental problem of least squares regression. We leverage a continuous-time stochastic differential equation having the same moments as stochastic gradient descent, which we call stochastic gradient flow. We give a bound on the excess risk of stochastic gradient flow at time $t$, over ridge regression with tuning parameter $\lambda = 1/t$. The bound may be computed from explicit constants (e.g., the mini-batch size, step size, number of iterations), revealing precisely how these quantities drive the excess risk. Numerical examples show the bound can be small, indicating a tight relationship between the two estimators. We give a similar result relating the coefficients of stochastic gradient flow and ridge. These results hold under no conditions on the data matrix $X$, and across the entire optimization path (not just at convergence).}
}

Endnote

%0 Conference Paper
%T The Implicit Regularization of Stochastic Gradient Flow for Least Squares
%A Alnur Ali
%A Edgar Dobriban
%A Ryan Tibshirani
%B Proceedings of the 37th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2020
%E Hal Daumé III
%E Aarti Singh	
%F pmlr-v119-ali20a
%I PMLR
%P 233--244
%U https://proceedings.mlr.press/v119/ali20a.html
%V 119
%X We study the implicit regularization of mini-batch stochastic gradient descent, when applied to the fundamental problem of least squares regression. We leverage a continuous-time stochastic differential equation having the same moments as stochastic gradient descent, which we call stochastic gradient flow. We give a bound on the excess risk of stochastic gradient flow at time $t$, over ridge regression with tuning parameter $\lambda = 1/t$. The bound may be computed from explicit constants (e.g., the mini-batch size, step size, number of iterations), revealing precisely how these quantities drive the excess risk. Numerical examples show the bound can be small, indicating a tight relationship between the two estimators. We give a similar result relating the coefficients of stochastic gradient flow and ridge. These results hold under no conditions on the data matrix $X$, and across the entire optimization path (not just at convergence).

APA


Ali, A., Dobriban, E. & Tibshirani, R.. (2020). The Implicit Regularization of Stochastic Gradient Flow for Least Squares. Proceedings of the 37th International Conference on Machine Learning, in Proceedings of Machine Learning Research 119:233-244 Available from https://proceedings.mlr.press/v119/ali20a.html.

The Implicit Regularization of Stochastic Gradient Flow for Least Squares

Abstract

Cite this Paper

Related Material