The Implicit Regularization of Stochastic Gradient Flow for Least Squares

Alnur Ali, Edgar Dobriban, Ryan Tibshirani
Proceedings of the 37th International Conference on Machine Learning, PMLR 119:233-244, 2020.

Abstract

We study the implicit regularization of mini-batch stochastic gradient descent, when applied to the fundamental problem of least squares regression. We leverage a continuous-time stochastic differential equation having the same moments as stochastic gradient descent, which we call stochastic gradient flow. We give a bound on the excess risk of stochastic gradient flow at time $t$, over ridge regression with tuning parameter $\lambda = 1/t$. The bound may be computed from explicit constants (e.g., the mini-batch size, step size, number of iterations), revealing precisely how these quantities drive the excess risk. Numerical examples show the bound can be small, indicating a tight relationship between the two estimators. We give a similar result relating the coefficients of stochastic gradient flow and ridge. These results hold under no conditions on the data matrix $X$, and across the entire optimization path (not just at convergence).
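The core correspondence in the abstract — that the gradient-flow estimator at time $t$ tracks ridge regression with $\lambda = 1/t$ — can be illustrated numerically. The sketch below is not from the paper; it uses full-batch gradient descent with a small step size as a discretization of gradient flow on the least squares loss $(1/2n)\|y - X\beta\|_2^2$, on simulated Gaussian data, and compares the result to the ridge solution at $\lambda = 1/t$:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 20
X = rng.standard_normal((n, p))
beta_star = rng.standard_normal(p)
y = X @ beta_star + rng.standard_normal(n)

# Small-step gradient descent on (1/(2n))||y - X b||^2
# approximates gradient flow run to continuous time t.
step = 0.01
t = 5.0
b = np.zeros(p)
for _ in range(int(t / step)):
    grad = X.T @ (X @ b - y) / n
    b -= step * grad

# Ridge regression with tuning parameter lambda = 1/t.
lam = 1.0 / t
b_ridge = np.linalg.solve(X.T @ X / n + lam * np.eye(p), X.T @ y / n)

# Relative distance between the two estimators.
print(np.linalg.norm(b - b_ridge) / np.linalg.norm(b_ridge))
```

On well-conditioned data like this, the two coefficient vectors are close in relative norm, consistent with the paper's claim of a tight relationship along the optimization path; the paper's actual results concern the *stochastic* (mini-batch) flow and give explicit excess-risk bounds rather than this informal comparison.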

Cite this Paper


BibTeX
@InProceedings{pmlr-v119-ali20a,
  title     = {The Implicit Regularization of Stochastic Gradient Flow for Least Squares},
  author    = {Ali, Alnur and Dobriban, Edgar and Tibshirani, Ryan},
  booktitle = {Proceedings of the 37th International Conference on Machine Learning},
  pages     = {233--244},
  year      = {2020},
  editor    = {Hal Daumé III and Aarti Singh},
  volume    = {119},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--18 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v119/ali20a/ali20a.pdf},
  url       = {http://proceedings.mlr.press/v119/ali20a.html},
  abstract  = {We study the implicit regularization of mini-batch stochastic gradient descent, when applied to the fundamental problem of least squares regression. We leverage a continuous-time stochastic differential equation having the same moments as stochastic gradient descent, which we call stochastic gradient flow. We give a bound on the excess risk of stochastic gradient flow at time $t$, over ridge regression with tuning parameter $\lambda = 1/t$. The bound may be computed from explicit constants (e.g., the mini-batch size, step size, number of iterations), revealing precisely how these quantities drive the excess risk. Numerical examples show the bound can be small, indicating a tight relationship between the two estimators. We give a similar result relating the coefficients of stochastic gradient flow and ridge. These results hold under no conditions on the data matrix $X$, and across the entire optimization path (not just at convergence).}
}
Endnote
%0 Conference Paper
%T The Implicit Regularization of Stochastic Gradient Flow for Least Squares
%A Alnur Ali
%A Edgar Dobriban
%A Ryan Tibshirani
%B Proceedings of the 37th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2020
%E Hal Daumé III
%E Aarti Singh
%F pmlr-v119-ali20a
%I PMLR
%P 233--244
%U http://proceedings.mlr.press/v119/ali20a.html
%V 119
%X We study the implicit regularization of mini-batch stochastic gradient descent, when applied to the fundamental problem of least squares regression. We leverage a continuous-time stochastic differential equation having the same moments as stochastic gradient descent, which we call stochastic gradient flow. We give a bound on the excess risk of stochastic gradient flow at time $t$, over ridge regression with tuning parameter $\lambda = 1/t$. The bound may be computed from explicit constants (e.g., the mini-batch size, step size, number of iterations), revealing precisely how these quantities drive the excess risk. Numerical examples show the bound can be small, indicating a tight relationship between the two estimators. We give a similar result relating the coefficients of stochastic gradient flow and ridge. These results hold under no conditions on the data matrix $X$, and across the entire optimization path (not just at convergence).
APA
Ali, A., Dobriban, E., & Tibshirani, R. (2020). The Implicit Regularization of Stochastic Gradient Flow for Least Squares. Proceedings of the 37th International Conference on Machine Learning, in Proceedings of Machine Learning Research 119:233-244. Available from http://proceedings.mlr.press/v119/ali20a.html.

Related Material