The Implicit Regularization of Stochastic Gradient Flow for Least Squares

Alnur Ali, Edgar Dobriban, Ryan Tibshirani
Proceedings of the 37th International Conference on Machine Learning, PMLR 119:233-244, 2020.

Abstract

We study the implicit regularization of mini-batch stochastic gradient descent, when applied to the fundamental problem of least squares regression. We leverage a continuous-time stochastic differential equation having the same moments as stochastic gradient descent, which we call stochastic gradient flow. We give a bound on the excess risk of stochastic gradient flow at time t, over ridge regression with tuning parameter λ=1/t. The bound may be computed from explicit constants (e.g., the mini-batch size, step size, number of iterations), revealing precisely how these quantities drive the excess risk. Numerical examples show the bound can be small, indicating a tight relationship between the two estimators. We give a similar result relating the coefficients of stochastic gradient flow and ridge. These results hold under no conditions on the data matrix X, and across the entire optimization path (not just at convergence).
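The correspondence the abstract describes can be illustrated numerically. The sketch below (not from the paper's code) compares the noise-free gradient flow estimate at time t, started from zero, against ridge regression with tuning parameter λ = 1/t; the two coefficient paths track each other closely, which is the relationship the paper's bounds quantify.

```python
import numpy as np

# Minimal numerical sketch (an assumption-laden illustration, not the
# paper's implementation): gradient flow at time t vs. ridge at lambda = 1/t.
rng = np.random.default_rng(0)
n, p = 50, 10
X = rng.standard_normal((n, p))
beta_star = rng.standard_normal(p)
y = X @ beta_star + 0.5 * rng.standard_normal(n)

# Gradient flow on f(beta) = ||y - X beta||^2 / (2n), started at zero,
# has closed form beta_gf(t) = S^+ (I - exp(-t S)) b, where
# S = X^T X / n and b = X^T y / n. We apply the spectral shrinkage
# (1 - exp(-t s)) / s to each eigencomponent of b.
S = X.T @ X / n
b = X.T @ y / n
evals, V = np.linalg.eigh(S)
t = 5.0
shrink_gf = np.where(
    evals > 1e-12,
    (1 - np.exp(-t * evals)) / np.maximum(evals, 1e-12),
    t,  # limiting shrinkage for zero eigenvalues
)
beta_gf = V @ (shrink_gf * (V.T @ b))

# Ridge with the calibration lambda = 1/t from the paper
lam = 1.0 / t
beta_ridge = np.linalg.solve(S + lam * np.eye(p), b)

rel_diff = np.linalg.norm(beta_gf - beta_ridge) / np.linalg.norm(beta_ridge)
print(rel_diff)  # modest, reflecting the tight coupling of the two estimators
```

In the spectral picture, gradient flow shrinks each component by (1 - e^(-ts))/s while ridge shrinks by 1/(s + 1/t); both interpolate between t (for small s) and 1/s (for large s), which is why the calibration λ = 1/t makes the paths comparable.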

Cite this Paper


BibTeX
@InProceedings{pmlr-v119-ali20a,
  title     = {The Implicit Regularization of Stochastic Gradient Flow for Least Squares},
  author    = {Ali, Alnur and Dobriban, Edgar and Tibshirani, Ryan},
  booktitle = {Proceedings of the 37th International Conference on Machine Learning},
  pages     = {233--244},
  year      = {2020},
  editor    = {III, Hal Daumé and Singh, Aarti},
  volume    = {119},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--18 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v119/ali20a/ali20a.pdf},
  url       = {https://proceedings.mlr.press/v119/ali20a.html},
  abstract  = {We study the implicit regularization of mini-batch stochastic gradient descent, when applied to the fundamental problem of least squares regression. We leverage a continuous-time stochastic differential equation having the same moments as stochastic gradient descent, which we call stochastic gradient flow. We give a bound on the excess risk of stochastic gradient flow at time $t$, over ridge regression with tuning parameter $\lambda = 1/t$. The bound may be computed from explicit constants (e.g., the mini-batch size, step size, number of iterations), revealing precisely how these quantities drive the excess risk. Numerical examples show the bound can be small, indicating a tight relationship between the two estimators. We give a similar result relating the coefficients of stochastic gradient flow and ridge. These results hold under no conditions on the data matrix $X$, and across the entire optimization path (not just at convergence).}
}
Endnote
%0 Conference Paper
%T The Implicit Regularization of Stochastic Gradient Flow for Least Squares
%A Alnur Ali
%A Edgar Dobriban
%A Ryan Tibshirani
%B Proceedings of the 37th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2020
%E Hal Daumé III
%E Aarti Singh
%F pmlr-v119-ali20a
%I PMLR
%P 233--244
%U https://proceedings.mlr.press/v119/ali20a.html
%V 119
%X We study the implicit regularization of mini-batch stochastic gradient descent, when applied to the fundamental problem of least squares regression. We leverage a continuous-time stochastic differential equation having the same moments as stochastic gradient descent, which we call stochastic gradient flow. We give a bound on the excess risk of stochastic gradient flow at time $t$, over ridge regression with tuning parameter $\lambda = 1/t$. The bound may be computed from explicit constants (e.g., the mini-batch size, step size, number of iterations), revealing precisely how these quantities drive the excess risk. Numerical examples show the bound can be small, indicating a tight relationship between the two estimators. We give a similar result relating the coefficients of stochastic gradient flow and ridge. These results hold under no conditions on the data matrix $X$, and across the entire optimization path (not just at convergence).
APA
Ali, A., Dobriban, E. & Tibshirani, R. (2020). The Implicit Regularization of Stochastic Gradient Flow for Least Squares. Proceedings of the 37th International Conference on Machine Learning, in Proceedings of Machine Learning Research 119:233-244. Available from https://proceedings.mlr.press/v119/ali20a.html.
