Explicit Regularization of Stochastic Gradient Methods through Duality

Anant Raj, Francis Bach
Proceedings of The 24th International Conference on Artificial Intelligence and Statistics, PMLR 130:1882-1890, 2021.

Abstract

We consider stochastic gradient methods under the interpolation regime, where a perfect fit can be obtained (minimum loss at each observation). While previous work highlighted the implicit regularization of such algorithms, we consider an explicit regularization framework, formulated as a minimum Bregman divergence convex feasibility problem. Using convex duality, we propose randomized Dykstra-style algorithms based on randomized dual coordinate ascent. For non-accelerated coordinate descent, we obtain an algorithm that bears strong similarities to (non-averaged) stochastic mirror descent on specific functions: the two are equivalent for quadratic objectives, and equivalent in the early iterations for more general objectives. Our algorithm comes with the benefit of an explicit theorem guaranteeing convergence to a minimum-norm solution. For accelerated coordinate descent, we obtain a new algorithm with better convergence properties than existing stochastic gradient methods in the interpolation regime. This leads to accelerated versions of the perceptron for generic $\ell_p$-norm regularizers, which we illustrate in experiments.
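To make the framework concrete, here is a sketch of the formulation the abstract refers to, with notation introduced for illustration: $h$ is a strongly convex regularizer (e.g., $h(w) = \frac{1}{2}\|w\|_p^2$), $w_0$ is the initialization, and $f_i$ is the loss on observation $i$. The explicit regularization problem is then a minimum Bregman divergence projection onto the interpolation constraints:

$$\min_{w \in \mathbb{R}^d} D_h(w, w_0) \quad \text{such that} \quad f_i(w) \leq 0, \quad i = 1, \dots, n,$$

where $D_h(w, w') = h(w) - h(w') - \langle \nabla h(w'), w - w' \rangle$ is the Bregman divergence of $h$. Dualizing the $n$ constraints and applying randomized (optionally accelerated) coordinate ascent to the dual yields the Dykstra-style algorithms described above, with each step performing a Bregman projection toward the constraint set of a single randomly sampled observation.

For intuition only, consider the quadratic special case with $h = \frac{1}{2}\|\cdot\|_2^2$ and affine constraints $x_i^\top w = y_i$: there, an exact dual coordinate step reduces to a randomized Kaczmarz projection, and the iterates converge to the interpolating solution closest to $w_0$ (the minimum-norm solution when $w_0 = 0$). A minimal Python sketch of this special case (not the paper's general algorithm) is:

import numpy as np

def kaczmarz_min_norm(X, y, n_iters=10_000, w0=None, seed=0):
    """Randomized Kaczmarz sketch: exact dual coordinate ascent for
    the quadratic special case min ||w - w0||^2 s.t. Xw = y.
    Assumes a consistent system with nonzero rows of X."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d) if w0 is None else w0.astype(float).copy()
    for _ in range(n_iters):
        i = rng.integers(n)  # sample one observation uniformly
        xi = X[i]
        # Euclidean projection onto the hyperplane {w : x_i^T w = y_i}
        w += (y[i] - xi @ w) / (xi @ xi) * xi
    return w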

Cite this Paper


BibTeX
@InProceedings{pmlr-v130-raj21a,
  title     = {Explicit Regularization of Stochastic Gradient Methods through Duality},
  author    = {Raj, Anant and Bach, Francis},
  booktitle = {Proceedings of The 24th International Conference on Artificial Intelligence and Statistics},
  pages     = {1882--1890},
  year      = {2021},
  editor    = {Banerjee, Arindam and Fukumizu, Kenji},
  volume    = {130},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--15 Apr},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v130/raj21a/raj21a.pdf},
  url       = {https://proceedings.mlr.press/v130/raj21a.html}
}
Endnote
%0 Conference Paper
%T Explicit Regularization of Stochastic Gradient Methods through Duality
%A Anant Raj
%A Francis Bach
%B Proceedings of The 24th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2021
%E Arindam Banerjee
%E Kenji Fukumizu
%F pmlr-v130-raj21a
%I PMLR
%P 1882--1890
%U https://proceedings.mlr.press/v130/raj21a.html
%V 130
APA
Raj, A., & Bach, F. (2021). Explicit Regularization of Stochastic Gradient Methods through Duality. Proceedings of The 24th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 130:1882-1890. Available from https://proceedings.mlr.press/v130/raj21a.html.