Obtaining Adjustable Regularization for Free via Iterate Averaging

Jingfeng Wu, Vladimir Braverman, Lin Yang
Proceedings of the 37th International Conference on Machine Learning, PMLR 119:10344-10354, 2020.

Abstract

Regularization for optimization is a crucial technique to avoid overfitting in machine learning. In order to obtain the best performance, we usually train a model by tuning the regularization parameters. This becomes costly, however, when a single round of training takes a significant amount of time. Very recently, Neu and Rosasco showed that if we run stochastic gradient descent (SGD) on linear regression problems, then by averaging the SGD iterates properly we obtain a regularized solution. They left open whether the same phenomenon can be achieved for other optimization problems and algorithms. In this paper, we establish an averaging scheme that provably converts the iterates of SGD on an arbitrary strongly convex and smooth objective function to its regularized counterpart with an adjustable regularization parameter. Our approaches can be used for accelerated and preconditioned optimization methods as well. We further show that the same methods work empirically on more general optimization objectives, including neural networks. In sum, we obtain adjustable regularization for free for a large class of optimization problems and resolve an open question raised by Neu and Rosasco.
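
The following is a minimal NumPy sketch of the phenomenon in the linear-regression case studied by Neu and Rosasco, using full-gradient descent started from the origin: averaging the unregularized iterates with geometric weights q^t recovers, in this quadratic setting and in the limit, the ridge solution with regularization parameter lambda = (1 - q) / (q * eta), so the regularization strength can be adjusted after training by re-weighting the iterates. The variable names and constants below are illustrative only; the paper's general scheme for strongly convex and smooth objectives, SGD, and accelerated or preconditioned methods differs in its details.

import numpy as np

# Synthetic least-squares problem: f(w) = (1/2n) * ||X w - y||^2, with no explicit regularizer.
rng = np.random.default_rng(0)
n, d = 200, 10
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d) + 0.5 * rng.normal(size=n)
H = X.T @ X / n                    # Hessian of the quadratic objective
b = X.T @ y / n

eta = 0.05                         # step size (eta * ||H|| < 1, so gradient descent converges)
lam = 1.0                          # desired regularization strength (illustrative value)
q = 1.0 / (1.0 + lam * eta)        # geometric averaging factor implied by lam and eta

w = np.zeros(d)                    # unregularized GD iterate, started at the origin
avg = np.zeros(d)
weight_sum = 0.0
for t in range(5000):
    avg += q**t * w                # weight iterate t by q**t
    weight_sum += q**t
    w -= eta * (H @ w - b)         # plain gradient step on the unregularized objective

w_avg = avg / weight_sum                               # geometrically averaged iterate
w_ridge = np.linalg.solve(H + lam * np.eye(d), b)      # closed-form ridge solution
print(np.linalg.norm(w_avg - w_ridge))                 # near zero: averaging recovers ridge

Maintaining several running averages with different values of q (or storing the iterates) yields solutions for several regularization strengths from a single unregularized training run, which is the sense in which the regularization comes "for free."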

Cite this Paper


BibTeX
@InProceedings{pmlr-v119-wu20a,
  title = {Obtaining Adjustable Regularization for Free via Iterate Averaging},
  author = {Wu, Jingfeng and Braverman, Vladimir and Yang, Lin},
  booktitle = {Proceedings of the 37th International Conference on Machine Learning},
  pages = {10344--10354},
  year = {2020},
  editor = {III, Hal Daumé and Singh, Aarti},
  volume = {119},
  series = {Proceedings of Machine Learning Research},
  month = {13--18 Jul},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v119/wu20a/wu20a.pdf},
  url = {https://proceedings.mlr.press/v119/wu20a.html},
  abstract = {Regularization for optimization is a crucial technique to avoid overfitting in machine learning. In order to obtain the best performance, we usually train a model by tuning the regularization parameters. It becomes costly, however, when a single round of training takes significant amount of time. Very recently, Neu and Rosasco show that if we run stochastic gradient descent (SGD) on linear regression problems, then by averaging the SGD iterates properly, we obtain a regularized solution. It left open whether the same phenomenon can be achieved for other optimization problems and algorithms. In this paper, we establish an averaging scheme that provably converts the iterates of SGD on an arbitrary strongly convex and smooth objective function to its regularized counterpart with an adjustable regularization parameter. Our approaches can be used for accelerated and preconditioned optimization methods as well. We further show that the same methods work empirically on more general optimization objectives including neural networks. In sum, we obtain adjustable regularization for free for a large class of optimization problems and resolve an open question raised by Neu and Rosasco.}
}
Endnote
%0 Conference Paper
%T Obtaining Adjustable Regularization for Free via Iterate Averaging
%A Jingfeng Wu
%A Vladimir Braverman
%A Lin Yang
%B Proceedings of the 37th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2020
%E Hal Daumé III
%E Aarti Singh
%F pmlr-v119-wu20a
%I PMLR
%P 10344--10354
%U https://proceedings.mlr.press/v119/wu20a.html
%V 119
%X Regularization for optimization is a crucial technique to avoid overfitting in machine learning. In order to obtain the best performance, we usually train a model by tuning the regularization parameters. It becomes costly, however, when a single round of training takes significant amount of time. Very recently, Neu and Rosasco show that if we run stochastic gradient descent (SGD) on linear regression problems, then by averaging the SGD iterates properly, we obtain a regularized solution. It left open whether the same phenomenon can be achieved for other optimization problems and algorithms. In this paper, we establish an averaging scheme that provably converts the iterates of SGD on an arbitrary strongly convex and smooth objective function to its regularized counterpart with an adjustable regularization parameter. Our approaches can be used for accelerated and preconditioned optimization methods as well. We further show that the same methods work empirically on more general optimization objectives including neural networks. In sum, we obtain adjustable regularization for free for a large class of optimization problems and resolve an open question raised by Neu and Rosasco.
APA
Wu, J., Braverman, V. & Yang, L. (2020). Obtaining Adjustable Regularization for Free via Iterate Averaging. Proceedings of the 37th International Conference on Machine Learning, in Proceedings of Machine Learning Research 119:10344-10354. Available from https://proceedings.mlr.press/v119/wu20a.html.
