Gradient descent follows the regularization path for general losses

Ziwei Ji, Miroslav Dudík, Robert E. Schapire, Matus Telgarsky
Proceedings of Thirty Third Conference on Learning Theory, PMLR 125:2109-2136, 2020.

Abstract

Recent work across many machine learning disciplines has highlighted that standard descent methods, even without explicit regularization, do not merely minimize the training error, but also exhibit an \emph{implicit bias}. This bias is typically towards a certain regularized solution, and relies upon the details of the learning process, for instance the use of the cross-entropy loss. In this work, we show that for empirical risk minimization over linear predictors with \emph{arbitrary} convex, strictly decreasing losses, if the risk does not attain its infimum, then the gradient-descent path and the \emph{algorithm-independent} regularization path converge to the same direction (whenever either converges to a direction). Using this result, we provide a justification for the widely-used exponentially-tailed losses (such as the exponential loss or the logistic loss): while this convergence to a direction for exponentially-tailed losses is necessarily to the maximum-margin direction, other losses such as polynomially-tailed losses may induce convergence to a direction with a poor margin.
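The abstract's claim for exponentially-tailed losses can be checked numerically on a toy problem. The sketch below (not code from the paper; the dataset and step size are illustrative choices) runs plain gradient descent on the logistic loss over a linearly separable two-point dataset, where the risk has infimum 0 but no minimizer. For this dataset the hard-margin direction is (1, 0): it attains margin 1 on both points, and both are support vectors, so any rotation lowers one of the margins. The iterate norm grows without bound, yet its direction stabilizes toward (1, 0):

```python
# Hedged illustration of implicit bias: gradient descent on the logistic
# loss over separable data, with the normalized iterate compared against
# the max-margin (hard-SVM) direction. Dataset and hyperparameters are
# illustrative assumptions, not from the paper.
import numpy as np

# Two points, both labeled +1. The max-margin unit direction is (1, 0),
# which gives margin 1 on both points.
X = np.array([[1.0, 2.0],
              [1.0, -1.0]])
y = np.array([1.0, 1.0])

def grad_risk(w):
    """Gradient of the empirical logistic risk (1/n) sum_i log(1 + exp(-y_i x_i . w))."""
    margins = y * (X @ w)
    coeff = -y / (1.0 + np.exp(margins))  # derivative of log(1 + e^{-m}) w.r.t. m
    return (coeff[:, None] * X).mean(axis=0)

w = np.zeros(2)
eta = 0.1  # constant step size; the bounded logistic gradient keeps this stable
for _ in range(100000):
    w -= eta * grad_risk(w)

direction = w / np.linalg.norm(w)
max_margin_dir = np.array([1.0, 0.0])
print("iterate norm:", np.linalg.norm(w))
print("iterate direction:", direction)
print("alignment with max-margin direction:", direction @ max_margin_dir)
```

The printed norm keeps growing with more iterations (the infimum of the risk is not attained), while the alignment with the max-margin direction approaches 1 — slowly, at a logarithmic rate, which is why many iterations are used. Swapping in a polynomially-tailed loss here would, per the abstract, generally not recover the max-margin direction.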

Cite this Paper


BibTeX
@InProceedings{pmlr-v125-ji20a,
  title     = {Gradient descent follows the regularization path for general losses},
  author    = {Ji, Ziwei and Dud{\'i}k, Miroslav and Schapire, Robert E. and Telgarsky, Matus},
  booktitle = {Proceedings of Thirty Third Conference on Learning Theory},
  pages     = {2109--2136},
  year      = {2020},
  editor    = {Jacob Abernethy and Shivani Agarwal},
  volume    = {125},
  series    = {Proceedings of Machine Learning Research},
  month     = {09--12 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v125/ji20a/ji20a.pdf},
  url       = {http://proceedings.mlr.press/v125/ji20a.html},
  abstract  = {Recent work across many machine learning disciplines has highlighted that standard descent methods, even without explicit regularization, do not merely minimize the training error, but also exhibit an \emph{implicit bias}. This bias is typically towards a certain regularized solution, and relies upon the details of the learning process, for instance the use of the cross-entropy loss. In this work, we show that for empirical risk minimization over linear predictors with \emph{arbitrary} convex, strictly decreasing losses, if the risk does not attain its infimum, then the gradient-descent path and the \emph{algorithm-independent} regularization path converge to the same direction (whenever either converges to a direction). Using this result, we provide a justification for the widely-used exponentially-tailed losses (such as the exponential loss or the logistic loss): while this convergence to a direction for exponentially-tailed losses is necessarily to the maximum-margin direction, other losses such as polynomially-tailed losses may induce convergence to a direction with a poor margin.}
}
Endnote
%0 Conference Paper
%T Gradient descent follows the regularization path for general losses
%A Ziwei Ji
%A Miroslav Dudík
%A Robert E. Schapire
%A Matus Telgarsky
%B Proceedings of Thirty Third Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2020
%E Jacob Abernethy
%E Shivani Agarwal
%F pmlr-v125-ji20a
%I PMLR
%J Proceedings of Machine Learning Research
%P 2109--2136
%U http://proceedings.mlr.press
%V 125
%W PMLR
%X Recent work across many machine learning disciplines has highlighted that standard descent methods, even without explicit regularization, do not merely minimize the training error, but also exhibit an \emph{implicit bias}. This bias is typically towards a certain regularized solution, and relies upon the details of the learning process, for instance the use of the cross-entropy loss. In this work, we show that for empirical risk minimization over linear predictors with \emph{arbitrary} convex, strictly decreasing losses, if the risk does not attain its infimum, then the gradient-descent path and the \emph{algorithm-independent} regularization path converge to the same direction (whenever either converges to a direction). Using this result, we provide a justification for the widely-used exponentially-tailed losses (such as the exponential loss or the logistic loss): while this convergence to a direction for exponentially-tailed losses is necessarily to the maximum-margin direction, other losses such as polynomially-tailed losses may induce convergence to a direction with a poor margin.
APA
Ji, Z., Dudík, M., Schapire, R.E. & Telgarsky, M. (2020). Gradient descent follows the regularization path for general losses. Proceedings of Thirty Third Conference on Learning Theory, in PMLR 125:2109-2136.