Gradient descent follows the regularization path for general losses
Proceedings of the Thirty Third Conference on Learning Theory, PMLR 125:2109–2136, 2020.
Abstract
Recent work across many machine learning disciplines has highlighted that standard descent methods, even without explicit regularization, do not merely minimize the training error, but also exhibit an \emph{implicit bias}. This bias is typically towards a certain regularized solution, and relies upon the details of the learning process, for instance the use of the cross-entropy loss. In this work, we show that for empirical risk minimization over linear predictors with \emph{arbitrary} convex, strictly decreasing losses, if the risk does not attain its infimum, then the gradient-descent path and the \emph{algorithm-independent} regularization path converge to the same direction (whenever either converges to a direction). Using this result, we provide a justification for the widely-used exponentially-tailed losses (such as the exponential loss or the logistic loss): while this convergence to a direction for exponentially-tailed losses is necessarily to the maximum-margin direction, other losses such as polynomially-tailed losses may induce convergence to a direction with a poor margin.
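As a minimal numerical sketch (not from the paper) of the phenomenon for an exponentially-tailed loss: on linearly separable data where the empirical logistic risk has no minimizer, plain gradient descent drives the predictor's norm to infinity while its direction approaches the maximum-margin direction. The toy dataset below is chosen symmetric so the maximum-margin direction through the origin is exactly (1, 1)/√2; the step size and iteration count are illustrative assumptions.

```python
import numpy as np

# Linearly separable toy data: two positives and two negatives,
# symmetric about the line through (1, 1), so the maximum-margin
# direction (through the origin) is exactly (1, 1) / sqrt(2).
X = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

def logistic_risk_grad(w):
    """Gradient of the empirical logistic risk sum_i log(1 + exp(-y_i <w, x_i>))."""
    margins = y * (X @ w)
    coeffs = -y / (1.0 + np.exp(margins))  # d/dm log(1 + exp(-m)) = -1 / (1 + exp(m))
    return (coeffs[:, None] * X).sum(axis=0)

w = np.zeros(2)
lr = 0.1
for _ in range(2000):
    w -= lr * logistic_risk_grad(w)

# The norm keeps growing (the risk has no minimizer), but the
# direction aligns with the maximum-margin separator.
direction = w / np.linalg.norm(w)
max_margin_dir = np.array([1.0, 1.0]) / np.sqrt(2.0)
print(direction)  # close to [0.7071, 0.7071]
```

In general this directional convergence is only logarithmically fast; the symmetric dataset above just makes the limiting direction easy to verify by eye.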