Gradient Descent Converges Arbitrarily Fast for Logistic Regression via Large and Adaptive Stepsizes
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:76361-76384, 2025.
Abstract
We analyze the convergence of gradient descent (GD) with large, adaptive stepsizes for logistic regression on linearly separable data. The stepsize adapts to the current risk, scaled by a fixed base stepsize $\eta$. We prove that once the number of iterations $t$ surpasses a margin-dependent threshold, the averaged GD iterate achieves a risk upper bound of $\exp(-\Theta(\eta t))$, where $\eta$ can be chosen arbitrarily large. This implies that GD attains arbitrarily fast convergence rates via large stepsizes, although the risk evolution might not be monotonic. In contrast, prior analyses of GD with adaptive stepsizes require a monotonic risk decrease, limiting their rates to $\exp(-\Theta(t))$. We further establish a margin-dependent lower bound on the iteration complexity for any first-order method to attain a small risk, justifying the necessity of the burn-in phase in our analysis. Our results generalize to a broad class of loss functions and two-layer networks under additional assumptions.
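The sketch below illustrates the kind of procedure the abstract describes: GD on the empirical logistic risk where each step is scaled by the current risk through a fixed base stepsize $\eta$, with the averaged iterate returned at the end. The abstract does not state the exact stepsize rule, so the form `eta_t = eta / R(w_t)` used here is an assumption for illustration, not the authors' method.

```python
# Minimal sketch of risk-adaptive gradient descent for logistic regression on
# linearly separable data. Assumption (not from the abstract): the adaptive
# stepsize is eta_t = eta / R(w_t), i.e. the base stepsize divided by the
# current empirical risk.
import numpy as np
from scipy.special import expit  # numerically stable sigmoid


def logistic_risk(w, X, y):
    """Empirical logistic risk R(w) = mean_i log(1 + exp(-y_i <x_i, w>))."""
    margins = y * (X @ w)
    return np.mean(np.logaddexp(0.0, -margins))


def risk_gradient(w, X, y):
    """Gradient of the empirical logistic risk."""
    margins = y * (X @ w)
    coeffs = -expit(-margins)  # d/dm log(1 + exp(-m)) = -sigmoid(-m)
    return (X * (coeffs * y)[:, None]).mean(axis=0)


def adaptive_gd(X, y, eta=5.0, T=500):
    """GD with (assumed) stepsize eta_t = eta / R(w_t); returns the averaged iterate."""
    n, d = X.shape
    w = np.zeros(d)
    iterates = []
    for t in range(T):
        risk = logistic_risk(w, X, y)
        step = eta / max(risk, 1e-12)  # adaptive stepsize (hypothetical form)
        w = w - step * risk_gradient(w, X, y)
        iterates.append(w.copy())
    w_avg = np.mean(iterates, axis=0)  # averaged GD iterate analyzed in the paper
    return w_avg, logistic_risk(w_avg, X, y)


# Toy linearly separable data: label is the sign of the first coordinate.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = np.sign(X[:, 0] + 1e-3)
w_avg, final_risk = adaptive_gd(X, y)
print(f"risk of averaged iterate: {final_risk:.3e}")
```

Note that with a large base stepsize the per-iteration risk need not decrease monotonically; the theoretical guarantee in the abstract concerns the averaged iterate after a margin-dependent burn-in phase.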