Gradient Descent Converges Arbitrarily Fast for Logistic Regression via Large and Adaptive Stepsizes

Ruiqi Zhang, Jingfeng Wu, Peter Bartlett
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:76361-76384, 2025.

Abstract

We analyze the convergence of gradient descent (GD) with large, adaptive stepsizes for logistic regression on linearly separable data. The stepsize adapts to the current risk, scaled by a fixed base stepsize $\eta$. We prove that once the number of iterates $t$ surpasses a margin-dependent threshold, the averaged GD iterate achieves a risk upper bound of $\exp(-\Theta(\eta t))$, where $\eta$ can be chosen arbitrarily large. This implies that GD attains arbitrarily fast convergence rates via large stepsizes, although the risk evolution might not be monotonic. In contrast, prior analyses of GD with adaptive stepsizes require a monotonic risk decrease, limiting their rates to $\exp(-\Theta(t))$. We further establish a margin-dependent lower bound on the iteration complexity for any first-order method to attain a small risk, justifying the necessity of the burn-in phase in our analysis. Our results generalize to a broad class of loss functions and two-layer networks under additional assumptions.
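A minimal sketch of the kind of procedure the abstract describes: gradient descent on the empirical logistic risk where the stepsize is a fixed base stepsize $\eta$ rescaled by the current risk, with the averaged iterate returned at the end. The specific adaptive rule used here (stepsize = eta / risk), the function names, and the toy data are illustrative assumptions, not the authors' exact algorithm or code.

import numpy as np

def sigmoid(z):
    """Numerically stable logistic function 1 / (1 + exp(-z))."""
    e = np.exp(-np.abs(z))
    return np.where(z >= 0, 1.0 / (1.0 + e), e / (1.0 + e))

def logistic_risk(w, X, y):
    """Empirical logistic risk: mean of log(1 + exp(-y_i <w, x_i>))."""
    margins = y * (X @ w)
    return np.mean(np.logaddexp(0.0, -margins))

def risk_gradient(w, X, y):
    """Gradient of the empirical logistic risk with respect to w."""
    margins = y * (X @ w)
    coeffs = -y * sigmoid(-margins)      # per-sample derivative of the loss in the margin
    return X.T @ coeffs / X.shape[0]

def adaptive_gd(X, y, eta=10.0, num_steps=500):
    """GD with a risk-adaptive stepsize; returns the averaged iterate."""
    w = np.zeros(X.shape[1])
    iterates = []
    for _ in range(num_steps):
        risk = logistic_risk(w, X, y)
        stepsize = eta / risk            # assumed rule: base stepsize scaled by the current risk
        w = w - stepsize * risk_gradient(w, X, y)
        iterates.append(w.copy())
    return np.mean(iterates, axis=0)     # the averaged GD iterate

if __name__ == "__main__":
    # Toy linearly separable data (illustrative only).
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 5))
    w_star = rng.normal(size=5)
    y = np.sign(X @ w_star)
    w_bar = adaptive_gd(X, y, eta=10.0, num_steps=500)
    print("risk of averaged iterate:", logistic_risk(w_bar, X, y))

Because the stepsize grows as the risk shrinks, individual iterates can overshoot and the risk need not decrease monotonically, which is consistent with the behavior the abstract highlights.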

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-zhang25cf,
  title = {Gradient Descent Converges Arbitrarily Fast for Logistic Regression via Large and Adaptive Stepsizes},
  author = {Zhang, Ruiqi and Wu, Jingfeng and Bartlett, Peter},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages = {76361--76384},
  year = {2025},
  editor = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume = {267},
  series = {Proceedings of Machine Learning Research},
  month = {13--19 Jul},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/zhang25cf/zhang25cf.pdf},
  url = {https://proceedings.mlr.press/v267/zhang25cf.html},
  abstract = {We analyze the convergence of gradient descent (GD) with large, adaptive stepsizes for logistic regression on linearly separable data. The stepsize adapts to the current risk, scaled by a fixed base stepsize $\eta$. We prove that once the number of iterates t surpasses a margin-dependent threshold, the averaged GD iterate achieves a risk upper bound of exp(-$\Theta$($\eta$t)), where $\eta$ can be chosen arbitrarily large. This implies that GD attains arbitrarily fast convergence rates via large stepsizes, although the risk evolution might not be monotonic. In contrast, prior adaptive stepsize GD analyses require a monotonic risk decrease, limiting their rates to exp(-$\Theta$(t)). We further establish a margin-dependent lower bound on the iteration complexity for any first-order method to attain a small risk, justifying the necessity of the burn-in phase in our analysis. Our results generalize to a broad class of loss functions and two-layer networks under additional assumptions.}
}
Endnote
%0 Conference Paper
%T Gradient Descent Converges Arbitrarily Fast for Logistic Regression via Large and Adaptive Stepsizes
%A Ruiqi Zhang
%A Jingfeng Wu
%A Peter Bartlett
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-zhang25cf
%I PMLR
%P 76361--76384
%U https://proceedings.mlr.press/v267/zhang25cf.html
%V 267
%X We analyze the convergence of gradient descent (GD) with large, adaptive stepsizes for logistic regression on linearly separable data. The stepsize adapts to the current risk, scaled by a fixed base stepsize $\eta$. We prove that once the number of iterates t surpasses a margin-dependent threshold, the averaged GD iterate achieves a risk upper bound of exp(-$\Theta$($\eta$t)), where $\eta$ can be chosen arbitrarily large. This implies that GD attains arbitrarily fast convergence rates via large stepsizes, although the risk evolution might not be monotonic. In contrast, prior adaptive stepsize GD analyses require a monotonic risk decrease, limiting their rates to exp(-$\Theta$(t)). We further establish a margin-dependent lower bound on the iteration complexity for any first-order method to attain a small risk, justifying the necessity of the burn-in phase in our analysis. Our results generalize to a broad class of loss functions and two-layer networks under additional assumptions.
APA
Zhang, R., Wu, J. & Bartlett, P. (2025). Gradient Descent Converges Arbitrarily Fast for Logistic Regression via Large and Adaptive Stepsizes. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:76361-76384. Available from https://proceedings.mlr.press/v267/zhang25cf.html.