Gradient Descent Converges Linearly for Logistic Regression on Separable Data

Kyriakos Axiotis, Maxim Sviridenko
Proceedings of the 40th International Conference on Machine Learning, PMLR 202:1302-1319, 2023.

Abstract

We show that running gradient descent with variable learning rate guarantees loss $f(x) \leq 1.1 \cdot f(x^*) + \epsilon$ for the logistic regression objective, where the error $\epsilon$ decays exponentially with the number of iterations and polynomially with the magnitude of the entries of an arbitrary fixed solution $x^*$. This is in contrast to the common intuition that the absence of strong convexity precludes linear convergence of first-order methods, and highlights the importance of variable learning rates for gradient descent. We also apply our ideas to sparse logistic regression, where they lead to an exponential improvement of the sparsity-error tradeoff.
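To make the setup concrete, below is a minimal NumPy sketch of gradient descent with an iteration-dependent learning rate on the (unregularized) logistic loss with labels in {-1, +1}. This is not the paper's algorithm: the adaptive step-size rule used here (step size inversely proportional to the current loss) is only an illustrative placeholder, and the paper's actual variable learning-rate schedule and its guarantees are given in the full text.

import numpy as np

def logistic_loss(X, y, w):
    # f(w) = (1/n) * sum_i log(1 + exp(-y_i <x_i, w>)), with labels y_i in {-1, +1}.
    margins = y * (X @ w)
    return np.mean(np.logaddexp(0.0, -margins))

def logistic_grad(X, y, w):
    margins = y * (X @ w)
    # Per-sample gradient is -y_i * x_i * sigmoid(-margin_i); the tanh form avoids overflow.
    coeff = -y * 0.5 * (1.0 - np.tanh(margins / 2.0))
    return (X * coeff[:, None]).mean(axis=0)

def gd_variable_lr(X, y, num_iters=500, eta0=1.0):
    # Gradient descent whose step size changes across iterations.
    # Placeholder schedule (an assumption, not the paper's rule): eta_t = eta0 / f(w_t),
    # which grows as the loss shrinks on separable data.
    w = np.zeros(X.shape[1])
    for _ in range(num_iters):
        eta_t = eta0 / max(logistic_loss(X, y, w), 1e-12)
        w = w - eta_t * logistic_grad(X, y, w)
    return w

# Toy usage on linearly separable data (hypothetical example).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = np.sign(X @ rng.normal(size=5))
w = gd_variable_lr(X, y)
print("final logistic loss:", logistic_loss(X, y, w))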

Cite this Paper


BibTeX
@InProceedings{pmlr-v202-axiotis23a,
  title     = {Gradient Descent Converges Linearly for Logistic Regression on Separable Data},
  author    = {Axiotis, Kyriakos and Sviridenko, Maxim},
  booktitle = {Proceedings of the 40th International Conference on Machine Learning},
  pages     = {1302--1319},
  year      = {2023},
  editor    = {Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan},
  volume    = {202},
  series    = {Proceedings of Machine Learning Research},
  month     = {23--29 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v202/axiotis23a/axiotis23a.pdf},
  url       = {https://proceedings.mlr.press/v202/axiotis23a.html},
  abstract  = {We show that running gradient descent with variable learning rate guarantees loss $f(x) \leq 1.1 \cdot f(x^*) + \epsilon$ for the logistic regression objective, where the error $\epsilon$ decays exponentially with the number of iterations and polynomially with the magnitude of the entries of an arbitrary fixed solution $x^*$. This is in contrast to the common intuition that the absence of strong convexity precludes linear convergence of first-order methods, and highlights the importance of variable learning rates for gradient descent. We also apply our ideas to sparse logistic regression, where they lead to an exponential improvement of the sparsity-error tradeoff.}
}
Endnote
%0 Conference Paper
%T Gradient Descent Converges Linearly for Logistic Regression on Separable Data
%A Kyriakos Axiotis
%A Maxim Sviridenko
%B Proceedings of the 40th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Andreas Krause
%E Emma Brunskill
%E Kyunghyun Cho
%E Barbara Engelhardt
%E Sivan Sabato
%E Jonathan Scarlett
%F pmlr-v202-axiotis23a
%I PMLR
%P 1302--1319
%U https://proceedings.mlr.press/v202/axiotis23a.html
%V 202
%X We show that running gradient descent with variable learning rate guarantees loss $f(x) \leq 1.1 \cdot f(x^*) + \epsilon$ for the logistic regression objective, where the error $\epsilon$ decays exponentially with the number of iterations and polynomially with the magnitude of the entries of an arbitrary fixed solution $x^*$. This is in contrast to the common intuition that the absence of strong convexity precludes linear convergence of first-order methods, and highlights the importance of variable learning rates for gradient descent. We also apply our ideas to sparse logistic regression, where they lead to an exponential improvement of the sparsity-error tradeoff.
APA
Axiotis, K., & Sviridenko, M. (2023). Gradient Descent Converges Linearly for Logistic Regression on Separable Data. Proceedings of the 40th International Conference on Machine Learning, in Proceedings of Machine Learning Research 202:1302-1319. Available from https://proceedings.mlr.press/v202/axiotis23a.html.