Generalization Lower Bounds for GD and SGD in Smooth Stochastic Convex Optimization

Peiyuan Zhang, Jiaye Teng, Jingzhao Zhang
Proceedings of The 28th International Conference on Artificial Intelligence and Statistics, PMLR 258:334-342, 2025.

Abstract

This work studies the generalization error of gradient methods. More specifically, we focus on how the number of training steps $T$ and the step-size $\eta$ affect generalization in smooth stochastic convex optimization (SCO) problems. Recent works show that in some cases longer training can hurt generalization. Our work reexamines this for smooth SCO and finds that the conclusion can be case-dependent. In particular, we first study SCO problems when the loss is \emph{realizable}, i.e., an optimal solution minimizes the loss on all data points. Our work provides excess risk lower bounds for Gradient Descent (GD) and Stochastic Gradient Descent (SGD) and finds that longer training may not hurt generalization. In the short training scenario $\eta T = O(n)$ ($n$ is the sample size), our lower bounds tightly match and certify the respective upper bounds. However, for the long training scenario where $\eta T = \Omega(n)$, our analysis reveals a gap between the lower and upper bounds, indicating that longer training does hurt generalization for realizable objectives. We conjecture that the gap can be closed by improving the upper bounds, and support this conjecture with analyses of two special instances. Moreover, beyond the realizable setup, we also provide the first tight excess risk lower bounds for GD and SGD under the general non-realizable smooth SCO setting, suggesting that existing stability analyses are tight in their step-size and iteration dependence, and that overfitting provably happens when there is no interpolating minimum.
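To make the quantities in the abstract concrete (step-size $\eta$, number of steps $T$, sample size $n$, realizable vs. non-realizable instances), here is a minimal Python sketch, not taken from the paper: it runs full-batch GD and one-sample SGD on a smooth convex least-squares objective and prints train/test loss for $\eta T$ below, at, and above $n$. All function names, constants, and the synthetic data model are illustrative assumptions.

    # Illustrative sketch only (not code from the paper): full-batch GD and
    # single-sample SGD with constant step-size eta run for T steps on a smooth
    # convex least-squares objective.  noise = 0 gives a realizable instance
    # (an interpolating minimizer exists); noise > 0 gives a non-realizable one.
    import numpy as np

    def squared_loss(w, X, y):
        # Empirical risk 0.5 * mean_i (x_i^T w - y_i)^2 -- smooth and convex in w.
        return 0.5 * np.mean((X @ w - y) ** 2)

    def gd(X, y, eta, T):
        # Full-batch gradient descent: w_{t+1} = w_t - eta * grad F_n(w_t).
        n, d = X.shape
        w = np.zeros(d)
        for _ in range(T):
            w -= eta * (X.T @ (X @ w - y)) / n
        return w

    def sgd(X, y, eta, T, seed=1):
        # One-sample SGD (sampling with replacement), T stochastic gradient steps.
        rng = np.random.default_rng(seed)
        n, d = X.shape
        w = np.zeros(d)
        for _ in range(T):
            i = rng.integers(n)
            w -= eta * (X[i] @ w - y[i]) * X[i]
        return w

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        n, d, eta = 50, 20, 0.02
        w_star = rng.normal(size=d) / np.sqrt(d)
        for noise in (0.0, 0.3):                    # realizable vs. non-realizable
            X_tr = rng.normal(size=(n, d))
            y_tr = X_tr @ w_star + noise * rng.normal(size=n)
            X_te = rng.normal(size=(20000, d))      # fresh samples approximate the population risk
            y_te = X_te @ w_star + noise * rng.normal(size=20000)
            print(f"--- noise = {noise} ---")
            for T in (500, 2500, 25000):            # eta*T = 10, 50, 500 vs. n = 50
                w_gd, w_sgd = gd(X_tr, y_tr, eta, T), sgd(X_tr, y_tr, eta, T)
                print(f"T={T:6d}  GD  train={squared_loss(w_gd, X_tr, y_tr):.4f} "
                      f"test={squared_loss(w_gd, X_te, y_te):.4f} | "
                      f"SGD train={squared_loss(w_sgd, X_tr, y_tr):.4f} "
                      f"test={squared_loss(w_sgd, X_te, y_te):.4f}")

In this toy run the test-minus-train gap is only a rough proxy for the excess risk studied in the paper, whose bounds concern worst-case smooth SCO instances rather than this particular least-squares example.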

Cite this Paper


BibTeX
@InProceedings{pmlr-v258-zhang25a,
  title     = {Generalization Lower Bounds for GD and SGD in Smooth Stochastic Convex Optimization},
  author    = {Zhang, Peiyuan and Teng, Jiaye and Zhang, Jingzhao},
  booktitle = {Proceedings of The 28th International Conference on Artificial Intelligence and Statistics},
  pages     = {334--342},
  year      = {2025},
  editor    = {Li, Yingzhen and Mandt, Stephan and Agrawal, Shipra and Khan, Emtiyaz},
  volume    = {258},
  series    = {Proceedings of Machine Learning Research},
  month     = {03--05 May},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v258/main/assets/zhang25a/zhang25a.pdf},
  url       = {https://proceedings.mlr.press/v258/zhang25a.html}
}
Endnote
%0 Conference Paper
%T Generalization Lower Bounds for GD and SGD in Smooth Stochastic Convex Optimization
%A Peiyuan Zhang
%A Jiaye Teng
%A Jingzhao Zhang
%B Proceedings of The 28th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2025
%E Yingzhen Li
%E Stephan Mandt
%E Shipra Agrawal
%E Emtiyaz Khan
%F pmlr-v258-zhang25a
%I PMLR
%P 334--342
%U https://proceedings.mlr.press/v258/zhang25a.html
%V 258
APA
Zhang, P., Teng, J. & Zhang, J. (2025). Generalization Lower Bounds for GD and SGD in Smooth Stochastic Convex Optimization. Proceedings of The 28th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 258:334-342. Available from https://proceedings.mlr.press/v258/zhang25a.html.