Generalization of noisy SGD in unbounded non-convex settings

Leello Tadesse Dadi, Volkan Cevher
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:11862-11883, 2025.

Abstract

We study the generalization of iterative noisy gradient schemes on smooth non-convex losses. Formally, we establish time-independent information-theoretic generalization bounds for Stochastic Gradient Langevin Dynamics (SGLD) that do not diverge as the iteration count increases. Our bounds are obtained through a stability argument: we analyze the difference between two SGLD sequences run in parallel on two datasets sampled from the same distribution. Our result only requires an isoperimetric inequality to hold, which is merely a restriction on the tails of the loss. Our work relaxes the assumptions of prior work to establish that the iterates stay within a bounded KL divergence of each other. Under an additional dissipativity assumption, we show that the stronger Rényi divergence also stays bounded by establishing a uniform log-Sobolev constant for the iterates. Without dissipativity, we sidestep the need for local log-Sobolev inequalities and instead exploit the regularizing properties of Gaussian convolution. These techniques allow us to show that strong convexity is not necessary for finite stability bounds. Our work shows that noisy SGD can have finite, iteration-independent generalization and differential privacy bounds in unbounded non-convex settings.
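
For concreteness, the SGLD recursion underlying this kind of stability argument can be written schematically as follows; the step size $\eta$, the unit noise scale, and the use of full-batch gradients are illustrative placeholders rather than the paper's exact parameterization:

$$x_{k+1} = x_k - \eta\, \nabla F_S(x_k) + \sqrt{2\eta}\, \xi_k, \qquad x'_{k+1} = x'_k - \eta\, \nabla F_{S'}(x'_k) + \sqrt{2\eta}\, \xi'_k, \qquad \xi_k, \xi'_k \sim \mathcal{N}(0, I_d),$$

where $F_S$ and $F_{S'}$ denote the empirical losses on the two datasets $S$ and $S'$. The stability quantity of interest is a divergence between the laws of the coupled iterates, e.g. $\mathrm{KL}\big(\mathrm{Law}(x_k)\,\|\,\mathrm{Law}(x'_k)\big)$, which the paper shows remains bounded uniformly in the iteration count $k$ under the stated assumptions.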

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-dadi25a,
  title     = {Generalization of noisy {SGD} in unbounded non-convex settings},
  author    = {Dadi, Leello Tadesse and Cevher, Volkan},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {11862--11883},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/dadi25a/dadi25a.pdf},
  url       = {https://proceedings.mlr.press/v267/dadi25a.html},
  abstract  = {We study the generalization of iterative noisy gradient schemes on smooth non-convex losses. Formally, we establish time-independent information-theoretic generalization bounds for Stochastic Gradient Langevin Dynamics (SGLD) that do not diverge as the iteration count increases. Our bounds are obtained through a stability argument: we analyze the difference between two SGLD sequences run in parallel on two datasets sampled from the same distribution. Our result only requires an isoperimetric inequality to hold, which is merely a restriction on the tails of the loss. Our work relaxes the assumptions of prior work to establish that the iterates stay within a bounded KL divergence of each other. Under an additional dissipativity assumption, we show that the stronger R\'enyi divergence also stays bounded by establishing a uniform log-Sobolev constant for the iterates. Without dissipativity, we sidestep the need for local log-Sobolev inequalities and instead exploit the regularizing properties of Gaussian convolution. These techniques allow us to show that strong convexity is not necessary for finite stability bounds. Our work shows that noisy SGD can have finite, iteration-independent generalization and differential privacy bounds in unbounded non-convex settings.}
}
Endnote
%0 Conference Paper
%T Generalization of noisy SGD in unbounded non-convex settings
%A Leello Tadesse Dadi
%A Volkan Cevher
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-dadi25a
%I PMLR
%P 11862--11883
%U https://proceedings.mlr.press/v267/dadi25a.html
%V 267
%X We study the generalization of iterative noisy gradient schemes on smooth non-convex losses. Formally, we establish time-independent information-theoretic generalization bounds for Stochastic Gradient Langevin Dynamics (SGLD) that do not diverge as the iteration count increases. Our bounds are obtained through a stability argument: we analyze the difference between two SGLD sequences run in parallel on two datasets sampled from the same distribution. Our result only requires an isoperimetric inequality to hold, which is merely a restriction on the tails of the loss. Our work relaxes the assumptions of prior work to establish that the iterates stay within a bounded KL divergence of each other. Under an additional dissipativity assumption, we show that the stronger Rényi divergence also stays bounded by establishing a uniform log-Sobolev constant for the iterates. Without dissipativity, we sidestep the need for local log-Sobolev inequalities and instead exploit the regularizing properties of Gaussian convolution. These techniques allow us to show that strong convexity is not necessary for finite stability bounds. Our work shows that noisy SGD can have finite, iteration-independent generalization and differential privacy bounds in unbounded non-convex settings.
APA
Dadi, L.T. & Cevher, V. (2025). Generalization of noisy SGD in unbounded non-convex settings. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:11862-11883. Available from https://proceedings.mlr.press/v267/dadi25a.html.
