Generalization Bounds for Label Noise Stochastic Gradient Descent

Jung Eun Huh, Patrick Rebeschini
Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, PMLR 238:1360-1368, 2024.

Abstract

We develop generalization error bounds for stochastic gradient descent (SGD) with label noise in non-convex settings under uniform dissipativity and smoothness conditions. Under a suitable choice of semimetric, we establish a contraction in Wasserstein distance of the label noise stochastic gradient flow that depends polynomially on the parameter dimension $d$. Using the framework of algorithmic stability, we derive time-independent generalization error bounds for the discretized algorithm with a constant learning rate. The error bound we achieve scales polynomially with $d$ and decays at the rate $n^{-2/3}$, where $n$ is the sample size. This rate is better than the best-known rate of $n^{-1/2}$ established for stochastic gradient Langevin dynamics (SGLD), which employs parameter-independent Gaussian noise, under similar conditions. Our analysis offers quantitative insights into the effect of label noise.
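As a rough illustration of the distinction drawn above (a minimal sketch, not from the paper: the tanh model, squared loss, step size `eta`, label-noise scale `sigma`, and inverse temperature `beta` are all illustrative assumptions), the snippet below contrasts one label-noise SGD step, whose injected noise enters through the parameter-dependent gradient map, with one SGLD step, whose injected Gaussian noise is independent of the current parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

def model(theta, X):
    # Toy non-linear predictor f(x; theta) = tanh(x . theta); illustrative only,
    # not the model class analyzed in the paper.
    return np.tanh(X @ theta)

def grad_sq_loss(theta, X, y):
    # Gradient of the average squared loss 0.5 * mean((f(x; theta) - y)^2).
    f = model(theta, X)
    jac = (1.0 - f ** 2)[:, None] * X            # per-sample d f / d theta
    return ((f - y)[:, None] * jac).mean(axis=0)

def label_noise_sgd_step(theta, X, y, eta=0.1, sigma=0.5):
    # Label-noise SGD: perturb the labels, then take an ordinary gradient step.
    # The induced noise enters through d f / d theta, so it depends on the
    # current parameters theta.
    y_noisy = y + sigma * rng.standard_normal(len(y))
    return theta - eta * grad_sq_loss(theta, X, y_noisy)

def sgld_step(theta, X, y, eta=0.1, beta=10.0):
    # SGLD: gradient step plus isotropic Gaussian noise whose distribution
    # does not depend on the current parameters.
    xi = rng.standard_normal(theta.shape)
    return theta - eta * grad_sq_loss(theta, X, y) + np.sqrt(2.0 * eta / beta) * xi

# One step of each dynamics from the same synthetic data and initial point.
X, y = rng.standard_normal((32, 5)), rng.standard_normal(32)
theta0 = np.zeros(5)
theta_ln, theta_sgld = label_noise_sgd_step(theta0, X, y), sgld_step(theta0, X, y)
```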

Cite this Paper


BibTeX
@InProceedings{pmlr-v238-eun-huh24a,
  title     = {Generalization Bounds for Label Noise Stochastic Gradient Descent},
  author    = {Eun Huh, Jung and Rebeschini, Patrick},
  booktitle = {Proceedings of The 27th International Conference on Artificial Intelligence and Statistics},
  pages     = {1360--1368},
  year      = {2024},
  editor    = {Dasgupta, Sanjoy and Mandt, Stephan and Li, Yingzhen},
  volume    = {238},
  series    = {Proceedings of Machine Learning Research},
  month     = {02--04 May},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v238/eun-huh24a/eun-huh24a.pdf},
  url       = {https://proceedings.mlr.press/v238/eun-huh24a.html},
  abstract  = {We develop generalization error bounds for stochastic gradient descent (SGD) with label noise in non-convex settings under uniform dissipativity and smoothness conditions. Under a suitable choice of semimetric, we establish a contraction in Wasserstein distance of the label noise stochastic gradient flow that depends polynomially on the parameter dimension $d$. Using the framework of algorithmic stability, we derive time-independent generalisation error bounds for the discretized algorithm with a constant learning rate. The error bound we achieve scales polynomially with $d$ and with the rate of $n^{-2/3}$, where $n$ is the sample size. This rate is better than the best-known rate of $n^{-1/2}$ established for stochastic gradient Langevin dynamics (SGLD)—which employs parameter-independent Gaussian noise—under similar conditions. Our analysis offers quantitative insights into the effect of label noise.}
}
Endnote
%0 Conference Paper
%T Generalization Bounds for Label Noise Stochastic Gradient Descent
%A Jung Eun Huh
%A Patrick Rebeschini
%B Proceedings of The 27th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2024
%E Sanjoy Dasgupta
%E Stephan Mandt
%E Yingzhen Li
%F pmlr-v238-eun-huh24a
%I PMLR
%P 1360--1368
%U https://proceedings.mlr.press/v238/eun-huh24a.html
%V 238
%X We develop generalization error bounds for stochastic gradient descent (SGD) with label noise in non-convex settings under uniform dissipativity and smoothness conditions. Under a suitable choice of semimetric, we establish a contraction in Wasserstein distance of the label noise stochastic gradient flow that depends polynomially on the parameter dimension $d$. Using the framework of algorithmic stability, we derive time-independent generalisation error bounds for the discretized algorithm with a constant learning rate. The error bound we achieve scales polynomially with $d$ and with the rate of $n^{-2/3}$, where $n$ is the sample size. This rate is better than the best-known rate of $n^{-1/2}$ established for stochastic gradient Langevin dynamics (SGLD)—which employs parameter-independent Gaussian noise—under similar conditions. Our analysis offers quantitative insights into the effect of label noise.
APA
Eun Huh, J. & Rebeschini, P. (2024). Generalization Bounds for Label Noise Stochastic Gradient Descent. Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 238:1360-1368. Available from https://proceedings.mlr.press/v238/eun-huh24a.html.

Related Material

Download PDF: https://proceedings.mlr.press/v238/eun-huh24a/eun-huh24a.pdf