Generalization Bounds for Label Noise Stochastic Gradient Descent

Jung Eun Huh; Patrick Rebeschini

Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, PMLR 238:1360-1368, 2024.

Abstract

We develop generalization error bounds for stochastic gradient descent (SGD) with label noise in non-convex settings under uniform dissipativity and smoothness conditions. Under a suitable choice of semimetric, we establish a contraction in Wasserstein distance of the label noise stochastic gradient flow that depends polynomially on the parameter dimension $d$. Using the framework of algorithmic stability, we derive time-independent generalisation error bounds for the discretized algorithm with a constant learning rate. The error bound we achieve scales polynomially with $d$ and with the rate of $n^{-2/3}$, where $n$ is the sample size. This rate is better than the best-known rate of $n^{-1/2}$ established for stochastic gradient Langevin dynamics (SGLD), which employs parameter-independent Gaussian noise, under similar conditions. Our analysis offers quantitative insights into the effect of label noise.
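The contrast between the two noise mechanisms named in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration and is not taken from the paper: the one-layer tanh model, the squared loss, Gaussian label perturbations, and the names `predict`, `grad`, `label_noise_sgd_step`, `sgld_step`, `noise_std`, and `inv_temp`. The point is only that perturbing labels yields gradient noise that passes through the model and so depends on the parameters, whereas the SGLD step adds Gaussian noise directly in parameter space.

```python
import math
import random


def predict(theta, x):
    """Hypothetical model: tanh of a linear score."""
    return math.tanh(sum(t * xi for t, xi in zip(theta, x)))


def grad(theta, xs, ys):
    """Gradient of the mean squared loss 0.5 * mean((predict - y)^2)."""
    g = [0.0] * len(theta)
    for x, y in zip(xs, ys):
        p = predict(theta, x)
        # Chain rule: d/dtheta tanh(x . theta) = (1 - p^2) * x
        scale = (p - y) * (1.0 - p * p) / len(ys)
        for j, xj in enumerate(x):
            g[j] += scale * xj
    return g


def label_noise_sgd_step(theta, xs, ys, lr, noise_std, batch_size, rng):
    """Sample a mini-batch, add fresh Gaussian noise to its labels, descend."""
    idx = rng.sample(range(len(ys)), batch_size)
    batch_x = [xs[i] for i in idx]
    # The label perturbation enters the gradient multiplied by (1 - p^2) * x,
    # so the injected gradient noise depends on the current parameters.
    batch_y = [ys[i] + noise_std * rng.gauss(0.0, 1.0) for i in idx]
    g = grad(theta, batch_x, batch_y)
    return [t - lr * gj for t, gj in zip(theta, g)]


def sgld_step(theta, xs, ys, lr, inv_temp, batch_size, rng):
    """Mini-batch gradient step plus parameter-independent Gaussian noise."""
    idx = rng.sample(range(len(ys)), batch_size)
    g = grad(theta, [xs[i] for i in idx], [ys[i] for i in idx])
    s = math.sqrt(2.0 * lr / inv_temp)
    return [t - lr * gj + s * rng.gauss(0.0, 1.0) for t, gj in zip(theta, g)]
```

With `noise_std` set to zero and the batch covering the whole sample, `label_noise_sgd_step` reduces to a plain gradient descent step, which is a convenient sanity check for the sketch.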

Cite this Paper

BibTeX

@InProceedings{pmlr-v238-eun-huh24a,
  title = 	 {Generalization Bounds for Label Noise Stochastic Gradient Descent},
  author =       {Eun Huh, Jung and Rebeschini, Patrick},
  booktitle = 	 {Proceedings of The 27th International Conference on Artificial Intelligence and Statistics},
  pages = 	 {1360--1368},
  year = 	 {2024},
  editor = 	 {Dasgupta, Sanjoy and Mandt, Stephan and Li, Yingzhen},
  volume = 	 {238},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {02--04 May},
  publisher =    {PMLR},
  pdf = 	 {https://proceedings.mlr.press/v238/eun-huh24a/eun-huh24a.pdf},
  url = 	 {https://proceedings.mlr.press/v238/eun-huh24a.html},
  abstract = 	 {We develop generalization error bounds for stochastic gradient descent (SGD) with label noise in non-convex settings under uniform dissipativity and smoothness conditions. Under a suitable choice of semimetric, we establish a contraction in Wasserstein distance of the label noise stochastic gradient flow that depends polynomially on the parameter dimension $d$. Using the framework of algorithmic stability, we derive time-independent generalisation error bounds for the discretized algorithm with a constant learning rate. The error bound we achieve scales polynomially with $d$ and with the rate of $n^{-2/3}$, where $n$ is the sample size. This rate is better than the best-known rate of $n^{-1/2}$ established for stochastic gradient Langevin dynamics (SGLD)—which employs parameter-independent Gaussian noise—under similar conditions. Our analysis offers quantitative insights into the effect of label noise.}
}

Endnote

%0 Conference Paper
%T Generalization Bounds for Label Noise Stochastic Gradient Descent
%A Jung Eun Huh
%A Patrick Rebeschini
%B Proceedings of The 27th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2024
%E Sanjoy Dasgupta
%E Stephan Mandt
%E Yingzhen Li
%F pmlr-v238-eun-huh24a
%I PMLR
%P 1360--1368
%U https://proceedings.mlr.press/v238/eun-huh24a.html
%V 238
%X We develop generalization error bounds for stochastic gradient descent (SGD) with label noise in non-convex settings under uniform dissipativity and smoothness conditions. Under a suitable choice of semimetric, we establish a contraction in Wasserstein distance of the label noise stochastic gradient flow that depends polynomially on the parameter dimension $d$. Using the framework of algorithmic stability, we derive time-independent generalisation error bounds for the discretized algorithm with a constant learning rate. The error bound we achieve scales polynomially with $d$ and with the rate of $n^{-2/3}$, where $n$ is the sample size. This rate is better than the best-known rate of $n^{-1/2}$ established for stochastic gradient Langevin dynamics (SGLD)—which employs parameter-independent Gaussian noise—under similar conditions. Our analysis offers quantitative insights into the effect of label noise.

APA

Eun Huh, J. & Rebeschini, P. (2024). Generalization Bounds for Label Noise Stochastic Gradient Descent. Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 238:1360-1368. Available from https://proceedings.mlr.press/v238/eun-huh24a.html.
