Data-Dependent Stability of Stochastic Gradient Descent

Ilja Kuzborskij; Christoph Lampert

Data-Dependent Stability of Stochastic Gradient Descent

Ilja Kuzborskij, Christoph Lampert

Proceedings of the 35th International Conference on Machine Learning, PMLR 80:2815-2824, 2018.

Abstract

We establish a data-dependent notion of algorithmic stability for Stochastic Gradient Descent (SGD), and employ it to develop novel generalization bounds. This is in contrast to previous distribution-free algorithmic stability results for SGD which depend on the worst-case constants. By virtue of the data-dependent argument, our bounds provide new insights into learning with SGD on convex and non-convex problems. In the convex case, we show that the bound on the generalization error depends on the risk at the initialization point. In the non-convex case, we prove that the expected curvature of the objective function around the initialization point has crucial influence on the generalization error. In both cases, our results suggest a simple data-driven strategy to stabilize SGD by pre-screening its initialization. As a corollary, our results allow us to show optimistic generalization bounds that exhibit fast convergence rates for SGD subject to a vanishing empirical risk and low noise of stochastic gradient.

Cite this Paper

BibTeX

@InProceedings{pmlr-v80-kuzborskij18a,
  title = 	 {Data-Dependent Stability of Stochastic Gradient Descent},
  author =       {Kuzborskij, Ilja and Lampert, Christoph},
  booktitle = 	 {Proceedings of the 35th International Conference on Machine Learning},
  pages = 	 {2815--2824},
  year = 	 {2018},
  editor = 	 {Dy, Jennifer and Krause, Andreas},
  volume = 	 {80},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {10--15 Jul},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v80/kuzborskij18a/kuzborskij18a.pdf},
  url = 	 {https://proceedings.mlr.press/v80/kuzborskij18a.html},
  abstract = 	 {We establish a data-dependent notion of algorithmic stability for Stochastic Gradient Descent (SGD), and employ it to develop novel generalization bounds. This is in contrast to previous distribution-free algorithmic stability results for SGD which depend on the worst-case constants. By virtue of the data-dependent argument, our bounds provide new insights into learning with SGD on convex and non-convex problems. In the convex case, we show that the bound on the generalization error depends on the risk at the initialization point. In the non-convex case, we prove that the expected curvature of the objective function around the initialization point has crucial influence on the generalization error. In both cases, our results suggest a simple data-driven strategy to stabilize SGD by pre-screening its initialization. As a corollary, our results allow us to show optimistic generalization bounds that exhibit fast convergence rates for SGD subject to a vanishing empirical risk and low noise of stochastic gradient.}
}

Endnote

%0 Conference Paper
%T Data-Dependent Stability of Stochastic Gradient Descent
%A Ilja Kuzborskij
%A Christoph Lampert
%B Proceedings of the 35th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2018
%E Jennifer Dy
%E Andreas Krause	
%F pmlr-v80-kuzborskij18a
%I PMLR
%P 2815--2824
%U https://proceedings.mlr.press/v80/kuzborskij18a.html
%V 80
%X We establish a data-dependent notion of algorithmic stability for Stochastic Gradient Descent (SGD), and employ it to develop novel generalization bounds. This is in contrast to previous distribution-free algorithmic stability results for SGD which depend on the worst-case constants. By virtue of the data-dependent argument, our bounds provide new insights into learning with SGD on convex and non-convex problems. In the convex case, we show that the bound on the generalization error depends on the risk at the initialization point. In the non-convex case, we prove that the expected curvature of the objective function around the initialization point has crucial influence on the generalization error. In both cases, our results suggest a simple data-driven strategy to stabilize SGD by pre-screening its initialization. As a corollary, our results allow us to show optimistic generalization bounds that exhibit fast convergence rates for SGD subject to a vanishing empirical risk and low noise of stochastic gradient.

APA

Kuzborskij, I. & Lampert, C.. (2018). Data-Dependent Stability of Stochastic Gradient Descent. Proceedings of the 35th International Conference on Machine Learning, in Proceedings of Machine Learning Research 80:2815-2824 Available from https://proceedings.mlr.press/v80/kuzborskij18a.html.

Data-Dependent Stability of Stochastic Gradient Descent

Abstract

Cite this Paper

Related Material