Data-Dependent Stability of Stochastic Gradient Descent

Ilja Kuzborskij, Christoph Lampert
Proceedings of the 35th International Conference on Machine Learning, PMLR 80:2815-2824, 2018.

Abstract

We establish a data-dependent notion of algorithmic stability for Stochastic Gradient Descent (SGD), and employ it to develop novel generalization bounds. This is in contrast to previous distribution-free algorithmic stability results for SGD which depend on the worst-case constants. By virtue of the data-dependent argument, our bounds provide new insights into learning with SGD on convex and non-convex problems. In the convex case, we show that the bound on the generalization error depends on the risk at the initialization point. In the non-convex case, we prove that the expected curvature of the objective function around the initialization point has crucial influence on the generalization error. In both cases, our results suggest a simple data-driven strategy to stabilize SGD by pre-screening its initialization. As a corollary, our results allow us to show optimistic generalization bounds that exhibit fast convergence rates for SGD subject to a vanishing empirical risk and low noise of stochastic gradient.

Cite this Paper


BibTeX
@InProceedings{pmlr-v80-kuzborskij18a, title = {Data-Dependent Stability of Stochastic Gradient Descent}, author = {Kuzborskij, Ilja and Lampert, Christoph}, booktitle = {Proceedings of the 35th International Conference on Machine Learning}, pages = {2815--2824}, year = {2018}, editor = {Dy, Jennifer and Krause, Andreas}, volume = {80}, series = {Proceedings of Machine Learning Research}, month = {10--15 Jul}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/v80/kuzborskij18a/kuzborskij18a.pdf}, url = {http://proceedings.mlr.press/v80/kuzborskij18a.html}, abstract = {We establish a data-dependent notion of algorithmic stability for Stochastic Gradient Descent (SGD), and employ it to develop novel generalization bounds. This is in contrast to previous distribution-free algorithmic stability results for SGD which depend on the worst-case constants. By virtue of the data-dependent argument, our bounds provide new insights into learning with SGD on convex and non-convex problems. In the convex case, we show that the bound on the generalization error depends on the risk at the initialization point. In the non-convex case, we prove that the expected curvature of the objective function around the initialization point has crucial influence on the generalization error. In both cases, our results suggest a simple data-driven strategy to stabilize SGD by pre-screening its initialization. As a corollary, our results allow us to show optimistic generalization bounds that exhibit fast convergence rates for SGD subject to a vanishing empirical risk and low noise of stochastic gradient.} }
Endnote
%0 Conference Paper %T Data-Dependent Stability of Stochastic Gradient Descent %A Ilja Kuzborskij %A Christoph Lampert %B Proceedings of the 35th International Conference on Machine Learning %C Proceedings of Machine Learning Research %D 2018 %E Jennifer Dy %E Andreas Krause %F pmlr-v80-kuzborskij18a %I PMLR %P 2815--2824 %U http://proceedings.mlr.press/v80/kuzborskij18a.html %V 80 %X We establish a data-dependent notion of algorithmic stability for Stochastic Gradient Descent (SGD), and employ it to develop novel generalization bounds. This is in contrast to previous distribution-free algorithmic stability results for SGD which depend on the worst-case constants. By virtue of the data-dependent argument, our bounds provide new insights into learning with SGD on convex and non-convex problems. In the convex case, we show that the bound on the generalization error depends on the risk at the initialization point. In the non-convex case, we prove that the expected curvature of the objective function around the initialization point has crucial influence on the generalization error. In both cases, our results suggest a simple data-driven strategy to stabilize SGD by pre-screening its initialization. As a corollary, our results allow us to show optimistic generalization bounds that exhibit fast convergence rates for SGD subject to a vanishing empirical risk and low noise of stochastic gradient.
APA
Kuzborskij, I. & Lampert, C.. (2018). Data-Dependent Stability of Stochastic Gradient Descent. Proceedings of the 35th International Conference on Machine Learning, in Proceedings of Machine Learning Research 80:2815-2824 Available from http://proceedings.mlr.press/v80/kuzborskij18a.html.

Related Material