Generalization Bounds using Lower Tail Exponents in Stochastic Optimizers

Liam Hodgkinson, Umut Simsekli, Rajiv Khanna, Michael Mahoney
Proceedings of the 39th International Conference on Machine Learning, PMLR 162:8774-8795, 2022.

Abstract

Despite the ubiquitous use of stochastic optimization algorithms in machine learning, the precise impact of these algorithms and their dynamics on generalization performance in realistic non-convex settings is still poorly understood. While recent work has revealed connections between generalization and heavy-tailed behavior in stochastic optimization, it has mainly relied on continuous-time approximations, and a rigorous treatment of the original discrete-time iterations has yet to be performed. To bridge this gap, we present novel bounds linking generalization to the lower tail exponent of the transition kernel associated with the optimizer around a local minimum, in both discrete- and continuous-time settings. To achieve this, we first prove a data- and algorithm-dependent generalization bound in terms of the celebrated Fernique-Talagrand functional applied to the trajectory of the optimizer. Then, we specialize this result by exploiting the Markovian structure of stochastic optimizers and derive bounds in terms of their (data-dependent) transition kernels. We support our theory with empirical results from a variety of neural networks, showing correlations between generalization error and lower tail exponents.
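As a toy illustration of the kind of quantity these bounds involve, the following Python sketch simulates SGD on a one-dimensional quadratic with heavy-tailed gradient noise and estimates a small-ball exponent from the distances of the iterates to the minimum. This is in the spirit of, but not identical to, the paper's lower tail exponent of the optimizer's transition kernel: the quadratic loss, the Student-t noise, the step size, and the slope-of-the-log-CDF estimator are all illustrative assumptions, not the authors' models or estimator.

import numpy as np

rng = np.random.default_rng(0)

# Toy setting (all choices here are illustrative assumptions, not the paper's):
# SGD on the 1-D quadratic loss f(x) = x^2 / 2 with heavy-tailed gradient noise,
# started at the minimum x* = 0.
eta = 0.1          # step size
n_steps = 200_000  # number of simulated SGD iterations

x = 0.0
distances = []
for _ in range(n_steps):
    noise = rng.standard_t(df=3)   # heavy-tailed stochastic-gradient noise
    x = x - eta * (x + noise)      # one SGD step: x <- x - eta * (f'(x) + noise)
    distances.append(abs(x))       # distance of the new iterate from x* = 0

distances = np.array(distances)

# Crude exponent estimate: assume P(|x_k - x*| <= r) ~ C * r^alpha as r -> 0,
# and read off alpha as the slope of log P(|x_k - x*| <= r) versus log r over
# a range of small radii.
radii = np.quantile(distances, np.linspace(0.001, 0.05, 25))
log_cdf = np.log([np.mean(distances <= r) for r in radii])
alpha_hat, _ = np.polyfit(np.log(radii), log_cdf, 1)
print(f"estimated lower tail exponent: {alpha_hat:.2f}")

In this one-dimensional example the estimate comes out close to 1, as expected for a marginal density that is positive and finite at the minimum; heavier or lighter concentration around the minimum would push the exponent down or up accordingly.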

Cite this Paper


BibTeX
@InProceedings{pmlr-v162-hodgkinson22a,
  title     = {Generalization Bounds using Lower Tail Exponents in Stochastic Optimizers},
  author    = {Hodgkinson, Liam and Simsekli, Umut and Khanna, Rajiv and Mahoney, Michael},
  booktitle = {Proceedings of the 39th International Conference on Machine Learning},
  pages     = {8774--8795},
  year      = {2022},
  editor    = {Chaudhuri, Kamalika and Jegelka, Stefanie and Song, Le and Szepesvari, Csaba and Niu, Gang and Sabato, Sivan},
  volume    = {162},
  series    = {Proceedings of Machine Learning Research},
  month     = {17--23 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v162/hodgkinson22a/hodgkinson22a.pdf},
  url       = {https://proceedings.mlr.press/v162/hodgkinson22a.html}
}
Endnote
%0 Conference Paper
%T Generalization Bounds using Lower Tail Exponents in Stochastic Optimizers
%A Liam Hodgkinson
%A Umut Simsekli
%A Rajiv Khanna
%A Michael Mahoney
%B Proceedings of the 39th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2022
%E Kamalika Chaudhuri
%E Stefanie Jegelka
%E Le Song
%E Csaba Szepesvari
%E Gang Niu
%E Sivan Sabato
%F pmlr-v162-hodgkinson22a
%I PMLR
%P 8774--8795
%U https://proceedings.mlr.press/v162/hodgkinson22a.html
%V 162
APA
Hodgkinson, L., Simsekli, U., Khanna, R. & Mahoney, M. (2022). Generalization Bounds using Lower Tail Exponents in Stochastic Optimizers. Proceedings of the 39th International Conference on Machine Learning, in Proceedings of Machine Learning Research 162:8774-8795. Available from https://proceedings.mlr.press/v162/hodgkinson22a.html.
