On the interplay between noise and curvature and its effect on optimization and generalization

Valentin Thomas, Fabian Pedregosa, Bart van Merriënboer, Pierre-Antoine Manzagol, Yoshua Bengio, Nicolas Le Roux
; Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, PMLR 108:3503-3513, 2020.

Abstract

The speed at which one can minimize an expected loss using stochastic methods depends on two properties: the curvature of the loss and the variance of the gradients. While most previous works focus on one or the other of these properties, we explore how their interaction affects optimization speed. Further, as the ultimate goal is good generalization performance, we clarify how both curvature and noise are relevant to properly estimate the generalization gap. Realizing that the limitations of some existing works stem from a confusion between these matrices, we also clarify the distinction between the Fisher matrix, the Hessian, and the covariance matrix of the gradients.
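For reference, the three matrices named in the abstract are usually defined as follows for a loss L(θ) = E[ℓ(θ; x, y)] with per-sample gradient g = ∇ℓ(θ; x, y); these are the standard textbook definitions, not notation taken from the paper itself:

```latex
% Hessian: second derivative of the expected loss (curvature)
H(\theta) = \nabla^2_\theta L(\theta)

% Fisher matrix: outer product of score functions, with labels
% drawn from the model's own predictive distribution p_\theta(y \mid x)
F(\theta) = \mathbb{E}_{x \sim p_{\mathrm{data}},\; y \sim p_\theta(\cdot \mid x)}
  \left[ \nabla_\theta \log p_\theta(y \mid x)\, \nabla_\theta \log p_\theta(y \mid x)^\top \right]

% Covariance of the gradients: variance of per-sample gradients
% around the full-batch gradient, with labels from the data
\Sigma(\theta) = \mathbb{E}\left[ (g - \nabla_\theta L(\theta))(g - \nabla_\theta L(\theta))^\top \right]
```

The key distinction is where the labels come from: the Fisher matrix samples y from the model, while the gradient covariance uses the empirical labels, so the three matrices coincide only under specific conditions.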

Cite this Paper


BibTeX
@InProceedings{pmlr-v108-thomas20a,
  title = {On the interplay between noise and curvature and its effect on optimization and generalization},
  author = {Thomas, Valentin and Pedregosa, Fabian and van Merri\"enboer, Bart and Manzagol, Pierre-Antoine and Bengio, Yoshua and Roux, Nicolas Le},
  booktitle = {Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics},
  pages = {3503--3513},
  year = {2020},
  editor = {Silvia Chiappa and Roberto Calandra},
  volume = {108},
  series = {Proceedings of Machine Learning Research},
  address = {Online},
  month = {26--28 Aug},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v108/thomas20a/thomas20a.pdf},
  url = {http://proceedings.mlr.press/v108/thomas20a.html},
  abstract = {The speed at which one can minimize an expected loss using stochastic methods depends on two properties: the curvature of the loss and the variance of the gradients. While most previous works focus on one or the other of these properties, we explore how their interaction affects optimization speed. Further, as the ultimate goal is good generalization performance, we clarify how both curvature and noise are relevant to properly estimate the generalization gap. Realizing that the limitations of some existing works stem from a confusion between these matrices, we also clarify the distinction between the Fisher matrix, the Hessian, and the covariance matrix of the gradients.}
}
Endnote
%0 Conference Paper
%T On the interplay between noise and curvature and its effect on optimization and generalization
%A Valentin Thomas
%A Fabian Pedregosa
%A Bart van Merriënboer
%A Pierre-Antoine Manzagol
%A Yoshua Bengio
%A Nicolas Le Roux
%B Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2020
%E Silvia Chiappa
%E Roberto Calandra
%F pmlr-v108-thomas20a
%I PMLR
%J Proceedings of Machine Learning Research
%P 3503--3513
%U http://proceedings.mlr.press
%V 108
%W PMLR
%X The speed at which one can minimize an expected loss using stochastic methods depends on two properties: the curvature of the loss and the variance of the gradients. While most previous works focus on one or the other of these properties, we explore how their interaction affects optimization speed. Further, as the ultimate goal is good generalization performance, we clarify how both curvature and noise are relevant to properly estimate the generalization gap. Realizing that the limitations of some existing works stem from a confusion between these matrices, we also clarify the distinction between the Fisher matrix, the Hessian, and the covariance matrix of the gradients.
APA
Thomas, V., Pedregosa, F., van Merriënboer, B., Manzagol, P., Bengio, Y. & Roux, N.L. (2020). On the interplay between noise and curvature and its effect on optimization and generalization. Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, in PMLR 108:3503-3513.
