Implicit Jacobian regularization weighted with impurity of probability output

Sungyoon Lee, Jinseong Park, Jaewook Lee
Proceedings of the 40th International Conference on Machine Learning, PMLR 202:19141-19184, 2023.

Abstract

The success of deep learning is greatly attributed to stochastic gradient descent (SGD), yet it remains unclear how SGD finds well-generalized models. We demonstrate that SGD has an implicit regularization effect on the logit-weight Jacobian norm of neural networks. This regularization effect is weighted with the impurity of the probability output, and thus it is active in a certain phase of training. Moreover, based on these findings, we propose a novel optimization method that explicitly regularizes the Jacobian norm, which leads to similar performance as other state-of-the-art sharpness-aware optimization methods.
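To make the proposal in the abstract concrete, here is a minimal sketch, assuming PyTorch, of the kind of explicit regularizer it describes: a penalty on the logit-weight Jacobian norm, weighted by the impurity of the probability output. This is not the authors' released code; the function names, the Gini choice of impurity (1 - Σ_k p_k²), the Hutchinson-style norm estimator, and the coefficient `lam` are illustrative assumptions, and the paper's exact regularizer may differ.

```python
# Sketch of an impurity-weighted Jacobian-norm penalty (illustrative only).
import torch
import torch.nn as nn
import torch.nn.functional as F


def gini_impurity(logits):
    """Gini impurity of the softmax output, 1 - sum_k p_k^2, per example."""
    p = F.softmax(logits, dim=-1)
    return 1.0 - (p ** 2).sum(dim=-1)


def impurity_weighted_jacobian_penalty(model, x, n_probes=1):
    """Estimate sum_i impurity_i * ||dz_i/dtheta||_F^2 with random probes.

    Uses the identity ||J||_F^2 = E_v[||J^T v||^2] for zero-mean probes v
    with identity covariance, so the full per-example Jacobian is never
    materialized.
    """
    logits = model(x)                          # z_i: per-example logits
    w = gini_impurity(logits).detach().sqrt()  # sqrt of impurity weights
    params = [p for p in model.parameters() if p.requires_grad]
    penalty = logits.new_zeros(())
    for _ in range(n_probes):
        v = torch.randn_like(logits)
        # grad of sum_i w_i <v_i, z_i> w.r.t. theta equals sum_i w_i J_i^T v_i;
        # its squared norm is, in expectation, sum_i w_i^2 ||J_i||_F^2.
        grads = torch.autograd.grad(
            (w * (v * logits).sum(dim=-1)).sum(), params, create_graph=True
        )
        penalty = penalty + sum(g.pow(2).sum() for g in grads)
    return penalty / n_probes


# Usage sketch: add the penalty to the usual task loss with a small coefficient.
model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))
x, y = torch.randn(8, 32), torch.randint(0, 10, (8,))
lam = 1e-3  # illustrative value
loss = F.cross_entropy(model(x), y) + lam * impurity_weighted_jacobian_penalty(model, x)
loss.backward()
```

Weighting by the (detached) impurity means the penalty fades as the softmax output becomes confident, mirroring the phase-dependent behavior the abstract attributes to the implicit effect of SGD; the random-probe estimator avoids materializing the full per-example Jacobian, which would be prohibitively large for modern networks.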

Cite this Paper


BibTeX
@InProceedings{pmlr-v202-lee23q,
  title     = {Implicit {J}acobian regularization weighted with impurity of probability output},
  author    = {Lee, Sungyoon and Park, Jinseong and Lee, Jaewook},
  booktitle = {Proceedings of the 40th International Conference on Machine Learning},
  pages     = {19141--19184},
  year      = {2023},
  editor    = {Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan},
  volume    = {202},
  series    = {Proceedings of Machine Learning Research},
  month     = {23--29 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v202/lee23q/lee23q.pdf},
  url       = {https://proceedings.mlr.press/v202/lee23q.html},
  abstract  = {The success of deep learning is greatly attributed to stochastic gradient descent (SGD), yet it remains unclear how SGD finds well-generalized models. We demonstrate that SGD has an implicit regularization effect on the logit-weight Jacobian norm of neural networks. This regularization effect is weighted with the impurity of the probability output, and thus it is active in a certain phase of training. Moreover, based on these findings, we propose a novel optimization method that explicitly regularizes the Jacobian norm, which leads to similar performance as other state-of-the-art sharpness-aware optimization methods.}
}
Endnote
%0 Conference Paper
%T Implicit Jacobian regularization weighted with impurity of probability output
%A Sungyoon Lee
%A Jinseong Park
%A Jaewook Lee
%B Proceedings of the 40th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Andreas Krause
%E Emma Brunskill
%E Kyunghyun Cho
%E Barbara Engelhardt
%E Sivan Sabato
%E Jonathan Scarlett
%F pmlr-v202-lee23q
%I PMLR
%P 19141--19184
%U https://proceedings.mlr.press/v202/lee23q.html
%V 202
%X The success of deep learning is greatly attributed to stochastic gradient descent (SGD), yet it remains unclear how SGD finds well-generalized models. We demonstrate that SGD has an implicit regularization effect on the logit-weight Jacobian norm of neural networks. This regularization effect is weighted with the impurity of the probability output, and thus it is active in a certain phase of training. Moreover, based on these findings, we propose a novel optimization method that explicitly regularizes the Jacobian norm, which leads to similar performance as other state-of-the-art sharpness-aware optimization methods.
APA
Lee, S., Park, J. & Lee, J. (2023). Implicit Jacobian regularization weighted with impurity of probability output. Proceedings of the 40th International Conference on Machine Learning, in Proceedings of Machine Learning Research 202:19141-19184. Available from https://proceedings.mlr.press/v202/lee23q.html.
