Implicit Jacobian regularization weighted with impurity of probability output

Sungyoon Lee; Jinseong Park; Jaewook Lee

Proceedings of the 40th International Conference on Machine Learning, PMLR 202:19141-19184, 2023.

Abstract

The success of deep learning is greatly attributed to stochastic gradient descent (SGD), yet it remains unclear how SGD finds well-generalized models. We demonstrate that SGD has an implicit regularization effect on the logit-weight Jacobian norm of neural networks. This regularization effect is weighted with the impurity of the probability output, and thus it is active in a certain phase of training. Moreover, based on these findings, we propose a novel optimization method that explicitly regularizes the Jacobian norm, which leads to similar performance as other state-of-the-art sharpness-aware optimization methods.

Cite this Paper

BibTeX


@InProceedings{pmlr-v202-lee23q,
  title = 	 {Implicit {J}acobian regularization weighted with impurity of probability output},
  author =       {Lee, Sungyoon and Park, Jinseong and Lee, Jaewook},
  booktitle = 	 {Proceedings of the 40th International Conference on Machine Learning},
  pages = 	 {19141--19184},
  year = 	 {2023},
  editor = 	 {Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan},
  volume = 	 {202},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {23--29 Jul},
  publisher =    {PMLR},
  pdf = 	 {https://proceedings.mlr.press/v202/lee23q/lee23q.pdf},
  url = 	 {https://proceedings.mlr.press/v202/lee23q.html},
  abstract = 	 {The success of deep learning is greatly attributed to stochastic gradient descent (SGD), yet it remains unclear how SGD finds well-generalized models. We demonstrate that SGD has an implicit regularization effect on the logit-weight Jacobian norm of neural networks. This regularization effect is weighted with the impurity of the probability output, and thus it is active in a certain phase of training. Moreover, based on these findings, we propose a novel optimization method that explicitly regularizes the Jacobian norm, which leads to similar performance as other state-of-the-art sharpness-aware optimization methods.}
}

Endnote

%0 Conference Paper
%T Implicit Jacobian regularization weighted with impurity of probability output
%A Sungyoon Lee
%A Jinseong Park
%A Jaewook Lee
%B Proceedings of the 40th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Andreas Krause
%E Emma Brunskill
%E Kyunghyun Cho
%E Barbara Engelhardt
%E Sivan Sabato
%E Jonathan Scarlett
%F pmlr-v202-lee23q
%I PMLR
%P 19141--19184
%U https://proceedings.mlr.press/v202/lee23q.html
%V 202
%X The success of deep learning is greatly attributed to stochastic gradient descent (SGD), yet it remains unclear how SGD finds well-generalized models. We demonstrate that SGD has an implicit regularization effect on the logit-weight Jacobian norm of neural networks. This regularization effect is weighted with the impurity of the probability output, and thus it is active in a certain phase of training. Moreover, based on these findings, we propose a novel optimization method that explicitly regularizes the Jacobian norm, which leads to similar performance as other state-of-the-art sharpness-aware optimization methods.

APA


Lee, S., Park, J., & Lee, J. (2023). Implicit Jacobian regularization weighted with impurity of probability output. Proceedings of the 40th International Conference on Machine Learning, in Proceedings of Machine Learning Research 202:19141-19184. Available from https://proceedings.mlr.press/v202/lee23q.html.
