The Implicit Bias of Batch Normalization in Linear Models and Two-layer Linear Convolutional Neural Networks

Yuan Cao; Difan Zou; Yuanzhi Li; Quanquan Gu

Proceedings of Thirty Sixth Conference on Learning Theory, PMLR 195:5699-5753, 2023.

Abstract

We study the implicit bias of batch normalization trained by gradient descent. We show that when learning a linear model with batch normalization for binary classification, gradient descent converges to a uniform margin classifier on the training data with an $\exp(-\Omega(\log^2t))$ convergence rate. This distinguishes linear models with batch normalization from those without batch normalization in terms of both the type of implicit bias and the convergence rate. We then further extend our result to a class of two-layer, single-filter convolutional neural networks, and show that batch normalization has an implicit bias towards a patch-wise uniform margin. Based on two examples, we demonstrate that patch-wise uniform margin classifiers can outperform the maximum margin classifiers in certain learning problems. Our results contribute to a better theoretical understanding of batch normalization.
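The central object of the abstract, a uniform margin classifier, can be made concrete with a small experiment. Below is a minimal NumPy sketch under several assumptions: a simplified batch-normalized linear model without mean centering, $f(x_i) = \gamma \langle w, x_i \rangle / \sqrt{\frac{1}{n}\sum_j \langle w, x_j \rangle^2}$; a fixed scale $\gamma$; the logistic loss; and random Gaussian data with more features than samples, so that a uniform margin solution exists. These modeling choices and the helper names `bn_linear`, `loss_and_grad` and `train` are hypothetical and chosen for illustration. They are not taken from the paper. The normalized outputs always satisfy $\frac{1}{n}\sum_i f(x_i)^2 = \gamma^2$, and the logistic loss is convex and decreasing. Jensen's inequality therefore implies that, with $\gamma$ fixed, the loss is minimized exactly when every margin $y_i f(x_i)$ equals $\gamma$. The sketch only shows what a uniform margin classifier looks like and how the normalization enters the gradient. It does not reproduce the paper's analysis or its $\exp(-\Omega(\log^2 t))$ rate.

```python
# Illustrative sketch only: the model form, fixed gamma and random data are
# assumptions for demonstration, not the exact setting analyzed in the paper.
import numpy as np


def bn_linear(w, X, gamma):
    """Batch-normalized linear model (assumed form, no mean centering):
    f(x_i) = gamma * <w, x_i> / sqrt(mean_j <w, x_j>^2)."""
    z = X @ w
    s = np.sqrt(np.mean(z ** 2))
    return gamma * z / s


def loss_and_grad(w, X, y, gamma):
    """Mean logistic loss of the BN linear model and its gradient in w."""
    n = X.shape[0]
    z = X @ w                          # pre-normalization outputs
    s = np.sqrt(np.mean(z ** 2))       # batch scale (uncentered)
    margins = y * gamma * z / s
    loss = np.mean(np.logaddexp(0.0, -margins))
    # dL/d(out_i) = -y_i * sigmoid(-margin_i) / n
    g_out = -y / (1.0 + np.exp(margins)) / n
    # Chain rule through out = gamma * z / s(z). The result is orthogonal
    # to z, so w @ grad == 0 (the loss is invariant to rescaling w).
    g_z = gamma * (g_out / s - z * (z @ g_out) / (n * s ** 3))
    return loss, X.T @ g_z


def train(X, y, gamma=1.0, lr=0.25, steps=20_000, seed=0):
    """Plain full-batch gradient descent from a Gaussian initialization."""
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(X.shape[1])
    for _ in range(steps):
        _, grad = loss_and_grad(w, X, y, gamma)
        w = w - lr * grad
    return w


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d = 8, 40                       # more features than samples
    X = rng.standard_normal((n, d))
    y = rng.choice([-1.0, 1.0], size=n)
    w = train(X, y)
    # Training margins y_i * f(x_i); a uniform margin classifier makes them equal.
    print(np.round(y * bn_linear(w, X, 1.0), 4))
```

Running the script prints the training margins $y_i f(x_i)$. Under the assumptions above, they should all be close to $\gamma = 1$. The loss is invariant to rescaling $w$, so the gradient is orthogonal to $w$. As a result, plain gradient descent never decreases $\|w\|$, which gradually shrinks the effective step size.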

Cite this Paper

BibTeX


@InProceedings{pmlr-v195-cao23a,
  title     = {The Implicit Bias of Batch Normalization in Linear Models and Two-layer Linear Convolutional Neural Networks},
  author    = {Cao, Yuan and Zou, Difan and Li, Yuanzhi and Gu, Quanquan},
  booktitle = {Proceedings of Thirty Sixth Conference on Learning Theory},
  pages     = {5699--5753},
  year      = {2023},
  editor    = {Neu, Gergely and Rosasco, Lorenzo},
  volume    = {195},
  series    = {Proceedings of Machine Learning Research},
  month     = {12--15 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v195/cao23a/cao23a.pdf},
  url       = {https://proceedings.mlr.press/v195/cao23a.html},
  abstract  = {We study the implicit bias of batch normalization trained by gradient descent. We show that when learning a linear model with batch normalization for binary classification, gradient descent converges to a uniform margin classifier on the training data with an $\exp(-\Omega(\log^2t))$ convergence rate. This distinguishes linear models with batch normalization from those without batch normalization in terms of both the type of implicit bias and the convergence rate. We then further extend our result to a class of two-layer, single-filter convolutional neural networks, and show that batch normalization has an implicit bias towards a patch-wise uniform margin. Based on two examples, we demonstrate that patch-wise uniform margin classifiers can outperform the maximum margin classifiers in certain learning problems. Our results contribute to a better theoretical understanding of batch normalization.}
}

Endnote

%0 Conference Paper
%T The Implicit Bias of Batch Normalization in Linear Models and Two-layer Linear Convolutional Neural Networks
%A Yuan Cao
%A Difan Zou
%A Yuanzhi Li
%A Quanquan Gu
%B Proceedings of Thirty Sixth Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2023
%E Gergely Neu
%E Lorenzo Rosasco
%F pmlr-v195-cao23a
%I PMLR
%P 5699--5753
%U https://proceedings.mlr.press/v195/cao23a.html
%V 195
%X We study the implicit bias of batch normalization trained by gradient descent. We show that when learning a linear model with batch normalization for binary classification, gradient descent converges to a uniform margin classifier on the training data with an $\exp(-\Omega(\log^2t))$ convergence rate. This distinguishes linear models with batch normalization from those without batch normalization in terms of both the type of implicit bias and the convergence rate. We then further extend our result to a class of two-layer, single-filter convolutional neural networks, and show that batch normalization has an implicit bias towards a patch-wise uniform margin. Based on two examples, we demonstrate that patch-wise uniform margin classifiers can outperform the maximum margin classifiers in certain learning problems. Our results contribute to a better theoretical understanding of batch normalization.

APA


Cao, Y., Zou, D., Li, Y. & Gu, Q. (2023). The Implicit Bias of Batch Normalization in Linear Models and Two-layer Linear Convolutional Neural Networks. Proceedings of Thirty Sixth Conference on Learning Theory, in Proceedings of Machine Learning Research 195:5699-5753. Available from https://proceedings.mlr.press/v195/cao23a.html.
