Revisit Batch Normalization: New Understanding and Refinement via Composition Optimization

Xiangru Lian, Ji Liu
Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics, PMLR 89:3254-3263, 2019.

Abstract

Batch Normalization (BN) has been used extensively in deep learning to speed up training and to obtain better resulting models. However, whether BN works well depends strongly on how the batches are constructed during training, and it may not converge to a desired solution if the batch statistics are not close to the statistics over the whole dataset. In this paper, we seek to understand BN from an optimization perspective by providing an explicit objective function associated with BN. This explicit objective function reveals that: 1) BN, rather than being a new optimization algorithm or trick, actually creates an objective function different from the one commonly assumed; and 2) it explains why BN may not work well in some scenarios. We then propose a refinement of BN, called Full Normalization (FN), based on the compositional optimization technique, to alleviate the issues of BN when the batches are not constructed ideally. A convergence analysis and an empirical study of FN are also included in this paper.
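
As a rough illustration (not code from the paper; all names and values are made up), the following NumPy sketch contrasts BN-style normalization, which uses the mean and variance of the current mini-batch, with normalization by statistics computed over the whole dataset, which is the quantity the proposed Full Normalization aims to track. When batches are small or not sampled representatively, the two normalizations can differ noticeably.

    import numpy as np

    rng = np.random.default_rng(0)

    # Synthetic "dataset": one feature, 10,000 samples.
    data = rng.normal(loc=5.0, scale=2.0, size=10_000)

    def normalize(x, mean, var, eps=1e-5):
        # Standard normalization step: (x - mean) / sqrt(var + eps).
        return (x - mean) / np.sqrt(var + eps)

    # BN-style: statistics come from the current mini-batch only.
    batch = rng.choice(data, size=8)  # a small, possibly unrepresentative batch
    bn_out = normalize(batch, batch.mean(), batch.var())

    # Full-dataset-style: statistics come from the whole dataset.
    fn_out = normalize(batch, data.mean(), data.var())

    # With small or biased batches the two outputs can differ substantially,
    # illustrating why BN's behavior depends on how batches are constructed.
    print(np.abs(bn_out - fn_out).max())
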

Cite this Paper


BibTeX
@InProceedings{pmlr-v89-lian19a,
  title     = {Revisit Batch Normalization: New Understanding and Refinement via Composition Optimization},
  author    = {Lian, Xiangru and Liu, Ji},
  booktitle = {Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics},
  pages     = {3254--3263},
  year      = {2019},
  editor    = {Chaudhuri, Kamalika and Sugiyama, Masashi},
  volume    = {89},
  series    = {Proceedings of Machine Learning Research},
  month     = {16--18 Apr},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v89/lian19a/lian19a.pdf},
  url       = {https://proceedings.mlr.press/v89/lian19a.html}
}
APA
Lian, X. & Liu, J. (2019). Revisit Batch Normalization: New Understanding and Refinement via Composition Optimization. Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 89:3254-3263. Available from https://proceedings.mlr.press/v89/lian19a.html.