Revisit Batch Normalization: New Understanding and Refinement via Composition Optimization
Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics, PMLR 89:3254-3263, 2019.
Batch Normalization (BN) has been used extensively in deep learning to speed up training and obtain better models. However, whether BN works depends strongly on how the batches are constructed during training, and it may not converge to a desired solution if the batch statistics are not close to the statistics over the whole dataset. In this paper, we try to understand BN from an optimization perspective by providing an explicit objective function associated with BN. This explicit objective function reveals two things: 1) BN, rather than being a new optimization algorithm or trick, creates an objective function different from the one commonly assumed; and 2) why BN may not work well in some scenarios. We then propose Full Normalization (FN), a refinement of BN based on the compositional optimization technique, to alleviate the issues of BN when the batches are not constructed ideally. The convergence analysis and empirical study for FN are also included in this paper.
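
The dependence on batch construction can be illustrated with a small numerical sketch. This is not the FN algorithm from the paper; it only compares per-batch means with the full-dataset mean. The synthetic two-class data, the batch size of 64, and the helper name `batch_mean_gap` are assumptions made for illustration.

```python
import numpy as np

def batch_mean_gap(data, batch_size):
    """Largest |batch mean - dataset mean| over consecutive batches (hypothetical helper)."""
    full_mean = data.mean()
    n_batches = len(data) // batch_size
    # Drop any incomplete trailing batch, then split into equal-sized batches.
    batches = data[: n_batches * batch_size].reshape(n_batches, batch_size)
    return float(np.abs(batches.mean(axis=1) - full_mean).max())

rng = np.random.default_rng(0)

# Assumed synthetic 1-D feature: two classes with different means.
data = np.concatenate([rng.normal(-2.0, 1.0, 512), rng.normal(2.0, 1.0, 512)])

shuffled = rng.permutation(data)  # each batch mixes both classes
sorted_by_class = data            # each batch contains a single class

gap_shuffled = batch_mean_gap(shuffled, 64)
gap_sorted = batch_mean_gap(sorted_by_class, 64)

print("max gap, shuffled batches:", gap_shuffled)
print("max gap, class-sorted batches:", gap_sorted)
```

When batches are drawn from a single class, the batch mean sits near that class's mean instead of the dataset mean, so normalizing with batch statistics differs substantially from normalizing with full-dataset statistics. That is the non-ideal batch construction the abstract refers to.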