CDF Normalization for Controlling the Distribution of Hidden Nodes
Proceedings on "I (Still) Can't Believe It's Not Better!" at NeurIPS 2021 Workshops, PMLR 163:64-68, 2022.
Abstract
Batch Normalization (BN) is a normalization method for deep neural networks that has been shown to accelerate training. While the effectiveness of BN is undisputed, the explanation of its effectiveness is still being studied. The original BN paper attributes its success to reducing internal covariate shift, so we take this a step further and explicitly enforce a Gaussian distribution on hidden layer activations. This approach proves ineffective, demonstrating further that reducing internal covariate shift is not important for successful layer normalization.
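To make the idea concrete, below is a minimal sketch of one way to enforce a Gaussian distribution on hidden activations: a per-feature Gaussian rank transform that composes the empirical CDF of each mini-batch with the standard-normal quantile function. The layer name CDFNormalization and all implementation details are illustrative assumptions, not necessarily the paper's exact method.

import torch
import torch.nn as nn


class CDFNormalization(nn.Module):
    # Hypothetical layer: maps each feature column of a mini-batch onto an
    # (approximately) standard Gaussian via empirical CDF -> Gaussian quantile.
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x has shape (batch, features); rank each value within its column.
        n = x.size(0)
        ranks = x.argsort(dim=0).argsort(dim=0).to(x.dtype)
        # Empirical CDF values in (0, 1); the +0.5/n offset keeps them away
        # from 0 and 1, where the Gaussian quantile function diverges.
        u = (ranks + 0.5) / n
        # Inverse standard-normal CDF (probit): Uniform(0,1) -> N(0,1).
        return torch.distributions.Normal(0.0, 1.0).icdf(u)


# Example: heavily skewed activations come out (approximately) Gaussian.
x = torch.randn(128, 64).exp()
z = CDFNormalization()(x)
print(z.mean().item(), z.std().item())  # roughly 0 and 1

Note that the hard ranking above is not differentiable, so a trainable version would need a relaxation (e.g. a soft rank) or a fixed Gaussian CDF applied to batch-standardized activations; the sketch only illustrates the target transformation.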