Path-BN: Towards effective batch normalization in the Path Space for ReLU networks

Xufang Luo, Qi Meng, Wei Chen, Yunhong Wang, Tie-Yan Liu
Proceedings of the Thirty-Seventh Conference on Uncertainty in Artificial Intelligence, PMLR 161:834-843, 2021.

Abstract

Neural networks with ReLU activation functions (abbrev. ReLU networks) have demonstrated their success in many applications. Recently, researchers noticed that ReLU networks are positively scale-invariant (PSI) while their weights are not. This mismatch may lead to undesirable behaviors in the optimization process. Hence, new algorithms that conduct optimization directly in the path space (which is proven to be PSI) were developed, such as Stochastic Gradient Descent (SGD) in the path space, and it was shown that SGD in the path space is superior to SGD in the weight space. However, it is still unknown whether other deep learning techniques, such as batch normalization (BN), could also have counterparts in the path space. In this paper, we conduct a formal study on the design of BN in the path space. First, we propose the path-reparameterization of ReLU networks, in which the weights of the networks are reparameterized by path values. Then, feedforward and backward propagation through the path-reparameterized networks compute the values of the hidden nodes and the gradients in the path space, respectively. Next, we design a novel way to perform batch normalization for path-reparameterized ReLU networks, called Path-BN. Specifically, we notice that path-reparameterized ReLU networks have a portion of constant weights, which play a more critical role in forming the basis of the path space. We propose to exclude these constant weights when performing batch normalization, and prove that, by doing so, the scale and the direction of the trained parameters can be more effectively decoupled during training. Finally, we conduct experiments on benchmark datasets. The results show that the proposed Path-BN can improve the performance of optimization algorithms in the path space.
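
The abstract's starting point is that a ReLU network's function is positively scale-invariant while its weights are not, and that path values (products of weights along input-output paths) give a PSI parameterization. Below is a minimal NumPy sketch of these two background facts on a one-hidden-layer ReLU network; it is an illustration only, not code from the paper, and the network sizes, seed, and helper names are chosen here for convenience.

# Illustrative check (not from the paper): positively rescaling a hidden node's
# incoming weights by c > 0 and its outgoing weights by 1/c changes the weights
# but leaves both the network function and the path values unchanged.
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

def forward(x, W1, W2):
    # One-hidden-layer ReLU network: input -> hidden (ReLU) -> output.
    return relu(x @ W1) @ W2

def path_values(W1, W2):
    # Value of the path (input i -> hidden j -> output k) is W1[i, j] * W2[j, k].
    return np.einsum("ij,jk->ijk", W1, W2)

d_in, d_hidden, d_out = 4, 5, 3
x = rng.normal(size=(8, d_in))
W1 = rng.normal(size=(d_in, d_hidden))
W2 = rng.normal(size=(d_hidden, d_out))

# Positive rescaling of hidden node j = 2 by c > 0.
c, j = 3.7, 2
W1_scaled, W2_scaled = W1.copy(), W2.copy()
W1_scaled[:, j] *= c   # incoming weights scaled by c
W2_scaled[j, :] /= c   # outgoing weights scaled by 1/c

# The weights differ, but the function and the path values do not.
assert np.allclose(forward(x, W1, W2), forward(x, W1_scaled, W2_scaled))
assert np.allclose(path_values(W1, W2), path_values(W1_scaled, W2_scaled))
print("PSI check passed: function and path values are invariant; weights are not.")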

Cite this Paper


BibTeX
@InProceedings{pmlr-v161-luo21b,
  title     = {Path-BN: Towards effective batch normalization in the Path Space for ReLU networks},
  author    = {Luo, Xufang and Meng, Qi and Chen, Wei and Wang, Yunhong and Liu, Tie-Yan},
  booktitle = {Proceedings of the Thirty-Seventh Conference on Uncertainty in Artificial Intelligence},
  pages     = {834--843},
  year      = {2021},
  editor    = {de Campos, Cassio and Maathuis, Marloes H.},
  volume    = {161},
  series    = {Proceedings of Machine Learning Research},
  month     = {27--30 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v161/luo21b/luo21b.pdf},
  url       = {https://proceedings.mlr.press/v161/luo21b.html}
}
Endnote
%0 Conference Paper
%T Path-BN: Towards effective batch normalization in the Path Space for ReLU networks
%A Xufang Luo
%A Qi Meng
%A Wei Chen
%A Yunhong Wang
%A Tie-Yan Liu
%B Proceedings of the Thirty-Seventh Conference on Uncertainty in Artificial Intelligence
%C Proceedings of Machine Learning Research
%D 2021
%E Cassio de Campos
%E Marloes H. Maathuis
%F pmlr-v161-luo21b
%I PMLR
%P 834--843
%U https://proceedings.mlr.press/v161/luo21b.html
%V 161
APA
Luo, X., Meng, Q., Chen, W., Wang, Y. & Liu, T. (2021). Path-BN: Towards effective batch normalization in the Path Space for ReLU networks. Proceedings of the Thirty-Seventh Conference on Uncertainty in Artificial Intelligence, in Proceedings of Machine Learning Research 161:834-843. Available from https://proceedings.mlr.press/v161/luo21b.html.
