On the Impact of the Activation function on Deep Neural Networks Training

Soufiane Hayou, Arnaud Doucet, Judith Rousseau
Proceedings of the 36th International Conference on Machine Learning, PMLR 97:2672-2680, 2019.

Abstract

The weight initialization and the activation function of deep neural networks have a crucial impact on the performance of the training procedure. An inappropriate selection can lead to the loss of information of the input during forward propagation and the exponential vanishing/exploding of gradients during back-propagation. Understanding the theoretical properties of untrained random networks is key to identifying which deep networks may be trained successfully, as recently demonstrated by Samuel et al. (2017), who showed that for deep feedforward neural networks only a specific choice of hyperparameters known as the ‘Edge of Chaos’ can lead to good performance. While the work by Samuel et al. (2017) discusses trainability issues, we focus here on training acceleration and overall performance. We give a comprehensive theoretical analysis of the Edge of Chaos and show that we can indeed tune the initialization parameters and the activation function in order to accelerate the training and improve the performance.
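For intuition about the ‘Edge of Chaos’ (EOC) referenced in the abstract: in the mean-field analysis this line of work builds on, a random network with weight variance σ_w² and bias variance σ_b² is on the EOC when the gradient multiplier χ = σ_w² E[φ′(√q* Z)²] equals 1 at the fixed point q* of the layer-wise variance map q ↦ σ_w² E[φ(√q Z)²] + σ_b², with Z standard Gaussian. The sketch below is our own Monte Carlo illustration of that condition, not code from the paper; all function names are ours.

```python
import numpy as np

rng = np.random.default_rng(0)
z = rng.standard_normal(1_000_000)  # samples used for all Gaussian expectations

def variance_fixed_point(sigma_w, sigma_b, phi, iters=100):
    """Iterate the variance map q <- sigma_w^2 E[phi(sqrt(q) Z)^2] + sigma_b^2."""
    q = 1.0
    for _ in range(iters):
        q = sigma_w**2 * np.mean(phi(np.sqrt(q) * z) ** 2) + sigma_b**2
    return q

def chi(sigma_w, sigma_b, phi, dphi):
    """Gradient multiplier chi = sigma_w^2 E[phi'(sqrt(q*) Z)^2]; EOC means chi = 1."""
    q = variance_fixed_point(sigma_w, sigma_b, phi)
    return sigma_w**2 * np.mean(dphi(np.sqrt(q) * z) ** 2)

# For ReLU, E[phi'(Z)^2] = 1/2, so chi = sigma_w^2 / 2 and the EOC is sigma_w^2 = 2,
# i.e. He initialization with zero bias.
relu = lambda x: np.maximum(x, 0.0)
drelu = lambda x: (x > 0).astype(float)
print(chi(np.sqrt(2.0), 0.0, relu, drelu))  # ≈ 1.0
```

Values of χ below 1 correspond to the ordered (vanishing-gradient) phase and values above 1 to the chaotic (exploding-gradient) phase, which is why initializing on the χ = 1 boundary matters for training deep networks.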

Cite this Paper


BibTeX
@InProceedings{pmlr-v97-hayou19a,
  title     = {On the Impact of the Activation function on Deep Neural Networks Training},
  author    = {Hayou, Soufiane and Doucet, Arnaud and Rousseau, Judith},
  booktitle = {Proceedings of the 36th International Conference on Machine Learning},
  pages     = {2672--2680},
  year      = {2019},
  editor    = {Chaudhuri, Kamalika and Salakhutdinov, Ruslan},
  volume    = {97},
  series    = {Proceedings of Machine Learning Research},
  month     = {09--15 Jun},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v97/hayou19a/hayou19a.pdf},
  url       = {https://proceedings.mlr.press/v97/hayou19a.html},
  abstract  = {The weight initialization and the activation function of deep neural networks have a crucial impact on the performance of the training procedure. An inappropriate selection can lead to the loss of information of the input during forward propagation and the exponential vanishing/exploding of gradients during back-propagation. Understanding the theoretical properties of untrained random networks is key to identifying which deep networks may be trained successfully, as recently demonstrated by Samuel et al. (2017), who showed that for deep feedforward neural networks only a specific choice of hyperparameters known as the ‘Edge of Chaos’ can lead to good performance. While the work by Samuel et al. (2017) discusses trainability issues, we focus here on training acceleration and overall performance. We give a comprehensive theoretical analysis of the Edge of Chaos and show that we can indeed tune the initialization parameters and the activation function in order to accelerate the training and improve the performance.}
}
Endnote
%0 Conference Paper
%T On the Impact of the Activation function on Deep Neural Networks Training
%A Soufiane Hayou
%A Arnaud Doucet
%A Judith Rousseau
%B Proceedings of the 36th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2019
%E Kamalika Chaudhuri
%E Ruslan Salakhutdinov
%F pmlr-v97-hayou19a
%I PMLR
%P 2672--2680
%U https://proceedings.mlr.press/v97/hayou19a.html
%V 97
%X The weight initialization and the activation function of deep neural networks have a crucial impact on the performance of the training procedure. An inappropriate selection can lead to the loss of information of the input during forward propagation and the exponential vanishing/exploding of gradients during back-propagation. Understanding the theoretical properties of untrained random networks is key to identifying which deep networks may be trained successfully, as recently demonstrated by Samuel et al. (2017), who showed that for deep feedforward neural networks only a specific choice of hyperparameters known as the ‘Edge of Chaos’ can lead to good performance. While the work by Samuel et al. (2017) discusses trainability issues, we focus here on training acceleration and overall performance. We give a comprehensive theoretical analysis of the Edge of Chaos and show that we can indeed tune the initialization parameters and the activation function in order to accelerate the training and improve the performance.
APA
Hayou, S., Doucet, A. &amp; Rousseau, J. (2019). On the Impact of the Activation function on Deep Neural Networks Training. Proceedings of the 36th International Conference on Machine Learning, in Proceedings of Machine Learning Research 97:2672-2680. Available from https://proceedings.mlr.press/v97/hayou19a.html.