Scaling Properties of Deep Residual Networks

Alain-Sam Cohen; Rama Cont; Alain Rossier; Renyuan Xu

Proceedings of the 38th International Conference on Machine Learning, PMLR 139:2039-2048, 2021.

Abstract

Residual networks (ResNets) have displayed impressive results in pattern recognition and, recently, have garnered considerable theoretical interest due to a perceived link with neural ordinary differential equations (neural ODEs). This link relies on the convergence of network weights to a smooth function as the number of layers increases. We investigate the properties of weights trained by stochastic gradient descent and their scaling with network depth through detailed numerical experiments. We observe the existence of scaling regimes markedly different from those assumed in neural ODE literature. Depending on certain features of the network architecture, such as the smoothness of the activation function, one may obtain an alternative ODE limit, a stochastic differential equation or neither of these. These findings cast doubts on the validity of the neural ODE model as an adequate asymptotic description of deep ResNets and point to an alternative class of differential equations as a better description of the deep network limit.

Cite this Paper

BibTeX

@InProceedings{pmlr-v139-cohen21b,
  title     = {Scaling Properties of Deep Residual Networks},
  author    = {Cohen, Alain-Sam and Cont, Rama and Rossier, Alain and Xu, Renyuan},
  booktitle = {Proceedings of the 38th International Conference on Machine Learning},
  pages     = {2039--2048},
  year      = {2021},
  editor    = {Meila, Marina and Zhang, Tong},
  volume    = {139},
  series    = {Proceedings of Machine Learning Research},
  month     = {18--24 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v139/cohen21b/cohen21b.pdf},
  url       = {https://proceedings.mlr.press/v139/cohen21b.html},
  abstract = 	 {Residual networks (ResNets) have displayed impressive results in pattern recognition and, recently, have garnered considerable theoretical interest due to a perceived link with neural ordinary differential equations (neural ODEs). This link relies on the convergence of network weights to a smooth function as the number of layers increases. We investigate the properties of weights trained by stochastic gradient descent and their scaling with network depth through detailed numerical experiments. We observe the existence of scaling regimes markedly different from those assumed in neural ODE literature. Depending on certain features of the network architecture, such as the smoothness of the activation function, one may obtain an alternative ODE limit, a stochastic differential equation or neither of these. These findings cast doubts on the validity of the neural ODE model as an adequate asymptotic description of deep ResNets and point to an alternative class of differential equations as a better description of the deep network limit.}
}

Endnote

%0 Conference Paper
%T Scaling Properties of Deep Residual Networks
%A Alain-Sam Cohen
%A Rama Cont
%A Alain Rossier
%A Renyuan Xu
%B Proceedings of the 38th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2021
%E Marina Meila
%E Tong Zhang
%F pmlr-v139-cohen21b
%I PMLR
%P 2039--2048
%U https://proceedings.mlr.press/v139/cohen21b.html
%V 139
%X Residual networks (ResNets) have displayed impressive results in pattern recognition and, recently, have garnered considerable theoretical interest due to a perceived link with neural ordinary differential equations (neural ODEs). This link relies on the convergence of network weights to a smooth function as the number of layers increases. We investigate the properties of weights trained by stochastic gradient descent and their scaling with network depth through detailed numerical experiments. We observe the existence of scaling regimes markedly different from those assumed in neural ODE literature. Depending on certain features of the network architecture, such as the smoothness of the activation function, one may obtain an alternative ODE limit, a stochastic differential equation or neither of these. These findings cast doubts on the validity of the neural ODE model as an adequate asymptotic description of deep ResNets and point to an alternative class of differential equations as a better description of the deep network limit.

APA

Cohen, A., Cont, R., Rossier, A. & Xu, R. (2021). Scaling Properties of Deep Residual Networks. Proceedings of the 38th International Conference on Machine Learning, in Proceedings of Machine Learning Research 139:2039-2048. Available from https://proceedings.mlr.press/v139/cohen21b.html.
