Dynamics of Deep Neural Networks and Neural Tangent Hierarchy

Jiaoyang Huang; Horng-Tzer Yau

Proceedings of the 37th International Conference on Machine Learning, PMLR 119:4542-4551, 2020.

Abstract

The evolution of a deep neural network trained by the gradient descent in the overparametrization regime can be described by its neural tangent kernel (NTK) \cite{jacot2018neural, du2018gradient1,du2018gradient2,arora2019fine}. It was observed \cite{arora2019exact} that there is a performance gap between the kernel regression using the limiting NTK and the deep neural networks. We study the dynamic of neural networks of finite width and derive an infinite hierarchy of differential equations, the neural tangent hierarchy (NTH). We prove that the NTH hierarchy truncated at the level $p\geq 2$ approximates the dynamic of the NTK up to arbitrary precision under certain conditions on the neural network width and the data set dimension. The assumptions needed for these approximations become weaker as $p$ increases. Finally, NTH can be viewed as higher order extensions of NTK. In particular, the NTH truncated at $p=2$ recovers the NTK dynamics.

Cite this Paper

BibTeX


@InProceedings{pmlr-v119-huang20l,
  title = 	 {Dynamics of Deep Neural Networks and Neural Tangent Hierarchy},
  author =       {Huang, Jiaoyang and Yau, Horng-Tzer},
  booktitle = 	 {Proceedings of the 37th International Conference on Machine Learning},
  pages = 	 {4542--4551},
  year = 	 {2020},
  editor = 	 {Daumé, III, Hal and Singh, Aarti},
  volume = 	 {119},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {13--18 Jul},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v119/huang20l/huang20l.pdf},
  url = 	 {https://proceedings.mlr.press/v119/huang20l.html},
  abstract = 	 {The evolution of a deep neural network trained by the gradient descent in the overparametrization regime can be described by its neural tangent kernel (NTK) \cite{jacot2018neural, du2018gradient1,du2018gradient2,arora2019fine}. It was observed \cite{arora2019exact} that there is a performance gap between the kernel regression using the limiting NTK and the deep neural networks. We study the dynamic of neural networks of finite width and derive an infinite hierarchy of differential equations, the neural tangent hierarchy (NTH). We prove that the NTH hierarchy truncated at the level $p\geq 2$ approximates the dynamic of the NTK up to arbitrary precision under certain conditions on the neural network width and the data set dimension. The assumptions needed for these approximations become weaker as $p$ increases. Finally, NTH can be viewed as higher order extensions of NTK. In particular, the NTH truncated at $p=2$ recovers the NTK dynamics.}
}

Endnote

%0 Conference Paper
%T Dynamics of Deep Neural Networks and Neural Tangent Hierarchy
%A Jiaoyang Huang
%A Horng-Tzer Yau
%B Proceedings of the 37th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2020
%E Hal Daumé III
%E Aarti Singh
%F pmlr-v119-huang20l
%I PMLR
%P 4542--4551
%U https://proceedings.mlr.press/v119/huang20l.html
%V 119
%X The evolution of a deep neural network trained by the gradient descent in the overparametrization regime can be described by its neural tangent kernel (NTK) \cite{jacot2018neural, du2018gradient1,du2018gradient2,arora2019fine}. It was observed \cite{arora2019exact} that there is a performance gap between the kernel regression using the limiting NTK and the deep neural networks. We study the dynamic of neural networks of finite width and derive an infinite hierarchy of differential equations, the neural tangent hierarchy (NTH). We prove that the NTH hierarchy truncated at the level $p\geq 2$ approximates the dynamic of the NTK up to arbitrary precision under certain conditions on the neural network width and the data set dimension. The assumptions needed for these approximations become weaker as $p$ increases. Finally, NTH can be viewed as higher order extensions of NTK. In particular, the NTH truncated at $p=2$ recovers the NTK dynamics.

APA


Huang, J. & Yau, H.-T. (2020). Dynamics of Deep Neural Networks and Neural Tangent Hierarchy. Proceedings of the 37th International Conference on Machine Learning, in Proceedings of Machine Learning Research 119:4542-4551. Available from https://proceedings.mlr.press/v119/huang20l.html.
