On the Random Conjugate Kernel and Neural Tangent Kernel

Zhengmian Hu, Heng Huang
Proceedings of the 38th International Conference on Machine Learning, PMLR 139:4359-4368, 2021.

Abstract

We investigate the distributions of the Conjugate Kernel (CK) and the Neural Tangent Kernel (NTK) for ReLU networks with random initialization. We derive the precise distributions and moments of the diagonal elements of these kernels. For a feedforward network, these values converge in law to a log-normal distribution when the network depth $d$ and width $n$ tend to infinity simultaneously, with the variance of the log diagonal elements proportional to ${d}/{n}$. For a residual network, in the limit where the number of branches $m$ tends to infinity while the width $n$ stays fixed, the diagonal elements of the Conjugate Kernel converge in law to a log-normal distribution whose log-variance is proportional to ${1}/{n}$, and the diagonal elements of the NTK converge in law to a log-normally distributed variable times the conjugate kernel of a feedforward network. Our theoretical results suggest that the residual network remains trainable in the limit of infinitely many branches and fixed network width. Numerical experiments are conducted, and all results validate our theoretical analysis.
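As a rough way to probe the feedforward-network claim empirically, the following is a minimal sketch and not the authors' experimental setup: it draws many random initializations of a depth-$d$, width-$n$ ReLU network, computes the diagonal conjugate-kernel entry $\|h^{(d)}(x)\|^2 / n$, and compares the empirical variance of its logarithm against the $d/n$ scaling stated above. The Gaussian weights with variance $2/\mathrm{fan\_in}$ and the specific depth, width, input dimension, and sample count are assumptions made for illustration only.

import numpy as np

def ck_diagonal(x, depth, width, rng):
    # One draw of the conjugate-kernel diagonal entry for input x:
    # Sigma^(d)(x, x) = ||h^(d)(x)||^2 / width, where h^(d) is the
    # post-activation of a depth-`depth`, width-`width` ReLU network
    # with Gaussian weights of variance 2/fan_in (assumed He-style scaling).
    h = x
    for _ in range(depth):
        fan_in = h.shape[0]
        W = rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(width, fan_in))
        h = np.maximum(W @ h, 0.0)  # ReLU activation
    return h @ h / width

rng = np.random.default_rng(0)
x = rng.normal(size=64)  # a fixed random input

depth, width, n_samples = 32, 128, 2000
samples = np.array([ck_diagonal(x, depth, width, rng) for _ in range(n_samples)])
log_samples = np.log(samples)

# Under the feedforward result, log Sigma^(d)(x, x) is approximately normal
# with variance proportional to depth / width; print both for comparison.
print("empirical variance of log CK diagonal:", log_samples.var())
print("depth / width:", depth / width)

In this sketch the histogram of log_samples should look approximately Gaussian, and rerunning with different depth/width ratios should show the log-variance growing roughly linearly in $d/n$; the proportionality constant is not specified here.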

Cite this Paper


BibTeX
@InProceedings{pmlr-v139-hu21b,
  title     = {On the Random Conjugate Kernel and Neural Tangent Kernel},
  author    = {Hu, Zhengmian and Huang, Heng},
  booktitle = {Proceedings of the 38th International Conference on Machine Learning},
  pages     = {4359--4368},
  year      = {2021},
  editor    = {Meila, Marina and Zhang, Tong},
  volume    = {139},
  series    = {Proceedings of Machine Learning Research},
  month     = {18--24 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v139/hu21b/hu21b.pdf},
  url       = {https://proceedings.mlr.press/v139/hu21b.html}
}
Endnote
%0 Conference Paper
%T On the Random Conjugate Kernel and Neural Tangent Kernel
%A Zhengmian Hu
%A Heng Huang
%B Proceedings of the 38th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2021
%E Marina Meila
%E Tong Zhang
%F pmlr-v139-hu21b
%I PMLR
%P 4359--4368
%U https://proceedings.mlr.press/v139/hu21b.html
%V 139
APA
Hu, Z. & Huang, H. (2021). On the Random Conjugate Kernel and Neural Tangent Kernel. Proceedings of the 38th International Conference on Machine Learning, in Proceedings of Machine Learning Research 139:4359-4368. Available from https://proceedings.mlr.press/v139/hu21b.html.
