On the Neural Tangent Kernel Analysis of Randomly Pruned Neural Networks

Hongru Yang; Zhangyang Wang

Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, PMLR 206:1513-1553, 2023.

Abstract

Motivated by both theory and practice, we study how randomly pruning the weights affects a neural network’s neural tangent kernel (NTK). In particular, this work establishes an equivalence between the NTK of a fully-connected neural network and that of its randomly pruned version. The equivalence is established in two cases. The first main result concerns the infinite-width asymptotics. It is shown that, given a pruning probability, for fully-connected neural networks with the weights randomly pruned at initialization, as the width of each layer grows to infinity sequentially, the NTK of the pruned neural network converges to the limiting NTK of the original network up to an extra scaling factor. If the network weights are rescaled appropriately after pruning, this extra scaling can be removed. The second main result considers the finite-width case. It is shown that, to ensure the NTK’s closeness to its limit, the required width depends asymptotically linearly on the sparsity parameter as the NTK’s gap to its limit goes to zero. Moreover, if the pruning probability is set to zero (i.e., no pruning), the bound on the required width matches the bound for fully-connected neural networks in previous works up to logarithmic factors. The proof of this result requires developing a novel analysis of a network structure which we call mask-induced pseudo-networks. Experiments are provided to evaluate our results.
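The central claim above (pruning at initialization multiplies the limiting NTK by an extra factor that a rescaling removes) can be illustrated with a small Monte Carlo sketch. This is a hypothetical toy, not the authors' construction: it assumes a two-layer ReLU network f(x) = m^(-1/2) Σ_r s·mask_r·a_r·relu(⟨w_r, x⟩) with standard Gaussian weights, prunes only the output weights a_r (each kept independently with probability 1 - p), and uses s = 1/√(1 - p) as the rescaling. The names `init_network`, `empirical_ntk`, and `limiting_ntk` are illustrative. Under these assumptions the extra factor is exactly 1 - p in expectation; the paper's multi-layer setting and its precise scaling are not reproduced here.

```python
import math
import random


def relu(z: float) -> float:
    return z if z > 0.0 else 0.0


def relu_step(z: float) -> float:
    # Derivative of ReLU, taken as 0 at z = 0.
    return 1.0 if z > 0.0 else 0.0


def init_network(d: int, m: int, p: float, rng: random.Random):
    """Gaussian init for the toy f(x) = m^-1/2 * sum_r s * mask_r * a_r * relu(<w_r, x>).

    Assumption: only the output weights a_r are pruned; each is kept w.p. 1 - p.
    """
    W = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(m)]
    a = [rng.gauss(0.0, 1.0) for _ in range(m)]
    mask = [rng.random() >= p for _ in range(m)]
    return W, a, mask


def empirical_ntk(x, y, W, a, mask, s: float) -> float:
    """<grad f(x), grad f(y)> summed over the trainable (unpruned) parameters."""
    m = len(a)
    xy = sum(xi * yi for xi, yi in zip(x, y))
    total = 0.0
    for w_r, a_r, keep in zip(W, a, mask):
        if not keep:
            # mask_r = 0: a_r is frozen at zero, so df/dw_r = 0 as well.
            continue
        hx = sum(wi * xi for wi, xi in zip(w_r, x))
        hy = sum(wi * yi for wi, yi in zip(w_r, y))
        total += s * s * relu(hx) * relu(hy)  # contribution of df/da_r
        total += (s * a_r) ** 2 * relu_step(hx) * relu_step(hy) * xy  # df/dw_r
    return total / m


def limiting_ntk(x, y) -> float:
    """Infinite-width NTK of the unpruned toy for unit-norm x, y (arc-cosine kernels)."""
    cos_t = max(-1.0, min(1.0, sum(xi * yi for xi, yi in zip(x, y))))
    t = math.acos(cos_t)
    k1 = (math.sin(t) + (math.pi - t) * cos_t) / (2.0 * math.pi)  # E[relu(u) relu(v)]
    k0 = (math.pi - t) / (2.0 * math.pi)  # E[relu'(u) relu'(v)]
    return k1 + k0 * cos_t  # uses E[a_r^2] = 1


rng = random.Random(0)
d, p = 3, 0.5
x = [1.0, 0.0, 0.0]
y = [0.6, 0.8, 0.0]  # unit norm, <x, y> = 0.6
limit = limiting_ntk(x, y)
s_rescale = 1.0 / math.sqrt(1.0 - p)

for m in (200, 2000, 20000):
    W, a, mask = init_network(d, m, p, rng)
    full = empirical_ntk(x, y, W, a, [True] * m, 1.0)
    pruned = empirical_ntk(x, y, W, a, mask, 1.0)
    rescaled = empirical_ntk(x, y, W, a, mask, s_rescale)
    print(f"m={m:>6}  full={full:.4f}  pruned={pruned:.4f}  "
          f"rescaled={rescaled:.4f}  limit={limit:.4f}")
```

In this toy, `pruned` should sit near (1 - p) times `full`, while `rescaled` should track `full` and the closed-form `limit` more closely as m grows. Pruning only the output layer keeps each pre-activation exactly Gaussian, which is why the arc-cosine limit applies directly; pruning the first layer at a small fixed input dimension d would generally make the pre-activations a Gaussian mixture instead.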

Cite this Paper

BibTeX

@InProceedings{pmlr-v206-yang23b,
  title     = {On the Neural Tangent Kernel Analysis of Randomly Pruned Neural Networks},
  author    = {Yang, Hongru and Wang, Zhangyang},
  booktitle = {Proceedings of The 26th International Conference on Artificial Intelligence and Statistics},
  pages     = {1513--1553},
  year      = {2023},
  editor    = {Ruiz, Francisco and Dy, Jennifer and van de Meent, Jan-Willem},
  volume    = {206},
  series    = {Proceedings of Machine Learning Research},
  month     = {25--27 Apr},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v206/yang23b/yang23b.pdf},
  url       = {https://proceedings.mlr.press/v206/yang23b.html},
  abstract  = {Motivated by both theory and practice, we study how randomly pruning the weights affects a neural network’s neural tangent kernel (NTK). In particular, this work establishes an equivalence between the NTK of a fully-connected neural network and that of its randomly pruned version. The equivalence is established in two cases. The first main result concerns the infinite-width asymptotics. It is shown that, given a pruning probability, for fully-connected neural networks with the weights randomly pruned at initialization, as the width of each layer grows to infinity sequentially, the NTK of the pruned neural network converges to the limiting NTK of the original network up to an extra scaling factor. If the network weights are rescaled appropriately after pruning, this extra scaling can be removed. The second main result considers the finite-width case. It is shown that, to ensure the NTK’s closeness to its limit, the required width depends asymptotically linearly on the sparsity parameter as the NTK’s gap to its limit goes to zero. Moreover, if the pruning probability is set to zero (i.e., no pruning), the bound on the required width matches the bound for fully-connected neural networks in previous works up to logarithmic factors. The proof of this result requires developing a novel analysis of a network structure which we call mask-induced pseudo-networks. Experiments are provided to evaluate our results.}
}

Endnote

%0 Conference Paper
%T On the Neural Tangent Kernel Analysis of Randomly Pruned Neural Networks
%A Hongru Yang
%A Zhangyang Wang
%B Proceedings of The 26th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2023
%E Francisco Ruiz
%E Jennifer Dy
%E Jan-Willem van de Meent
%F pmlr-v206-yang23b
%I PMLR
%P 1513--1553
%U https://proceedings.mlr.press/v206/yang23b.html
%V 206
%X Motivated by both theory and practice, we study how randomly pruning the weights affects a neural network’s neural tangent kernel (NTK). In particular, this work establishes an equivalence between the NTK of a fully-connected neural network and that of its randomly pruned version. The equivalence is established in two cases. The first main result concerns the infinite-width asymptotics. It is shown that, given a pruning probability, for fully-connected neural networks with the weights randomly pruned at initialization, as the width of each layer grows to infinity sequentially, the NTK of the pruned neural network converges to the limiting NTK of the original network up to an extra scaling factor. If the network weights are rescaled appropriately after pruning, this extra scaling can be removed. The second main result considers the finite-width case. It is shown that, to ensure the NTK’s closeness to its limit, the required width depends asymptotically linearly on the sparsity parameter as the NTK’s gap to its limit goes to zero. Moreover, if the pruning probability is set to zero (i.e., no pruning), the bound on the required width matches the bound for fully-connected neural networks in previous works up to logarithmic factors. The proof of this result requires developing a novel analysis of a network structure which we call mask-induced pseudo-networks. Experiments are provided to evaluate our results.

APA

Yang, H., & Wang, Z. (2023). On the Neural Tangent Kernel Analysis of Randomly Pruned Neural Networks. Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 206:1513-1553. Available from https://proceedings.mlr.press/v206/yang23b.html.
