Exact and Rich Feature Learning Dynamics of Two-Layer Linear Networks

Wei Huang, Wuyang Chen, zhiqiang xu, Zhangyang Wang, Taiji Suzuki
Conference on Parsimony and Learning, PMLR 280:1087-1111, 2025.

Abstract

Deep neural networks exhibit rich training dynamics under gradient descent updates. This phenomenon is rooted in the non-convex optimization of deep neural networks, which has been studied extensively in recent theoretical work. However, previous works either did not consider gradient descent steps in a non-asymptotic manner or considered only a few of them, resulting in an incomplete characterization of the network’s stage-wise learning behavior and of the evolution of its parameters and outputs. In this work, we characterize how a network’s feature learning unfolds during training in a regression setting. We analyze the dynamics of two quantities of a two-layer linear network: the projection of the first layer’s weights onto the feature vector, and the weights of the second layer. The former indicates how well the network fits the feature vector in the input data, while the latter captures the output magnitude learned by the network. More importantly, by formulating the dynamics of these two quantities as a non-linear system, we give a precise characterization of the training trajectory, demonstrating the rich feature learning dynamics of the linear neural network. Moreover, we establish a connection between the feature learning dynamics and the neural tangent kernel, illustrating the presence of feature learning beyond lazy training. Experimental simulations corroborate our theoretical findings and confirm the validity of our conclusions.
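
To make the setup concrete, the sketch below numerically simulates the two tracked quantities. It is a minimal illustration, not the authors' code: it assumes a single ground-truth feature direction v, Gaussian inputs with linear targets y = <v, x>, a two-layer linear network f(x) = a^T W x trained by full-batch gradient descent from small initialization, and it records the projection ||Wv|| of the first-layer weights onto v together with the second-layer norm ||a||.

# Minimal simulation sketch (assumed data model, not the authors' implementation):
# track the first layer's alignment with the feature direction and the second
# layer's magnitude while training a two-layer linear network with gradient descent.
import numpy as np

rng = np.random.default_rng(0)

d, m, n = 50, 100, 500            # input dim, hidden width, sample size (assumed values)
v = rng.normal(size=d)
v /= np.linalg.norm(v)            # ground-truth feature direction (unit norm)

X = rng.normal(size=(n, d))
y = X @ v                          # targets from the assumed linear teacher y = <v, x>

# Small initialization, under which rich feature learning (rather than lazy/NTK
# training) is expected.
W = 1e-3 * rng.normal(size=(m, d)) / np.sqrt(d)
a = 1e-3 * rng.normal(size=m) / np.sqrt(m)

lr, steps = 0.05, 2000
proj_hist, a_norm_hist, loss_hist = [], [], []

for t in range(steps):
    pred = (X @ W.T) @ a                       # network output f(x) = a^T W x
    resid = pred - y
    loss = 0.5 * np.mean(resid ** 2)

    # Gradients of the mean-squared loss with respect to both layers.
    grad_a = (X @ W.T).T @ resid / n           # dL/da = (1/n) * sum_i resid_i * W x_i
    grad_W = np.outer(a, resid @ X / n)        # dL/dW = (1/n) * sum_i resid_i * a x_i^T

    a -= lr * grad_a
    W -= lr * grad_W

    proj_hist.append(np.linalg.norm(W @ v))    # projection of first-layer weights onto v
    a_norm_hist.append(np.linalg.norm(a))      # magnitude carried by the second layer
    loss_hist.append(loss)

print(f"final loss {loss_hist[-1]:.3e}, ||Wv|| {proj_hist[-1]:.3f}, ||a|| {a_norm_hist[-1]:.3f}")

Plotting proj_hist and a_norm_hist over the training steps shows the stage-wise behavior discussed in the abstract: both quantities stay near their small initial values for an initial plateau, then grow rapidly and jointly saturate as the loss drops, in contrast to the essentially static features of the lazy (NTK) regime.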

Cite this Paper


BibTeX
@InProceedings{pmlr-v280-huang25a,
  title = {Exact and Rich Feature Learning Dynamics of Two-Layer Linear Networks},
  author = {Huang, Wei and Chen, Wuyang and xu, zhiqiang and Wang, Zhangyang and Suzuki, Taiji},
  booktitle = {Conference on Parsimony and Learning},
  pages = {1087--1111},
  year = {2025},
  editor = {Chen, Beidi and Liu, Shijia and Pilanci, Mert and Su, Weijie and Sulam, Jeremias and Wang, Yuxiang and Zhu, Zhihui},
  volume = {280},
  series = {Proceedings of Machine Learning Research},
  month = {24--27 Mar},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v280/main/assets/huang25a/huang25a.pdf},
  url = {https://proceedings.mlr.press/v280/huang25a.html},
  abstract = {Deep neural networks exhibit rich training dynamics under gradient descent updates. This phenomenon is rooted in the non-convex optimization of deep neural networks, which has been studied extensively in recent theoretical work. However, previous works either did not consider gradient descent steps in a non-asymptotic manner or considered only a few of them, resulting in an incomplete characterization of the network’s stage-wise learning behavior and of the evolution of its parameters and outputs. In this work, we characterize how a network’s feature learning unfolds during training in a regression setting. We analyze the dynamics of two quantities of a two-layer linear network: the projection of the first layer’s weights onto the feature vector, and the weights of the second layer. The former indicates how well the network fits the feature vector in the input data, while the latter captures the output magnitude learned by the network. More importantly, by formulating the dynamics of these two quantities as a non-linear system, we give a precise characterization of the training trajectory, demonstrating the rich feature learning dynamics of the linear neural network. Moreover, we establish a connection between the feature learning dynamics and the neural tangent kernel, illustrating the presence of feature learning beyond lazy training. Experimental simulations corroborate our theoretical findings and confirm the validity of our conclusions.}
}
Endnote
%0 Conference Paper %T Exact and Rich Feature Learning Dynamics of Two-Layer Linear Networks %A Wei Huang %A Wuyang Chen %A zhiqiang xu %A Zhangyang Wang %A Taiji Suzuki %B Conference on Parsimony and Learning %C Proceedings of Machine Learning Research %D 2025 %E Beidi Chen %E Shijia Liu %E Mert Pilanci %E Weijie Su %E Jeremias Sulam %E Yuxiang Wang %E Zhihui Zhu %F pmlr-v280-huang25a %I PMLR %P 1087--1111 %U https://proceedings.mlr.press/v280/huang25a.html %V 280 %X Deep neural networks exhibit rich training dynamics under gradient descent updates. This phenomenon is rooted in the non-convex optimization of deep neural networks, which has been studied extensively in recent theoretical work. However, previous works either did not consider gradient descent steps in a non-asymptotic manner or considered only a few of them, resulting in an incomplete characterization of the network’s stage-wise learning behavior and of the evolution of its parameters and outputs. In this work, we characterize how a network’s feature learning unfolds during training in a regression setting. We analyze the dynamics of two quantities of a two-layer linear network: the projection of the first layer’s weights onto the feature vector, and the weights of the second layer. The former indicates how well the network fits the feature vector in the input data, while the latter captures the output magnitude learned by the network. More importantly, by formulating the dynamics of these two quantities as a non-linear system, we give a precise characterization of the training trajectory, demonstrating the rich feature learning dynamics of the linear neural network. Moreover, we establish a connection between the feature learning dynamics and the neural tangent kernel, illustrating the presence of feature learning beyond lazy training. Experimental simulations corroborate our theoretical findings and confirm the validity of our conclusions.
APA
Huang, W., Chen, W., xu, z., Wang, Z. & Suzuki, T. (2025). Exact and Rich Feature Learning Dynamics of Two-Layer Linear Networks. Conference on Parsimony and Learning, in Proceedings of Machine Learning Research 280:1087-1111. Available from https://proceedings.mlr.press/v280/huang25a.html.
