Understanding How Over-Parametrization Leads to Acceleration: A case of learning a single teacher neuron

Jun-Kun Wang, Jacob Abernethy
Proceedings of The 13th Asian Conference on Machine Learning, PMLR 157:17-32, 2021.

Abstract

Over-parametrization has become a popular technique in deep learning. It is observed that, with over-parametrization, a larger neural network needs fewer training iterations than a smaller one to achieve a certain level of performance; that is, over-parametrization leads to acceleration in optimization. However, although over-parametrization is widely used nowadays, little theory is available to explain the acceleration it brings. In this paper, we propose understanding it by first studying a simple problem. Specifically, we consider the setting in which there is a single teacher neuron with quadratic activation, and over-parametrization is realized by having multiple student neurons learn the data generated by the teacher neuron. We provably show that over-parametrization helps the iterates generated by gradient descent enter the neighborhood of a globally optimal solution, one that achieves zero testing error, in fewer iterations.
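
Below is a minimal, illustrative sketch (not the paper's experiment) of the teacher-student setting the abstract describes: a teacher neuron with quadratic activation, y = (w*.x)^2, fit by m student neurons f(x) = sum_j (w_j.x)^2 via gradient descent on the squared loss. The dimension, sample size, step size, initialization scale, and error threshold are all assumptions chosen for illustration.

import numpy as np

rng = np.random.default_rng(0)
d, n = 10, 2000
w_star = rng.standard_normal(d)
w_star /= np.linalg.norm(w_star)            # unit-norm teacher neuron
X = rng.standard_normal((n, d))             # Gaussian inputs
y = (X @ w_star) ** 2                       # labels from quadratic activation

def steps_to_fit(m, tol=1e-3, lr=0.05, max_steps=5000):
    """Run gradient descent with m student neurons; return the first
    iteration at which the mean squared error drops below tol."""
    W = 0.01 * rng.standard_normal((m, d))  # small random initialization
    for t in range(max_steps):
        pre = X @ W.T                       # (n, m) pre-activations w_j . x_i
        resid = (pre ** 2).sum(axis=1) - y  # f(x_i) - y_i
        if np.mean(resid ** 2) < tol:
            return t
        # gradient of (1/2n) * sum_i resid_i^2 with respect to each w_j
        grad = 2.0 * (resid[:, None] * pre).T @ X / n
        W -= lr * grad
    return max_steps

for m in (1, 5, 25):
    print(f"m = {m:2d} student neurons: {steps_to_fit(m)} iterations")

In typical runs, larger m crosses the error threshold in fewer iterations, mirroring the acceleration from over-parametrization that the paper analyzes.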

Cite this Paper


BibTeX
@InProceedings{pmlr-v157-wang21a,
  title = {Understanding How Over-Parametrization Leads to Acceleration: A case of learning a single teacher neuron},
  author = {Wang, Jun-Kun and Abernethy, Jacob},
  booktitle = {Proceedings of The 13th Asian Conference on Machine Learning},
  pages = {17--32},
  year = {2021},
  editor = {Balasubramanian, Vineeth N. and Tsang, Ivor},
  volume = {157},
  series = {Proceedings of Machine Learning Research},
  month = {17--19 Nov},
  publisher = {PMLR},
  pdf = {https://proceedings.mlr.press/v157/wang21a/wang21a.pdf},
  url = {https://proceedings.mlr.press/v157/wang21a.html},
  abstract = {Over-parametrization has become a popular technique in deep learning. It is observed that, with over-parametrization, a larger neural network needs fewer training iterations than a smaller one to achieve a certain level of performance; that is, over-parametrization leads to acceleration in optimization. However, although over-parametrization is widely used nowadays, little theory is available to explain the acceleration it brings. In this paper, we propose understanding it by first studying a simple problem. Specifically, we consider the setting in which there is a single teacher neuron with quadratic activation, and over-parametrization is realized by having multiple student neurons learn the data generated by the teacher neuron. We provably show that over-parametrization helps the iterates generated by gradient descent enter the neighborhood of a globally optimal solution, one that achieves zero testing error, in fewer iterations.}
}
Endnote
%0 Conference Paper
%T Understanding How Over-Parametrization Leads to Acceleration: A case of learning a single teacher neuron
%A Jun-Kun Wang
%A Jacob Abernethy
%B Proceedings of The 13th Asian Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2021
%E Vineeth N. Balasubramanian
%E Ivor Tsang
%F pmlr-v157-wang21a
%I PMLR
%P 17--32
%U https://proceedings.mlr.press/v157/wang21a.html
%V 157
%X Over-parametrization has become a popular technique in deep learning. It is observed that, with over-parametrization, a larger neural network needs fewer training iterations than a smaller one to achieve a certain level of performance; that is, over-parametrization leads to acceleration in optimization. However, although over-parametrization is widely used nowadays, little theory is available to explain the acceleration it brings. In this paper, we propose understanding it by first studying a simple problem. Specifically, we consider the setting in which there is a single teacher neuron with quadratic activation, and over-parametrization is realized by having multiple student neurons learn the data generated by the teacher neuron. We provably show that over-parametrization helps the iterates generated by gradient descent enter the neighborhood of a globally optimal solution, one that achieves zero testing error, in fewer iterations.
APA
Wang, J. & Abernethy, J. (2021). Understanding How Over-Parametrization Leads to Acceleration: A case of learning a single teacher neuron. Proceedings of The 13th Asian Conference on Machine Learning, in Proceedings of Machine Learning Research 157:17-32. Available from https://proceedings.mlr.press/v157/wang21a.html.
