Understanding How Over-Parametrization Leads to Acceleration: A case of learning a single teacher neuron

Jun-Kun Wang, Jacob Abernethy
Proceedings of The 13th Asian Conference on Machine Learning, PMLR 157:17-32, 2021.

Abstract

Over-parametrization has become a popular technique in deep learning. It is observed that, with over-parametrization, a larger neural network needs fewer training iterations than a smaller one to achieve a certain level of performance; that is, over-parametrization leads to acceleration in optimization. However, although over-parametrization is widely used nowadays, little theory is available to explain the acceleration it brings. In this paper, we propose understanding it by first studying a simple problem. Specifically, we consider the setting in which there is a single teacher neuron with quadratic activation, and over-parametrization is realized by having multiple student neurons learn the data generated by the teacher neuron. We provably show that over-parametrization helps the iterates generated by gradient descent enter the neighborhood of a globally optimal solution, one that achieves zero testing error, in fewer iterations.
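
Below is a minimal, illustrative sketch (not the paper's experiment) of the teacher-student setting the abstract describes: a teacher neuron with quadratic activation, y = (w*.x)^2, fit by m student neurons f(x) = sum_j (w_j.x)^2 via gradient descent on the squared loss. The dimension, sample size, step size, initialization scale, and error threshold are all assumptions chosen for illustration.

import numpy as np

rng = np.random.default_rng(0)
d, n = 10, 2000
w_star = rng.standard_normal(d)
w_star /= np.linalg.norm(w_star)            # unit-norm teacher neuron
X = rng.standard_normal((n, d))             # Gaussian inputs
y = (X @ w_star) ** 2                       # labels from quadratic activation

def steps_to_fit(m, tol=1e-3, lr=0.05, max_steps=5000):
    """Run gradient descent with m student neurons; return the first
    iteration at which the mean squared error drops below tol."""
    W = 0.01 * rng.standard_normal((m, d))  # small random initialization
    for t in range(max_steps):
        pre = X @ W.T                       # (n, m) pre-activations w_j . x_i
        resid = (pre ** 2).sum(axis=1) - y  # f(x_i) - y_i
        if np.mean(resid ** 2) < tol:
            return t
        # gradient of (1/2n) * sum_i resid_i^2 with respect to each w_j
        grad = 2.0 * (resid[:, None] * pre).T @ X / n
        W -= lr * grad
    return max_steps

for m in (1, 5, 25):
    print(f"m = {m:2d} student neurons: {steps_to_fit(m)} iterations")

In typical runs, larger m crosses the error threshold in fewer iterations, mirroring the acceleration from over-parametrization that the paper analyzes.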

Cite this Paper


BibTeX
@InProceedings{pmlr-v157-wang21a,
  title = {Understanding How Over-Parametrization Leads to Acceleration: A case of learning a single teacher neuron},
  author = {Wang, Jun-Kun and Abernethy, Jacob},
  booktitle = {Proceedings of The 13th Asian Conference on Machine Learning},
  pages = {17--32},
  year = {2021},
  editor = {Balasubramanian, Vineeth N. and Tsang, Ivor},
  volume = {157},
  series = {Proceedings of Machine Learning Research},
  month = {17--19 Nov},
  publisher = {PMLR},
  pdf = {https://proceedings.mlr.press/v157/wang21a/wang21a.pdf},
  url = {https://proceedings.mlr.press/v157/wang21a.html},
  abstract = {Over-parametrization has become a popular technique in deep learning. It is observed that, with over-parametrization, a larger neural network needs fewer training iterations than a smaller one to achieve a certain level of performance; that is, over-parametrization leads to acceleration in optimization. However, although over-parametrization is widely used nowadays, little theory is available to explain the acceleration it brings. In this paper, we propose understanding it by first studying a simple problem. Specifically, we consider the setting in which there is a single teacher neuron with quadratic activation, and over-parametrization is realized by having multiple student neurons learn the data generated by the teacher neuron. We provably show that over-parametrization helps the iterates generated by gradient descent enter the neighborhood of a globally optimal solution, one that achieves zero testing error, in fewer iterations.}
}
Endnote
%0 Conference Paper
%T Understanding How Over-Parametrization Leads to Acceleration: A case of learning a single teacher neuron
%A Jun-Kun Wang
%A Jacob Abernethy
%B Proceedings of The 13th Asian Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2021
%E Vineeth N. Balasubramanian
%E Ivor Tsang
%F pmlr-v157-wang21a
%I PMLR
%P 17--32
%U https://proceedings.mlr.press/v157/wang21a.html
%V 157
%X Over-parametrization has become a popular technique in deep learning. It is observed that, with over-parametrization, a larger neural network needs fewer training iterations than a smaller one to achieve a certain level of performance; that is, over-parametrization leads to acceleration in optimization. However, although over-parametrization is widely used nowadays, little theory is available to explain the acceleration it brings. In this paper, we propose understanding it by first studying a simple problem. Specifically, we consider the setting in which there is a single teacher neuron with quadratic activation, and over-parametrization is realized by having multiple student neurons learn the data generated by the teacher neuron. We provably show that over-parametrization helps the iterates generated by gradient descent enter the neighborhood of a globally optimal solution, one that achieves zero testing error, in fewer iterations.
APA
Wang, J. & Abernethy, J. (2021). Understanding How Over-Parametrization Leads to Acceleration: A case of learning a single teacher neuron. Proceedings of The 13th Asian Conference on Machine Learning, in Proceedings of Machine Learning Research 157:17-32. Available from https://proceedings.mlr.press/v157/wang21a.html.
