Towards Understanding Learning in Neural Networks with Linear Teachers

Roei Sarussi, Alon Brutzkus, Amir Globerson
Proceedings of the 38th International Conference on Machine Learning, PMLR 139:9313-9322, 2021.

Abstract

Can a neural network minimizing cross-entropy learn linearly separable data? Despite progress in the theory of deep learning, this question remains unsolved. Here we prove that SGD globally optimizes this learning problem for a two-layer network with Leaky ReLU activations. The learned network can in principle be very complex. However, empirical evidence suggests that it often turns out to be approximately linear. We provide theoretical support for this phenomenon by proving that if network weights converge to two weight clusters, this will imply an approximately linear decision boundary. Finally, we show a condition on the optimization that leads to weight clustering. We provide empirical results that validate our theoretical analysis.
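The following is a minimal sketch, not the authors' implementation, illustrating the setting the abstract describes: a two-layer Leaky ReLU network trained with SGD on the logistic (cross-entropy) loss over linearly separable data labeled by a linear teacher, followed by a check of whether the hidden-layer weight directions collapse into two clusters (one per sign of the output weight). The width, learning rate, margin, and number of steps below are illustrative assumptions, not values from the paper.

```python
# Sketch only: train a two-layer Leaky ReLU network with SGD on cross-entropy
# over linearly separable data, then inspect weight clustering.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
d, width, lr, steps = 20, 100, 0.1, 2000  # illustrative hyperparameters

# Linearly separable data from a linear teacher w_star, with a small margin.
w_star = F.normalize(torch.randn(d), dim=0)
X = torch.randn(2000, d)
X = X[(X @ w_star).abs() > 0.1]
y = torch.sign(X @ w_star)

# Two-layer network f(x) = sum_i v_i * leaky_relu(<w_i, x>).
W = (0.1 * torch.randn(width, d)).requires_grad_()
v = (torch.randn(width).sign() / width).requires_grad_()
opt = torch.optim.SGD([W, v], lr=lr)

for _ in range(steps):
    out = F.leaky_relu(X @ W.t(), negative_slope=0.1) @ v
    loss = F.softplus(-y * out).mean()  # logistic loss on +/-1 labels
    opt.zero_grad()
    loss.backward()
    opt.step()

# Cluster check: if the two-cluster convergence described in the abstract holds,
# neurons with v_i > 0 (resp. v_i < 0) share roughly one weight direction,
# which makes the resulting decision boundary approximately linear.
W_, v_ = W.detach(), v.detach()
dirs = F.normalize(W_, dim=1)
for name, mask in [("v>0", v_ > 0), ("v<0", v_ < 0)]:
    mean_dir = F.normalize(dirs[mask].mean(0), dim=0)
    print(name, "mean cosine to cluster center:",
          (dirs[mask] @ mean_dir).mean().item())
pred = (F.leaky_relu(X @ W_.t(), 0.1) @ v_).sign()
print("train accuracy:", (pred == y).float().mean().item())
```

On typical runs of such a sketch, the per-cluster cosine similarities approaching 1 would indicate the two-cluster weight structure that the paper connects to an approximately linear decision boundary.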

Cite this Paper


BibTeX
@InProceedings{pmlr-v139-sarussi21a,
  title     = {Towards Understanding Learning in Neural Networks with Linear Teachers},
  author    = {Sarussi, Roei and Brutzkus, Alon and Globerson, Amir},
  booktitle = {Proceedings of the 38th International Conference on Machine Learning},
  pages     = {9313--9322},
  year      = {2021},
  editor    = {Meila, Marina and Zhang, Tong},
  volume    = {139},
  series    = {Proceedings of Machine Learning Research},
  month     = {18--24 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v139/sarussi21a/sarussi21a.pdf},
  url       = {https://proceedings.mlr.press/v139/sarussi21a.html},
  abstract  = {Can a neural network minimizing cross-entropy learn linearly separable data? Despite progress in the theory of deep learning, this question remains unsolved. Here we prove that SGD globally optimizes this learning problem for a two-layer network with Leaky ReLU activations. The learned network can in principle be very complex. However, empirical evidence suggests that it often turns out to be approximately linear. We provide theoretical support for this phenomenon by proving that if network weights converge to two weight clusters, this will imply an approximately linear decision boundary. Finally, we show a condition on the optimization that leads to weight clustering. We provide empirical results that validate our theoretical analysis.}
}
Endnote
%0 Conference Paper
%T Towards Understanding Learning in Neural Networks with Linear Teachers
%A Roei Sarussi
%A Alon Brutzkus
%A Amir Globerson
%B Proceedings of the 38th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2021
%E Marina Meila
%E Tong Zhang
%F pmlr-v139-sarussi21a
%I PMLR
%P 9313--9322
%U https://proceedings.mlr.press/v139/sarussi21a.html
%V 139
%X Can a neural network minimizing cross-entropy learn linearly separable data? Despite progress in the theory of deep learning, this question remains unsolved. Here we prove that SGD globally optimizes this learning problem for a two-layer network with Leaky ReLU activations. The learned network can in principle be very complex. However, empirical evidence suggests that it often turns out to be approximately linear. We provide theoretical support for this phenomenon by proving that if network weights converge to two weight clusters, this will imply an approximately linear decision boundary. Finally, we show a condition on the optimization that leads to weight clustering. We provide empirical results that validate our theoretical analysis.
APA
Sarussi, R., Brutzkus, A. & Globerson, A. (2021). Towards Understanding Learning in Neural Networks with Linear Teachers. Proceedings of the 38th International Conference on Machine Learning, in Proceedings of Machine Learning Research 139:9313-9322. Available from https://proceedings.mlr.press/v139/sarussi21a.html.