Geometry of the Loss Landscape in Overparameterized Neural Networks: Symmetries and Invariances

Berfin Simsek, François Ged, Arthur Jacot, Francesco Spadaro, Clement Hongler, Wulfram Gerstner, Johanni Brea
Proceedings of the 38th International Conference on Machine Learning, PMLR 139:9722-9732, 2021.

Abstract

We study how permutation symmetries in overparameterized multi-layer neural networks generate ‘symmetry-induced’ critical points. Assuming a network with $L$ layers of minimal widths $r_1^*, \ldots, r_{L-1}^*$ reaches a zero-loss minimum at $r_1^*! \cdots r_{L-1}^*!$ isolated points that are permutations of one another, we show that adding one extra neuron to each layer is sufficient to connect all these previously discrete minima into a single manifold. For a two-layer overparameterized network of width $r^* + h =: m$ we explicitly describe the manifold of global minima: it consists of $T(r^*, m)$ affine subspaces of dimension at least $h$ that are connected to one another. For a network of width $m$, we identify the number $G(r, m)$ of affine subspaces containing only symmetry-induced critical points that are related to the critical points of a smaller network of width $r < r^*$.
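As a quick illustration of the symmetry count in the abstract, the sketch below computes the number $r_1^*! \cdots r_{L-1}^*!$ of isolated zero-loss minima obtained by permuting hidden neurons within each hidden layer (the example widths are made up for illustration and do not come from the paper):

```python
from math import factorial

def num_permutation_copies(hidden_widths):
    """Number of isolated zero-loss minima generated by permuting the
    neurons within each of the L-1 hidden layers: r_1! * ... * r_{L-1}!."""
    product = 1
    for r in hidden_widths:
        product *= factorial(r)
    return product

# Example: a network whose three hidden layers have minimal widths 2, 3, 4.
print(num_permutation_copies([2, 3, 4]))  # 2! * 3! * 4! = 288
```

By the paper's result, adding one extra neuron per layer connects all of these permutation copies into a single manifold of global minima.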

Cite this Paper


BibTeX
@InProceedings{pmlr-v139-simsek21a,
  title = {Geometry of the Loss Landscape in Overparameterized Neural Networks: Symmetries and Invariances},
  author = {Simsek, Berfin and Ged, Fran{\c{c}}ois and Jacot, Arthur and Spadaro, Francesco and Hongler, Clement and Gerstner, Wulfram and Brea, Johanni},
  booktitle = {Proceedings of the 38th International Conference on Machine Learning},
  pages = {9722--9732},
  year = {2021},
  editor = {Meila, Marina and Zhang, Tong},
  volume = {139},
  series = {Proceedings of Machine Learning Research},
  month = {18--24 Jul},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v139/simsek21a/simsek21a.pdf},
  url = {https://proceedings.mlr.press/v139/simsek21a.html},
  abstract = {We study how permutation symmetries in overparameterized multi-layer neural networks generate ‘symmetry-induced’ critical points. Assuming a network with $L$ layers of minimal widths $r_1^*, \ldots, r_{L-1}^*$ reaches a zero-loss minimum at $r_1^*! \cdots r_{L-1}^*!$ isolated points that are permutations of one another, we show that adding one extra neuron to each layer is sufficient to connect all these previously discrete minima into a single manifold. For a two-layer overparameterized network of width $r^* + h =: m$ we explicitly describe the manifold of global minima: it consists of $T(r^*, m)$ affine subspaces of dimension at least $h$ that are connected to one another. For a network of width $m$, we identify the number $G(r, m)$ of affine subspaces containing only symmetry-induced critical points that are related to the critical points of a smaller network of width $r < r^*$.}
}
Endnote
%0 Conference Paper
%T Geometry of the Loss Landscape in Overparameterized Neural Networks: Symmetries and Invariances
%A Berfin Simsek
%A François Ged
%A Arthur Jacot
%A Francesco Spadaro
%A Clement Hongler
%A Wulfram Gerstner
%A Johanni Brea
%B Proceedings of the 38th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2021
%E Marina Meila
%E Tong Zhang
%F pmlr-v139-simsek21a
%I PMLR
%P 9722--9732
%U https://proceedings.mlr.press/v139/simsek21a.html
%V 139
%X We study how permutation symmetries in overparameterized multi-layer neural networks generate ‘symmetry-induced’ critical points. Assuming a network with $L$ layers of minimal widths $r_1^*, \ldots, r_{L-1}^*$ reaches a zero-loss minimum at $r_1^*! \cdots r_{L-1}^*!$ isolated points that are permutations of one another, we show that adding one extra neuron to each layer is sufficient to connect all these previously discrete minima into a single manifold. For a two-layer overparameterized network of width $r^* + h =: m$ we explicitly describe the manifold of global minima: it consists of $T(r^*, m)$ affine subspaces of dimension at least $h$ that are connected to one another. For a network of width $m$, we identify the number $G(r, m)$ of affine subspaces containing only symmetry-induced critical points that are related to the critical points of a smaller network of width $r < r^*$.
APA
Simsek, B., Ged, F., Jacot, A., Spadaro, F., Hongler, C., Gerstner, W. & Brea, J. (2021). Geometry of the Loss Landscape in Overparameterized Neural Networks: Symmetries and Invariances. Proceedings of the 38th International Conference on Machine Learning, in Proceedings of Machine Learning Research 139:9722-9732. Available from https://proceedings.mlr.press/v139/simsek21a.html.