Hidden Symmetries of ReLU Networks

Elisenda Grigsby, Kathryn Lindsey, David Rolnick
Proceedings of the 40th International Conference on Machine Learning, PMLR 202:11734-11760, 2023.

Abstract

The parameter space for any fixed architecture of feedforward ReLU neural networks serves as a proxy during training for the associated class of functions - but how faithful is this representation? It is known that many different parameter settings $\theta$ can determine the same function $f$. Moreover, the degree of this redundancy is inhomogeneous: for some networks, the only symmetries are permutation of neurons in a layer and positive scaling of parameters at a neuron, while other networks admit additional hidden symmetries. In this work, we prove that, for any network architecture where no layer is narrower than the input, there exist parameter settings with no hidden symmetries. We also describe a number of mechanisms through which hidden symmetries can arise, and empirically approximate the functional dimension of different network architectures at initialization. These experiments indicate that the probability that a network has no hidden symmetries decreases towards 0 as depth increases, while increasing towards 1 as width and input dimension increase.
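The two "obvious" symmetries named in the abstract (permuting the neurons within a hidden layer, and positively rescaling a neuron's incoming weights and bias while inversely rescaling its outgoing weights), together with a Jacobian-rank approximation of functional dimension, can be illustrated with a small sketch. This is not the authors' code: the architecture, batch size, and tolerances below are arbitrary assumptions, and the rank computed on a finite batch is only a rough estimate of the functional dimension.

import torch

torch.manual_seed(0)

def relu_net(x, W1, b1, W2, b2):
    """Two-layer feedforward ReLU network: R^d -> R^m."""
    return torch.relu(x @ W1.T + b1) @ W2.T + b2

d, h, m = 3, 5, 2           # input, hidden, output widths (illustrative)
W1 = torch.randn(h, d); b1 = torch.randn(h)
W2 = torch.randn(m, h); b2 = torch.randn(m)
x = torch.randn(100, d)     # batch of sample inputs

out = relu_net(x, W1, b1, W2, b2)

# 1) Permutation symmetry: reorder the hidden neurons consistently.
perm = torch.randperm(h)
out_perm = relu_net(x, W1[perm], b1[perm], W2[:, perm], b2)

# 2) Positive scaling symmetry: scale a neuron's incoming weights and bias
#    by c > 0 and its outgoing weights by 1/c (ReLU is positively homogeneous).
c = 2.7
W1s, b1s, W2s = W1.clone(), b1.clone(), W2.clone()
W1s[0] *= c; b1s[0] *= c; W2s[:, 0] /= c
out_scaled = relu_net(x, W1s, b1s, W2s, b2)

print(torch.allclose(out, out_perm, atol=1e-5))    # True: same function
print(torch.allclose(out, out_scaled, atol=1e-5))  # True: same function

# 3) Rough functional-dimension estimate: rank of the Jacobian of the
#    network outputs on the batch with respect to all parameters.
def f(W1, b1, W2, b2):
    return relu_net(x, W1, b1, W2, b2).reshape(-1)

J = torch.autograd.functional.jacobian(f, (W1, b1, W2, b2))
J = torch.cat([j.reshape(j.shape[0], -1) for j in J], dim=1)
print("parameters:", J.shape[1],
      "estimated functional dimension:", torch.linalg.matrix_rank(J).item())

Because each hidden neuron carries a one-parameter scaling symmetry (and permutations are discrete, so they do not reduce dimension), the estimated rank can never exceed the parameter count minus the total number of hidden neurons (32 - 5 = 27 in this sketch); the paper's experiments approximate this functional dimension at initialization across architectures.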

Cite this Paper


BibTeX
@InProceedings{pmlr-v202-grigsby23a,
  title     = {Hidden Symmetries of {R}e{LU} Networks},
  author    = {Grigsby, Elisenda and Lindsey, Kathryn and Rolnick, David},
  booktitle = {Proceedings of the 40th International Conference on Machine Learning},
  pages     = {11734--11760},
  year      = {2023},
  editor    = {Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan},
  volume    = {202},
  series    = {Proceedings of Machine Learning Research},
  month     = {23--29 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v202/grigsby23a/grigsby23a.pdf},
  url       = {https://proceedings.mlr.press/v202/grigsby23a.html},
  abstract  = {The parameter space for any fixed architecture of feedforward ReLU neural networks serves as a proxy during training for the associated class of functions - but how faithful is this representation? It is known that many different parameter settings $\theta$ can determine the same function $f$. Moreover, the degree of this redundancy is inhomogeneous: for some networks, the only symmetries are permutation of neurons in a layer and positive scaling of parameters at a neuron, while other networks admit additional hidden symmetries. In this work, we prove that, for any network architecture where no layer is narrower than the input, there exist parameter settings with no hidden symmetries. We also describe a number of mechanisms through which hidden symmetries can arise, and empirically approximate the functional dimension of different network architectures at initialization. These experiments indicate that the probability that a network has no hidden symmetries decreases towards 0 as depth increases, while increasing towards 1 as width and input dimension increase.}
}
Endnote
%0 Conference Paper
%T Hidden Symmetries of ReLU Networks
%A Elisenda Grigsby
%A Kathryn Lindsey
%A David Rolnick
%B Proceedings of the 40th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Andreas Krause
%E Emma Brunskill
%E Kyunghyun Cho
%E Barbara Engelhardt
%E Sivan Sabato
%E Jonathan Scarlett
%F pmlr-v202-grigsby23a
%I PMLR
%P 11734--11760
%U https://proceedings.mlr.press/v202/grigsby23a.html
%V 202
%X The parameter space for any fixed architecture of feedforward ReLU neural networks serves as a proxy during training for the associated class of functions - but how faithful is this representation? It is known that many different parameter settings $\theta$ can determine the same function $f$. Moreover, the degree of this redundancy is inhomogeneous: for some networks, the only symmetries are permutation of neurons in a layer and positive scaling of parameters at a neuron, while other networks admit additional hidden symmetries. In this work, we prove that, for any network architecture where no layer is narrower than the input, there exist parameter settings with no hidden symmetries. We also describe a number of mechanisms through which hidden symmetries can arise, and empirically approximate the functional dimension of different network architectures at initialization. These experiments indicate that the probability that a network has no hidden symmetries decreases towards 0 as depth increases, while increasing towards 1 as width and input dimension increase.
APA
Grigsby, E., Lindsey, K. & Rolnick, D. (2023). Hidden Symmetries of ReLU Networks. Proceedings of the 40th International Conference on Machine Learning, in Proceedings of Machine Learning Research 202:11734-11760. Available from https://proceedings.mlr.press/v202/grigsby23a.html.