Deep equilibrium networks are sensitive to initialization statistics

Atish Agarwala, Samuel S Schoenholz
Proceedings of the 39th International Conference on Machine Learning, PMLR 162:136-160, 2022.

Abstract

Deep equilibrium networks (DEQs) are a promising way to construct models which trade off memory for compute. However, theoretical understanding of these models is still lacking compared to traditional networks, in part because of the repeated application of a single set of weights. We show that DEQs are sensitive to the higher order statistics of the matrix families from which they are initialized. In particular, initializing with orthogonal or symmetric matrices allows for greater stability in training. This gives us a practical prescription for initializations which allow for training with a broader range of initial weight scales.
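To make the object of study concrete, below is a minimal sketch of a deep equilibrium layer, assuming a simple tanh cell whose output is the fixed point z* of z = tanh(W z + U x), found by naive forward iteration. This is not the authors' code: the width, weight scale sigma, and iteration count are illustrative choices, and the snippet only contrasts the two initialization families named in the abstract (i.i.d. Gaussian versus scaled orthogonal) at the same weight scale.

import jax
import jax.numpy as jnp

width, sigma, n_iter = 512, 0.5, 100          # illustrative sizes, not from the paper
key = jax.random.PRNGKey(0)
k_w, k_u, k_x = jax.random.split(key, 3)

x = jax.random.normal(k_x, (width,))                                  # input vector
U = jax.random.normal(k_u, (width, width)) / jnp.sqrt(width)          # input projection

# Gaussian family: i.i.d. entries with standard deviation sigma / sqrt(width).
W_gauss = sigma * jax.random.normal(k_w, (width, width)) / jnp.sqrt(width)

# Orthogonal family at the same weight scale: sigma times a random orthogonal matrix.
W_orth = sigma * jax.nn.initializers.orthogonal()(k_w, (width, width))

def fixed_point(W, x, n_iter=n_iter):
    # Naive forward iteration z <- tanh(W z + U x).  Practical DEQs use faster
    # root finders (e.g. Broyden's method); plain iteration is enough for a toy.
    z = jnp.zeros(width)
    for _ in range(n_iter):
        z = jnp.tanh(W @ z + U @ x)
    return z

for name, W in [("gaussian", W_gauss), ("orthogonal", W_orth)]:
    z_star = fixed_point(W, x)
    residual = jnp.linalg.norm(jnp.tanh(W @ z_star + U @ x) - z_star)
    print(f"{name:>10s}: |z*| = {float(jnp.linalg.norm(z_star)):.3f}, "
          f"fixed-point residual = {float(residual):.2e}")

How quickly the iteration converges, and how the resulting model trains, depends on the spectral statistics of W beyond its scale; per the abstract, the orthogonal (or symmetric) family remains stable in training over a broader range of initial weight scales than the Gaussian one.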

Cite this Paper


BibTeX
@InProceedings{pmlr-v162-agarwala22a,
  title     = {Deep equilibrium networks are sensitive to initialization statistics},
  author    = {Agarwala, Atish and Schoenholz, Samuel S},
  booktitle = {Proceedings of the 39th International Conference on Machine Learning},
  pages     = {136--160},
  year      = {2022},
  editor    = {Chaudhuri, Kamalika and Jegelka, Stefanie and Song, Le and Szepesvari, Csaba and Niu, Gang and Sabato, Sivan},
  volume    = {162},
  series    = {Proceedings of Machine Learning Research},
  month     = {17--23 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v162/agarwala22a/agarwala22a.pdf},
  url       = {https://proceedings.mlr.press/v162/agarwala22a.html},
  abstract  = {Deep equilibrium networks (DEQs) are a promising way to construct models which trade off memory for compute. However, theoretical understanding of these models is still lacking compared to traditional networks, in part because of the repeated application of a single set of weights. We show that DEQs are sensitive to the higher order statistics of the matrix families from which they are initialized. In particular, initializing with orthogonal or symmetric matrices allows for greater stability in training. This gives us a practical prescription for initializations which allow for training with a broader range of initial weight scales.}
}
Endnote
%0 Conference Paper
%T Deep equilibrium networks are sensitive to initialization statistics
%A Atish Agarwala
%A Samuel S Schoenholz
%B Proceedings of the 39th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2022
%E Kamalika Chaudhuri
%E Stefanie Jegelka
%E Le Song
%E Csaba Szepesvari
%E Gang Niu
%E Sivan Sabato
%F pmlr-v162-agarwala22a
%I PMLR
%P 136--160
%U https://proceedings.mlr.press/v162/agarwala22a.html
%V 162
%X Deep equilibrium networks (DEQs) are a promising way to construct models which trade off memory for compute. However, theoretical understanding of these models is still lacking compared to traditional networks, in part because of the repeated application of a single set of weights. We show that DEQs are sensitive to the higher order statistics of the matrix families from which they are initialized. In particular, initializing with orthogonal or symmetric matrices allows for greater stability in training. This gives us a practical prescription for initializations which allow for training with a broader range of initial weight scales.
APA
Agarwala, A. & Schoenholz, S. S. (2022). Deep equilibrium networks are sensitive to initialization statistics. Proceedings of the 39th International Conference on Machine Learning, in Proceedings of Machine Learning Research 162:136-160. Available from https://proceedings.mlr.press/v162/agarwala22a.html.
