Implicit Bias of Linear RNNs

Melikasadat Emami, Mojtaba Sahraee-Ardakan, Parthe Pandit, Sundeep Rangan, Alyson K Fletcher
Proceedings of the 38th International Conference on Machine Learning, PMLR 139:2982-2992, 2021.

Abstract

Contemporary wisdom based on empirical studies suggests that standard recurrent neural networks (RNNs) do not perform well on tasks requiring long-term memory. However, RNNs’ poor ability to capture long-term dependencies has not been fully understood. This paper provides a rigorous explanation of this property in the special case of linear RNNs. Although this work is limited to linear RNNs, even these systems have traditionally been difficult to analyze due to their non-linear parameterization. Using recently-developed kernel regime analysis, our main result shows that as the number of hidden units goes to infinity, linear RNNs learned from random initializations are functionally equivalent to a certain weighted 1D-convolutional network. Importantly, the weightings in the equivalent model cause an implicit bias to elements with smaller time lags in the convolution, and hence shorter memory. The degree of this bias depends on the variance of the transition matrix at initialization and is related to the classic exploding and vanishing gradients problem. The theory is validated with both synthetic and real data experiments.
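The abstract's central equivalence can be illustrated concretely: a linear RNN with state update $h_t = A h_{t-1} + b\,u_t$ and readout $y_t = c^\top h_t$ computes exactly a 1D convolution of the input with the impulse response $w_k = c^\top A^k b$. The sketch below verifies this identity numerically; it is a minimal illustration, not the paper's construction, and the names `A`, `b`, `c`, `w` and the variance-$1/n$ initialization are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, T = 8, 20  # hidden units, sequence length

# Linear RNN: h_t = A h_{t-1} + b * u_t,  y_t = c^T h_t,  h_0 = 0.
# Transition matrix drawn with variance 1/n, as in standard initializations.
A = rng.normal(scale=1.0 / np.sqrt(n), size=(n, n))
b = rng.normal(size=n)
c = rng.normal(size=n)
u = rng.normal(size=T)

h = np.zeros(n)
y_rnn = np.zeros(T)
for t in range(T):
    h = A @ h + b * u[t]
    y_rnn[t] = c @ h

# Equivalent 1D convolution: impulse response w_k = c^T A^k b at lag k.
# When the spectrum of A is contracting, w_k decays with k, i.e. the
# network weights recent inputs more heavily -- the "shorter memory"
# bias the abstract refers to.
w = np.array([c @ np.linalg.matrix_power(A, k) @ b for k in range(T)])
y_conv = np.array(
    [sum(w[k] * u[t - k] for k in range(t + 1)) for t in range(T)]
)

assert np.allclose(y_rnn, y_conv)  # RNN output equals the convolution
```

The identity follows by unrolling the recursion: $y_t = \sum_{k=0}^{t} c^\top A^k b\, u_{t-k}$, so the learned map is fully described by the lag weights $w_k$.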

Cite this Paper


BibTeX
@InProceedings{pmlr-v139-emami21b,
  title     = {Implicit Bias of Linear RNNs},
  author    = {Emami, Melikasadat and Sahraee-Ardakan, Mojtaba and Pandit, Parthe and Rangan, Sundeep and Fletcher, Alyson K},
  booktitle = {Proceedings of the 38th International Conference on Machine Learning},
  pages     = {2982--2992},
  year      = {2021},
  editor    = {Meila, Marina and Zhang, Tong},
  volume    = {139},
  series    = {Proceedings of Machine Learning Research},
  month     = {18--24 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v139/emami21b/emami21b.pdf},
  url       = {https://proceedings.mlr.press/v139/emami21b.html}
}
Endnote
%0 Conference Paper
%T Implicit Bias of Linear RNNs
%A Melikasadat Emami
%A Mojtaba Sahraee-Ardakan
%A Parthe Pandit
%A Sundeep Rangan
%A Alyson K Fletcher
%B Proceedings of the 38th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2021
%E Marina Meila
%E Tong Zhang
%F pmlr-v139-emami21b
%I PMLR
%P 2982--2992
%U https://proceedings.mlr.press/v139/emami21b.html
%V 139
APA
Emami, M., Sahraee-Ardakan, M., Pandit, P., Rangan, S. & Fletcher, A. K. (2021). Implicit Bias of Linear RNNs. Proceedings of the 38th International Conference on Machine Learning, in Proceedings of Machine Learning Research 139:2982-2992. Available from https://proceedings.mlr.press/v139/emami21b.html.