Recurrent Orthogonal Networks and Long-Memory Tasks


Mikael Henaff, Arthur Szlam, Yann LeCun
Proceedings of The 33rd International Conference on Machine Learning, PMLR 48:2034-2042, 2016.


Although RNNs have been shown to be powerful tools for processing sequential data, finding architectures or optimization strategies that allow them to model very long term dependencies is still an active area of research. In this work, we carefully analyze two synthetic datasets originally outlined in (Hochreiter & Schmidhuber, 1997) which are used to evaluate the ability of RNNs to store information over many time steps. We explicitly construct RNN solutions to these problems, and using these constructions, illuminate both the problems themselves and the way in which RNNs store different types of information in their hidden states. These constructions furthermore explain the success of recent methods that specify unitary initializations or constraints on the transition matrices.
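
As a rough illustration of why orthogonal (real unitary) transition matrices are attractive for long-memory tasks, the sketch below iterates a purely linear recurrence h_{t+1} = W h_t with no inputs and no nonlinearity. This simplified setting, the helper names `random_orthogonal` and `norm_after_steps`, and the 0.9 scaling factor are assumptions made for illustration only; this is not the construction given in the paper. An orthogonal W preserves the norm of the hidden state over many steps, while a slightly contractive W drives it toward zero.

```python
import numpy as np

def random_orthogonal(n, seed=0):
    """Draw an n x n orthogonal matrix from the QR factorization of a Gaussian matrix."""
    rng = np.random.default_rng(seed)
    q, r = np.linalg.qr(rng.standard_normal((n, n)))
    # Flip column signs using diag(r) so the result does not depend on
    # the sign convention of the QR routine; q stays orthogonal.
    return q * np.sign(np.diag(r))

def norm_after_steps(w, h0, steps):
    """Apply h <- w @ h for `steps` iterations and return the final Euclidean norm."""
    h = h0.copy()
    for _ in range(steps):
        h = w @ h
    return np.linalg.norm(h)

n, steps = 32, 500
rng = np.random.default_rng(1)
h0 = rng.standard_normal(n)
h0 /= np.linalg.norm(h0)  # start from a unit-norm hidden state

w_orth = random_orthogonal(n)   # all singular values equal 1
w_contract = 0.9 * w_orth       # all singular values equal 0.9 (hypothetical choice)

norm_orth = norm_after_steps(w_orth, h0, steps)
norm_contract = norm_after_steps(w_contract, h0, steps)

print(f"orthogonal:  {norm_orth:.6f}")
print(f"contractive: {norm_contract:.3e}")
```

In this toy setting the orthogonal recurrence keeps the state norm at 1 up to floating-point error, whereas the contractive one shrinks it by a factor of 0.9 per step, so information carried in the state is effectively lost after a few hundred steps.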
