Recurrent Orthogonal Networks and Long-Memory Tasks

Mikael Henaff, Arthur Szlam, Yann LeCun
Proceedings of The 33rd International Conference on Machine Learning, PMLR 48:2034-2042, 2016.

Abstract

Although RNNs have been shown to be powerful tools for processing sequential data, finding architectures or optimization strategies that allow them to model very long term dependencies is still an active area of research. In this work, we carefully analyze two synthetic datasets originally outlined in (Hochreiter & Schmidhuber, 1997) which are used to evaluate the ability of RNNs to store information over many time steps. We explicitly construct RNN solutions to these problems, and using these constructions, illuminate both the problems themselves and the way in which RNNs store different types of information in their hidden states. These constructions furthermore explain the success of recent methods that specify unitary initializations or constraints on the transition matrices.
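As a rough illustration of the kind of initialization the abstract refers to, the sketch below (our own, not taken from the paper) builds a random orthogonal transition matrix via a QR decomposition and plugs it into a plain tanh RNN. An orthogonal recurrent matrix preserves the norm of the hidden state under repeated multiplication, which is the informal reason such initializations help on long-memory tasks; the class and function names here are illustrative only.

import numpy as np

def orthogonal(n, seed=0):
    """Return a random n x n orthogonal matrix (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    a = rng.standard_normal((n, n))
    q, r = np.linalg.qr(a)
    # Fix column signs so the result is uniformly distributed over the orthogonal group.
    return q * np.sign(np.diag(r))

class VanillaRNN:
    """h_t = tanh(W h_{t-1} + U x_t), with W initialized orthogonal so that
    repeated application neither explodes nor shrinks the hidden state."""
    def __init__(self, input_dim, hidden_dim, seed=0):
        rng = np.random.default_rng(seed)
        self.W = orthogonal(hidden_dim, seed)                         # recurrent weights
        self.U = 0.1 * rng.standard_normal((hidden_dim, input_dim))   # input weights
        self.h = np.zeros(hidden_dim)

    def step(self, x):
        self.h = np.tanh(self.W @ self.h + self.U @ x)
        return self.h

# Sanity check: an orthogonal W preserves vector norms exactly.
W = orthogonal(128)
v = np.random.default_rng(1).standard_normal(128)
assert np.isclose(np.linalg.norm(W @ v), np.linalg.norm(v))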

Cite this Paper


BibTeX
@InProceedings{pmlr-v48-henaff16,
  title     = {Recurrent Orthogonal Networks and Long-Memory Tasks},
  author    = {Henaff, Mikael and Szlam, Arthur and LeCun, Yann},
  booktitle = {Proceedings of The 33rd International Conference on Machine Learning},
  pages     = {2034--2042},
  year      = {2016},
  editor    = {Balcan, Maria Florina and Weinberger, Kilian Q.},
  volume    = {48},
  series    = {Proceedings of Machine Learning Research},
  address   = {New York, New York, USA},
  month     = {20--22 Jun},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v48/henaff16.pdf},
  url       = {https://proceedings.mlr.press/v48/henaff16.html},
  abstract  = {Although RNNs have been shown to be powerful tools for processing sequential data, finding architectures or optimization strategies that allow them to model very long term dependencies is still an active area of research. In this work, we carefully analyze two synthetic datasets originally outlined in (Hochreiter & Schmidhuber, 1997) which are used to evaluate the ability of RNNs to store information over many time steps. We explicitly construct RNN solutions to these problems, and using these constructions, illuminate both the problems themselves and the way in which RNNs store different types of information in their hidden states. These constructions furthermore explain the success of recent methods that specify unitary initializations or constraints on the transition matrices.}
}
APA
Henaff, M., Szlam, A. & LeCun, Y. (2016). Recurrent Orthogonal Networks and Long-Memory Tasks. Proceedings of The 33rd International Conference on Machine Learning, in Proceedings of Machine Learning Research 48:2034-2042. Available from https://proceedings.mlr.press/v48/henaff16.html.
