Unitary Evolution Recurrent Neural Networks

Martin Arjovsky, Amar Shah, Yoshua Bengio
Proceedings of The 33rd International Conference on Machine Learning, PMLR 48:1120-1128, 2016.

Abstract

Recurrent neural networks (RNNs) are notoriously difficult to train. When the eigenvalues of the hidden-to-hidden weight matrix deviate from absolute value 1, optimization becomes difficult due to the well-studied issue of vanishing and exploding gradients, especially when trying to learn long-term dependencies. To circumvent this problem, we propose a new architecture that learns a unitary weight matrix, with eigenvalues of absolute value exactly 1. The challenge we address is that of parametrizing unitary matrices in a way that does not require expensive computations (such as eigendecomposition) after each weight update. We construct an expressive unitary weight matrix by composing several structured matrices that act as building blocks with parameters to be learned. Optimization with this parameterization becomes feasible only when considering hidden states in the complex domain. We demonstrate the potential of this architecture by achieving state-of-the-art results in several hard tasks involving very long-term dependencies.
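To make the composition concrete, below is a minimal NumPy sketch of the kind of structured parameterization the abstract describes, following the paper's factorization of the recurrent matrix into diagonal, reflection, permutation, and Fourier factors (W = D3 R2 F^{-1} D2 Pi R1 F D1). This is an illustrative construction, not the authors' implementation; the function names and random initialization are assumptions for the example. Because every factor is unitary, the product is unitary by construction, so no eigendecomposition or re-projection is needed after a parameter update.

```python
import numpy as np

def diag_unitary(theta):
    # Diagonal matrix with unit-modulus entries exp(i * theta_j); theta is the learnable parameter.
    return np.diag(np.exp(1j * theta))

def reflection(v):
    # Complex Householder reflection I - 2 v v^H / (v^H v); v is the learnable parameter.
    v = v.reshape(-1, 1)
    return np.eye(len(v)) - 2.0 * (v @ v.conj().T) / (v.conj().T @ v)

def build_unitary(n, rng):
    # Compose the structured factors: each is unitary, so the product is too.
    F = np.fft.fft(np.eye(n)) / np.sqrt(n)    # unitary DFT matrix
    P = np.eye(n)[rng.permutation(n)]         # fixed random permutation matrix
    D1, D2, D3 = (diag_unitary(rng.uniform(-np.pi, np.pi, n)) for _ in range(3))
    R1, R2 = (reflection(rng.standard_normal(n) + 1j * rng.standard_normal(n))
              for _ in range(2))
    return D3 @ R2 @ F.conj().T @ D2 @ P @ R1 @ F @ D1

rng = np.random.default_rng(0)
W = build_unitary(8, rng)
print(np.allclose(W.conj().T @ W, np.eye(8)))  # True: W is unitary, all eigenvalues have modulus 1
```

Note that the full matrix W is materialized here only to verify unitarity; in an RNN one would apply each factor to the complex hidden state in sequence, which costs O(n log n) per step thanks to the FFT.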

Cite this Paper


BibTeX
@InProceedings{pmlr-v48-arjovsky16,
  title     = {Unitary Evolution Recurrent Neural Networks},
  author    = {Arjovsky, Martin and Shah, Amar and Bengio, Yoshua},
  booktitle = {Proceedings of The 33rd International Conference on Machine Learning},
  pages     = {1120--1128},
  year      = {2016},
  editor    = {Balcan, Maria Florina and Weinberger, Kilian Q.},
  volume    = {48},
  series    = {Proceedings of Machine Learning Research},
  address   = {New York, New York, USA},
  month     = {20--22 Jun},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v48/arjovsky16.pdf},
  url       = {https://proceedings.mlr.press/v48/arjovsky16.html}
}
Endnote
%0 Conference Paper
%T Unitary Evolution Recurrent Neural Networks
%A Martin Arjovsky
%A Amar Shah
%A Yoshua Bengio
%B Proceedings of The 33rd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2016
%E Maria Florina Balcan
%E Kilian Q. Weinberger
%F pmlr-v48-arjovsky16
%I PMLR
%P 1120--1128
%U https://proceedings.mlr.press/v48/arjovsky16.html
%V 48
RIS
TY - CPAPER
TI - Unitary Evolution Recurrent Neural Networks
AU - Martin Arjovsky
AU - Amar Shah
AU - Yoshua Bengio
BT - Proceedings of The 33rd International Conference on Machine Learning
DA - 2016/06/11
ED - Maria Florina Balcan
ED - Kilian Q. Weinberger
ID - pmlr-v48-arjovsky16
PB - PMLR
DP - Proceedings of Machine Learning Research
VL - 48
SP - 1120
EP - 1128
L1 - http://proceedings.mlr.press/v48/arjovsky16.pdf
UR - https://proceedings.mlr.press/v48/arjovsky16.html
ER -
APA
Arjovsky, M., Shah, A. & Bengio, Y. (2016). Unitary Evolution Recurrent Neural Networks. Proceedings of The 33rd International Conference on Machine Learning, in Proceedings of Machine Learning Research 48:1120-1128. Available from https://proceedings.mlr.press/v48/arjovsky16.html.
