Orthogonal Recurrent Neural Networks with Scaled Cayley Transform

Kyle Helfrich, Devin Willmott, Qiang Ye
Proceedings of the 35th International Conference on Machine Learning, PMLR 80:1969-1978, 2018.

Abstract

Recurrent Neural Networks (RNNs) are designed to handle sequential data but suffer from vanishing or exploding gradients. Recent work on Unitary Recurrent Neural Networks (uRNNs) has been used to address this issue and, in some cases, to exceed the capabilities of Long Short-Term Memory networks (LSTMs). We propose a simpler and novel update scheme to maintain orthogonal recurrent weight matrices without using complex-valued matrices. This is done by parametrizing with a skew-symmetric matrix using the Cayley transform; such a parametrization is unable to represent matrices with negative one eigenvalues, but this limitation is overcome by scaling the recurrent weight matrix by a diagonal matrix consisting of ones and negative ones. The proposed training scheme involves a straightforward gradient calculation and update step. In several experiments, the proposed scaled Cayley orthogonal recurrent neural network (scoRNN) achieves superior results with fewer trainable parameters than other unitary RNNs.
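The parametrization described in the abstract can be sketched numerically. Assuming the recurrent weight matrix takes the scaled Cayley form W = (I + A)^{-1}(I - A)D, with A skew-symmetric and D a diagonal matrix of ones and negative ones (notation assumed here for illustration; see the paper for the exact construction and training procedure), W is orthogonal by construction:

```python
import numpy as np

def scaled_cayley(A, D):
    """Scaled Cayley transform: W = (I + A)^{-1} (I - A) D.

    A : skew-symmetric matrix (A = -A^T)
    D : diagonal matrix with entries in {+1, -1}
    Returns an orthogonal matrix W.
    """
    n = A.shape[0]
    I = np.eye(n)
    # Solve (I + A) W = (I - A) D instead of forming the inverse explicitly.
    return np.linalg.solve(I + A, (I - A) @ D)

rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4))
A = M - M.T                            # skew-symmetric by construction
D = np.diag([1.0, 1.0, -1.0, -1.0])    # illustrative choice of scaling matrix
W = scaled_cayley(A, D)

# W is orthogonal: W @ W.T is (numerically) the identity.
print(np.allclose(W @ W.T, np.eye(4)))
```

The plain Cayley transform (D = I) cannot produce matrices with -1 as an eigenvalue, since I + W would then be singular; flipping signs in D is what recovers that missing part of the orthogonal group.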

Cite this Paper


BibTeX
@InProceedings{pmlr-v80-helfrich18a,
  title     = {Orthogonal Recurrent Neural Networks with Scaled {C}ayley Transform},
  author    = {Helfrich, Kyle and Willmott, Devin and Ye, Qiang},
  booktitle = {Proceedings of the 35th International Conference on Machine Learning},
  pages     = {1969--1978},
  year      = {2018},
  editor    = {Dy, Jennifer and Krause, Andreas},
  volume    = {80},
  series    = {Proceedings of Machine Learning Research},
  month     = {10--15 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v80/helfrich18a/helfrich18a.pdf},
  url       = {http://proceedings.mlr.press/v80/helfrich18a.html}
}
Endnote
%0 Conference Paper
%T Orthogonal Recurrent Neural Networks with Scaled Cayley Transform
%A Kyle Helfrich
%A Devin Willmott
%A Qiang Ye
%B Proceedings of the 35th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2018
%E Jennifer Dy
%E Andreas Krause
%F pmlr-v80-helfrich18a
%I PMLR
%P 1969--1978
%U http://proceedings.mlr.press/v80/helfrich18a.html
%V 80
APA
Helfrich, K., Willmott, D. & Ye, Q. (2018). Orthogonal Recurrent Neural Networks with Scaled Cayley Transform. Proceedings of the 35th International Conference on Machine Learning, in Proceedings of Machine Learning Research 80:1969-1978. Available from http://proceedings.mlr.press/v80/helfrich18a.html.