UnICORNN: A recurrent model for learning very long time dependencies

T. Konstantin Rusch; Siddhartha Mishra

Proceedings of the 38th International Conference on Machine Learning, PMLR 139:9168-9178, 2021.

Abstract

The design of recurrent neural networks (RNNs) that accurately process sequential inputs with long-time dependencies is very challenging on account of the exploding and vanishing gradient problem. To overcome this, we propose a novel RNN architecture based on a structure-preserving discretization of a Hamiltonian system of second-order ordinary differential equations that models networks of oscillators. The resulting RNN is fast, invertible (in time) and memory-efficient, and we derive rigorous bounds on the hidden-state gradients that prove the mitigation of the exploding and vanishing gradient problem. A suite of experiments is presented to demonstrate that the proposed RNN provides state-of-the-art performance on a variety of learning tasks with (very) long-time dependencies.
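The abstract's central structural claim is that a structure-preserving discretization of a second-order oscillator system yields an RNN that is invertible in time. This can be illustrated with a small sketch. The sketch assumes a hypothetical oscillator network y'' = -[tanh(w ⊙ y + u + b) + αy]. It is written as the first-order system y' = z, z' = -[tanh(w ⊙ y + u + b) + αy], where u stands for an already-projected input. The system is discretized with a symplectic Euler step. The function names, parameters and values below are illustrative assumptions, not the paper's exact UnICORNN update.

```python
import math


def forcing(y, u, w, b, alpha):
    """Right-hand side of z' = -[tanh(w*y + u + b) + alpha*y], elementwise.

    u is assumed to be the already-projected input (a matrix-vector product in a real RNN).
    """
    return [-(math.tanh(wi * yi + ui + bi) + alpha * yi)
            for yi, ui, wi, bi in zip(y, u, w, b)]


def step_forward(y, z, u, w, b, alpha, dt):
    """One symplectic Euler step: update z with the force at the old y, then y with the new z."""
    f = forcing(y, u, w, b, alpha)
    z_new = [zi + dt * fi for zi, fi in zip(z, f)]
    y_new = [yi + dt * zi for yi, zi in zip(y, z_new)]
    return y_new, z_new


def step_backward(y_new, z_new, u, w, b, alpha, dt):
    """Exact algebraic inverse of step_forward: recover y first, then z from the force at that y."""
    y = [yi - dt * zi for yi, zi in zip(y_new, z_new)]
    f = forcing(y, u, w, b, alpha)
    z = [zi - dt * fi for zi, fi in zip(z_new, f)]
    return y, z


def run_forward(y, z, inputs, w, b, alpha, dt):
    """Process a whole input sequence, keeping only the final hidden state."""
    for u in inputs:
        y, z = step_forward(y, z, u, w, b, alpha, dt)
    return y, z


def run_backward(y, z, inputs, w, b, alpha, dt):
    """Reconstruct the initial hidden state from the final one by stepping backwards in time."""
    for u in reversed(inputs):
        y, z = step_backward(y, z, u, w, b, alpha, dt)
    return y, z


if __name__ == "__main__":
    # Hypothetical parameters for a 3-neuron toy network.
    w, b = [0.5, 1.0, 2.0], [0.1, 0.0, -0.2]
    alpha, dt = 1.0, 0.1
    inputs = [[math.sin(0.3 * t + k) for k in range(3)] for t in range(100)]
    y0, z0 = [0.2, -0.1, 0.0], [0.0, 0.3, -0.5]
    y_T, z_T = run_forward(y0, z0, inputs, w, b, alpha, dt)
    y_rec, z_rec = run_backward(y_T, z_T, inputs, w, b, alpha, dt)
    err = max(abs(a - c) for a, c in zip(y_rec + z_rec, y0 + z0))
    print(f"max round-trip error: {err:.2e}")
```

Each backward step evaluates the force at the recovered y, so the inverse is exact in exact arithmetic. In floating point, the round-trip error should stay near round-off level for a stable step size. A scheme like this lets hidden states be recomputed during backpropagation rather than stored, which is one way invertibility can translate into memory savings.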

Cite this Paper

BibTeX

@InProceedings{pmlr-v139-rusch21a,
  title     = {{UnICORNN}: A recurrent model for learning very long time dependencies},
  author    = {Rusch, T. Konstantin and Mishra, Siddhartha},
  booktitle = {Proceedings of the 38th International Conference on Machine Learning},
  pages     = {9168--9178},
  year      = {2021},
  editor    = {Meila, Marina and Zhang, Tong},
  volume    = {139},
  series    = {Proceedings of Machine Learning Research},
  month     = {18--24 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v139/rusch21a/rusch21a.pdf},
  url       = {https://proceedings.mlr.press/v139/rusch21a.html},
  abstract  = {The design of recurrent neural networks (RNNs) that accurately process sequential inputs with long-time dependencies is very challenging on account of the exploding and vanishing gradient problem. To overcome this, we propose a novel RNN architecture based on a structure-preserving discretization of a Hamiltonian system of second-order ordinary differential equations that models networks of oscillators. The resulting RNN is fast, invertible (in time) and memory-efficient, and we derive rigorous bounds on the hidden-state gradients that prove the mitigation of the exploding and vanishing gradient problem. A suite of experiments is presented to demonstrate that the proposed RNN provides state-of-the-art performance on a variety of learning tasks with (very) long-time dependencies.}
}

Endnote

%0 Conference Paper
%T UnICORNN: A recurrent model for learning very long time dependencies
%A T. Konstantin Rusch
%A Siddhartha Mishra
%B Proceedings of the 38th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2021
%E Marina Meila
%E Tong Zhang
%F pmlr-v139-rusch21a
%I PMLR
%P 9168--9178
%U https://proceedings.mlr.press/v139/rusch21a.html
%V 139
%X The design of recurrent neural networks (RNNs) that accurately process sequential inputs with long-time dependencies is very challenging on account of the exploding and vanishing gradient problem. To overcome this, we propose a novel RNN architecture based on a structure-preserving discretization of a Hamiltonian system of second-order ordinary differential equations that models networks of oscillators. The resulting RNN is fast, invertible (in time) and memory-efficient, and we derive rigorous bounds on the hidden-state gradients that prove the mitigation of the exploding and vanishing gradient problem. A suite of experiments is presented to demonstrate that the proposed RNN provides state-of-the-art performance on a variety of learning tasks with (very) long-time dependencies.

APA

Rusch, T.K. & Mishra, S. (2021). UnICORNN: A recurrent model for learning very long time dependencies. Proceedings of the 38th International Conference on Machine Learning, in Proceedings of Machine Learning Research 139:9168-9178. Available from https://proceedings.mlr.press/v139/rusch21a.html.
