Factorized Recurrent Neural Architectures for Longer Range Dependence

Francois Belletti, Alex Beutel, Sagar Jain, Ed Chi
Proceedings of the Twenty-First International Conference on Artificial Intelligence and Statistics, PMLR 84:1522-1530, 2018.

Abstract

The ability to capture Long Range Dependence (LRD) in a stochastic process is of prime importance in the context of predictive models. A sequential model with a longer-term memory is better able to contextualize recent observations. In this article, we apply the theory of LRD stochastic processes to modern recurrent architectures, such as LSTMs and GRUs, and prove they do not provide LRD under assumptions sufficient for gradients to vanish. Motivated by an information-theoretic analysis, we provide a modified recurrent neural architecture that mitigates the issue of faulty memory through redundancy while keeping the compute time constant. Experimental results on a synthetic copy task, the YouTube-8M video classification task and a recommender system show that we enable better memorization and longer-term memory.

Cite this Paper


BibTeX
@InProceedings{pmlr-v84-belletti18a,
  title     = {Factorized Recurrent Neural Architectures for Longer Range Dependence},
  author    = {Francois Belletti and Alex Beutel and Sagar Jain and Ed Chi},
  booktitle = {Proceedings of the Twenty-First International Conference on Artificial Intelligence and Statistics},
  pages     = {1522--1530},
  year      = {2018},
  editor    = {Amos Storkey and Fernando Perez-Cruz},
  volume    = {84},
  series    = {Proceedings of Machine Learning Research},
  month     = {09--11 Apr},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v84/belletti18a/belletti18a.pdf},
  url       = {http://proceedings.mlr.press/v84/belletti18a.html},
  abstract  = {The ability to capture Long Range Dependence (LRD) in a stochastic process is of prime importance in the context of predictive models. A sequential model with a longer-term memory is better able to contextualize recent observations. In this article, we apply the theory of LRD stochastic processes to modern recurrent architectures, such as LSTMs and GRUs, and prove they do not provide LRD under assumptions sufficient for gradients to vanish. Motivated by an information-theoretic analysis, we provide a modified recurrent neural architecture that mitigates the issue of faulty memory through redundancy while keeping the compute time constant. Experimental results on a synthetic copy task, the YouTube-8M video classification task and a recommender system show that we enable better memorization and longer-term memory.}
}
Endnote
%0 Conference Paper
%T Factorized Recurrent Neural Architectures for Longer Range Dependence
%A Francois Belletti
%A Alex Beutel
%A Sagar Jain
%A Ed Chi
%B Proceedings of the Twenty-First International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2018
%E Amos Storkey
%E Fernando Perez-Cruz
%F pmlr-v84-belletti18a
%I PMLR
%P 1522--1530
%U http://proceedings.mlr.press/v84/belletti18a.html
%V 84
%X The ability to capture Long Range Dependence (LRD) in a stochastic process is of prime importance in the context of predictive models. A sequential model with a longer-term memory is better able to contextualize recent observations. In this article, we apply the theory of LRD stochastic processes to modern recurrent architectures, such as LSTMs and GRUs, and prove they do not provide LRD under assumptions sufficient for gradients to vanish. Motivated by an information-theoretic analysis, we provide a modified recurrent neural architecture that mitigates the issue of faulty memory through redundancy while keeping the compute time constant. Experimental results on a synthetic copy task, the YouTube-8M video classification task and a recommender system show that we enable better memorization and longer-term memory.
APA
Belletti, F., Beutel, A., Jain, S., & Chi, E. (2018). Factorized Recurrent Neural Architectures for Longer Range Dependence. Proceedings of the Twenty-First International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 84:1522-1530. Available from http://proceedings.mlr.press/v84/belletti18a.html
