The Statistical Recurrent Unit

Junier B. Oliva, Barnabás Póczos, Jeff Schneider
Proceedings of the 34th International Conference on Machine Learning, PMLR 70:2671-2680, 2017.

Abstract

Sophisticated gated recurrent neural network architectures like LSTMs and GRUs have been shown to be highly effective in a myriad of applications. We develop an un-gated unit, the statistical recurrent unit (SRU), that is able to learn long term dependencies in data by only keeping moving averages of statistics. The SRU’s architecture is simple, un-gated, and contains a comparable number of parameters to LSTMs; yet, SRUs perform favorably to more sophisticated LSTM and GRU alternatives, often outperforming one or both in various tasks. We show the efficacy of SRUs as compared to LSTMs and GRUs in an unbiased manner by optimizing respective architectures’ hyperparameters for both synthetic and real-world tasks.
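The core idea in the abstract, keeping moving averages of statistics, can be illustrated with a short sketch. This is not the paper's formulation: the function names, the decay rates in `alphas`, and the fixed ReLU "statistics" are all assumptions made for illustration, whereas the actual unit learns its statistics. It shows only how exponential moving averages at several decay rates keep both recent and long-range summaries of a sequence without any gating.

```python
from typing import Dict, List, Sequence


def update_moving_averages(
    averages: Dict[float, List[float]],
    stats: Sequence[float],
) -> Dict[float, List[float]]:
    """For each decay rate alpha, apply mu <- alpha * mu + (1 - alpha) * stats."""
    return {
        alpha: [alpha * m + (1.0 - alpha) * s for m, s in zip(mu, stats)]
        for alpha, mu in averages.items()
    }


def summarize_sequence(
    sequence: Sequence[Sequence[float]],
    alphas: Sequence[float] = (0.0, 0.5, 0.9),
) -> Dict[float, List[float]]:
    """Run the moving averages over a sequence of equal-length vectors."""
    dim = len(sequence[0])
    averages = {alpha: [0.0] * dim for alpha in alphas}
    for x in sequence:
        # Placeholder statistics (assumption): an element-wise ReLU of the input.
        stats = [max(0.0, v) for v in x]
        averages = update_moving_averages(averages, stats)
    return averages


if __name__ == "__main__":
    result = summarize_sequence([[1.0, -1.0], [1.0, 2.0], [0.0, 2.0]])
    for alpha, mu in result.items():
        print(alpha, mu)
```

A decay rate near 0 tracks only the latest statistics, while a rate near 1 changes slowly and so retains information from far back in the sequence; keeping several rates side by side gives a view of the data at multiple time scales.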

Cite this Paper


BibTeX
@InProceedings{pmlr-v70-oliva17a,
  title = {The Statistical Recurrent Unit},
  author = {Junier B. Oliva and Barnab{\'a}s P{\'o}czos and Jeff Schneider},
  booktitle = {Proceedings of the 34th International Conference on Machine Learning},
  pages = {2671--2680},
  year = {2017},
  editor = {Precup, Doina and Teh, Yee Whye},
  volume = {70},
  series = {Proceedings of Machine Learning Research},
  month = {06--11 Aug},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v70/oliva17a/oliva17a.pdf},
  url = {http://proceedings.mlr.press/v70/oliva17a.html},
  abstract = {Sophisticated gated recurrent neural network architectures like LSTMs and GRUs have been shown to be highly effective in a myriad of applications. We develop an un-gated unit, the statistical recurrent unit (SRU), that is able to learn long term dependencies in data by only keeping moving averages of statistics. The SRU’s architecture is simple, un-gated, and contains a comparable number of parameters to LSTMs; yet, SRUs perform favorably to more sophisticated LSTM and GRU alternatives, often outperforming one or both in various tasks. We show the efficacy of SRUs as compared to LSTMs and GRUs in an unbiased manner by optimizing respective architectures’ hyperparameters for both synthetic and real-world tasks.}
}
Endnote
%0 Conference Paper
%T The Statistical Recurrent Unit
%A Junier B. Oliva
%A Barnabás Póczos
%A Jeff Schneider
%B Proceedings of the 34th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2017
%E Doina Precup
%E Yee Whye Teh
%F pmlr-v70-oliva17a
%I PMLR
%P 2671--2680
%U http://proceedings.mlr.press/v70/oliva17a.html
%V 70
%X Sophisticated gated recurrent neural network architectures like LSTMs and GRUs have been shown to be highly effective in a myriad of applications. We develop an un-gated unit, the statistical recurrent unit (SRU), that is able to learn long term dependencies in data by only keeping moving averages of statistics. The SRU’s architecture is simple, un-gated, and contains a comparable number of parameters to LSTMs; yet, SRUs perform favorably to more sophisticated LSTM and GRU alternatives, often outperforming one or both in various tasks. We show the efficacy of SRUs as compared to LSTMs and GRUs in an unbiased manner by optimizing respective architectures’ hyperparameters for both synthetic and real-world tasks.
APA
Oliva, J.B., Póczos, B. & Schneider, J. (2017). The Statistical Recurrent Unit. Proceedings of the 34th International Conference on Machine Learning, in Proceedings of Machine Learning Research 70:2671-2680. Available from http://proceedings.mlr.press/v70/oliva17a.html.

Related Material