An Empirical Exploration of Recurrent Network Architectures

Rafal Jozefowicz; Wojciech Zaremba; Ilya Sutskever

Proceedings of the 32nd International Conference on Machine Learning, PMLR 37:2342-2350, 2015.

Abstract

The Recurrent Neural Network (RNN) is an extremely powerful sequence model that is often difficult to train. The Long Short-Term Memory (LSTM) is a specific RNN architecture whose design makes it much easier to train. While wildly successful in practice, the LSTM’s architecture appears to be ad hoc, so it is not clear whether it is optimal, and the significance of its individual components is unclear. In this work, we aim to determine whether the LSTM architecture is optimal or whether much better architectures exist. We conducted a thorough architecture search in which we evaluated over ten thousand different RNN architectures, and identified an architecture that outperforms both the LSTM and the recently introduced Gated Recurrent Unit (GRU) on some but not all tasks. We found that adding a bias of 1 to the LSTM’s forget gate closes the gap between the LSTM and the GRU.
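The forget-gate finding is a one-line initialization change. It is easiest to see in a single LSTM step. The sketch below is written in plain Python and assumes the standard LSTM without peephole connections: c_t = f_t * c_{t-1} + i_t * g_t and h_t = o_t * tanh(c_t). The parameter layout and the names `init_lstm`, `lstm_step`, and `idealized_retention` are hypothetical illustrations, not code from the paper. The only part the abstract speaks to is the `forget_bias` argument, which starts the forget-gate bias b_f at 1 instead of 0.

```python
import math
import random


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def matvec(W, v):
    # Plain matrix-vector product on nested lists.
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def init_lstm(n_in, n_hid, forget_bias=1.0, scale=0.1, seed=0):
    """Hypothetical layout: one (W, U, b) triple per gate.

    W acts on the input x_t, U on the previous hidden state h_{t-1}.
    Only the forget-gate bias is set to `forget_bias`; every other bias is 0.
    """
    rng = random.Random(seed)
    params = {}
    for gate in ("i", "f", "o", "g"):
        W = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_hid)]
        U = [[rng.uniform(-scale, scale) for _ in range(n_hid)] for _ in range(n_hid)]
        b = [forget_bias if gate == "f" else 0.0] * n_hid
        params[gate] = (W, U, b)
    return params


def lstm_step(x, h, c, params):
    """One LSTM step; returns (h_t, c_t, f_t)."""
    def pre(gate):
        W, U, b = params[gate]
        return [a + r + bias for a, r, bias in zip(matvec(W, x), matvec(U, h), b)]

    i = [sigmoid(z) for z in pre("i")]    # input gate
    f = [sigmoid(z) for z in pre("f")]    # forget gate: sigmoid(b_f) when x = h = 0
    o = [sigmoid(z) for z in pre("o")]    # output gate
    g = [math.tanh(z) for z in pre("g")]  # candidate cell update
    c_new = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
    h_new = [oj * math.tanh(cj) for oj, cj in zip(o, c_new)]
    return h_new, c_new, f


def idealized_retention(forget_bias, steps):
    # Fraction of the cell state left after `steps` steps if the forget gate
    # stayed at sigmoid(b_f) and nothing new were written (an idealization).
    return sigmoid(forget_bias) ** steps


if __name__ == "__main__":
    for bf in (0.0, 1.0):
        p = init_lstm(n_in=3, n_hid=2, forget_bias=bf)
        _, c, f = lstm_step([0.0] * 3, [0.0] * 2, [1.0] * 2, p)
        print(f"b_f={bf}: f={f[0]:.3f}, c after 1 step={c[0]:.3f}, "
              f"idealized retention after 10 steps={idealized_retention(bf, 10):.3f}")
    # b_f=0.0: f=0.500, c after 1 step=0.500, idealized retention after 10 steps=0.001
    # b_f=1.0: f=0.731, c after 1 step=0.731, idealized retention after 10 steps=0.044
```

With zero input and zero hidden state, the forget gate equals sigmoid(b_f). Raising b_f from 0 to 1 therefore moves it from 0.5 to about 0.73. If the gate held that value, about 4% of the cell state would survive 10 steps instead of about 0.1%. This offers one intuition for why the bias helps early in training. It does not reproduce the paper's experiments.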

Cite this Paper

BibTeX


@InProceedings{pmlr-v37-jozefowicz15,
  title     = {An Empirical Exploration of Recurrent Network Architectures},
  author    = {Jozefowicz, Rafal and Zaremba, Wojciech and Sutskever, Ilya},
  booktitle = {Proceedings of the 32nd International Conference on Machine Learning},
  pages     = {2342--2350},
  year      = {2015},
  editor    = {Bach, Francis and Blei, David},
  volume    = {37},
  series    = {Proceedings of Machine Learning Research},
  address   = {Lille, France},
  month     = {07--09 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v37/jozefowicz15.pdf},
  url       = {https://proceedings.mlr.press/v37/jozefowicz15.html},
  abstract  = {The Recurrent Neural Network (RNN) is an extremely powerful sequence model that is often difficult to train. The Long Short-Term Memory (LSTM) is a specific RNN architecture whose design makes it much easier to train. While wildly successful in practice, the LSTM’s architecture appears to be ad-hoc so it is not clear if it is optimal, and the significance of its individual components is unclear. In this work, we aim to determine whether the LSTM architecture is optimal or whether much better architectures exist. We conducted a thorough architecture search where we evaluated over ten thousand different RNN architectures, and identified an architecture that outperforms both the LSTM and the recently-introduced Gated Recurrent Unit (GRU) on some but not all tasks. We found that adding a bias of 1 to the LSTM’s forget gate closes the gap between the LSTM and the GRU.}
}

Endnote

%0 Conference Paper
%T An Empirical Exploration of Recurrent Network Architectures
%A Rafal Jozefowicz
%A Wojciech Zaremba
%A Ilya Sutskever
%B Proceedings of the 32nd International Conference on Machine Learning
%C Lille, France
%S Proceedings of Machine Learning Research
%D 2015
%E Francis Bach
%E David Blei
%F pmlr-v37-jozefowicz15
%I PMLR
%P 2342-2350
%U https://proceedings.mlr.press/v37/jozefowicz15.html
%V 37
%X The Recurrent Neural Network (RNN) is an extremely powerful sequence model that is often difficult to train. The Long Short-Term Memory (LSTM) is a specific RNN architecture whose design makes it much easier to train. While wildly successful in practice, the LSTM’s architecture appears to be ad-hoc so it is not clear if it is optimal, and the significance of its individual components is unclear. In this work, we aim to determine whether the LSTM architecture is optimal or whether much better architectures exist. We conducted a thorough architecture search where we evaluated over ten thousand different RNN architectures, and identified an architecture that outperforms both the LSTM and the recently-introduced Gated Recurrent Unit (GRU) on some but not all tasks. We found that adding a bias of 1 to the LSTM’s forget gate closes the gap between the LSTM and the GRU.

RIS


TY  - CPAPER
TI  - An Empirical Exploration of Recurrent Network Architectures
AU  - Rafal Jozefowicz
AU  - Wojciech Zaremba
AU  - Ilya Sutskever
BT  - Proceedings of the 32nd International Conference on Machine Learning
DA  - 2015/06/01
ED  - Francis Bach
ED  - David Blei
ID  - pmlr-v37-jozefowicz15
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 37
SP  - 2342
EP  - 2350
L1  - http://proceedings.mlr.press/v37/jozefowicz15.pdf
UR  - https://proceedings.mlr.press/v37/jozefowicz15.html
AB  - The Recurrent Neural Network (RNN) is an extremely powerful sequence model that is often difficult to train. The Long Short-Term Memory (LSTM) is a specific RNN architecture whose design makes it much easier to train. While wildly successful in practice, the LSTM’s architecture appears to be ad-hoc so it is not clear if it is optimal, and the significance of its individual components is unclear. In this work, we aim to determine whether the LSTM architecture is optimal or whether much better architectures exist. We conducted a thorough architecture search where we evaluated over ten thousand different RNN architectures, and identified an architecture that outperforms both the LSTM and the recently-introduced Gated Recurrent Unit (GRU) on some but not all tasks. We found that adding a bias of 1 to the LSTM’s forget gate closes the gap between the LSTM and the GRU.
ER  -

APA


Jozefowicz, R., Zaremba, W., & Sutskever, I. (2015). An Empirical Exploration of Recurrent Network Architectures. Proceedings of the 32nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 37:2342-2350. Available from https://proceedings.mlr.press/v37/jozefowicz15.html.
