Nested LSTMs

Joel Ruben Antony Moniz; David Krueger

Proceedings of the Ninth Asian Conference on Machine Learning, PMLR 77:530-544, 2017.

Abstract

We propose Nested LSTMs (NLSTM), a novel RNN architecture with multiple levels of memory. Nested LSTMs add depth to LSTMs via nesting as opposed to stacking. The value of a memory cell in an NLSTM is computed by an LSTM cell, which has its own inner memory cell. Specifically, instead of computing the value of the (outer) memory cell as $c^{outer}_t = f_t \odot c_{t-1} + i_t \odot g_t$, NLSTM memory cells use the concatenation $(f_t \odot c_{t-1}, i_t \odot g_t)$ as input to an inner LSTM (or NLSTM) memory cell, and set $c^{outer}_t = h^{inner}_t$. Nested LSTMs outperform both stacked and single-layer LSTMs with similar numbers of parameters in our experiments on various character-level language modeling tasks, and the inner memories of an LSTM learn longer term dependencies compared with the higher-level units of a stacked LSTM.
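
The nesting described in the abstract can be illustrated with a minimal pure-Python sketch of a single time step. It is not the authors' implementation. The helper names (`init_gates`, `gates`, `lstm_step`, `nlstm_step`), the weight initialisation, and the choice to route $i_t \odot g_t$ as the inner cell's input and $f_t \odot c_{t-1}$ as its previous hidden state are assumptions made for illustration; the abstract only states that the concatenation of the two is the inner cell's input. The outer output rule $h_t = o_t \odot \tanh(c_t)$ is likewise assumed from the standard LSTM.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def affine(W, b, v):
    """Return W @ v + b for a list-of-rows matrix W."""
    return [sum(w * x for w, x in zip(row, v)) + bi for row, bi in zip(W, b)]

def init_gates(n_in, n_hid, rng):
    """Hypothetical initialiser: small random weights for gates i, f, o, g."""
    def mat():
        return [[rng.uniform(-0.1, 0.1) for _ in range(n_in + n_hid)]
                for _ in range(n_hid)]
    return {k: (mat(), [0.0] * n_hid) for k in "ifog"}

def gates(params, x, h):
    """Compute the four gate activations from the concatenation [x, h]."""
    z = x + h  # list concatenation
    pre = {k: affine(W, b, z) for k, (W, b) in params.items()}
    i = [sigmoid(a) for a in pre["i"]]
    f = [sigmoid(a) for a in pre["f"]]
    o = [sigmoid(a) for a in pre["o"]]
    g = [math.tanh(a) for a in pre["g"]]
    return i, f, o, g

def lstm_step(params, x, h_prev, c_prev):
    """Standard LSTM step: c_t = f * c_prev + i * g (elementwise)."""
    i, f, o, g = gates(params, x, h_prev)
    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h, c

def nlstm_step(outer, inner, x, h_prev, c_prev, c_inner_prev):
    """One NLSTM step: the outer cell value is the inner LSTM's hidden state."""
    i, f, o, g = gates(outer, x, h_prev)
    # Assumed routing: the two terms of the usual cell update are handed to
    # the inner LSTM, whose gates see their concatenation.
    inner_h_prev = [fk * ck for fk, ck in zip(f, c_prev)]  # f * c_prev
    inner_x = [ik * gk for ik, gk in zip(i, g)]            # i * g
    h_inner, c_inner = lstm_step(inner, inner_x, inner_h_prev, c_inner_prev)
    c = h_inner  # c_outer_t = h_inner_t, as stated in the abstract
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h, c, c_inner
```

Replacing the `lstm_step` call inside `nlstm_step` with another `nlstm_step` would give a deeper nesting, which is the "(or NLSTM)" case mentioned in the abstract.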

Cite this Paper

BibTeX


@InProceedings{pmlr-v77-moniz17a,
  title = 	 {Nested LSTMs},
  author = 	 {Moniz, Joel Ruben Antony and Krueger, David},
  booktitle = 	 {Proceedings of the Ninth Asian Conference on Machine Learning},
  pages = 	 {530--544},
  year = 	 {2017},
  editor = 	 {Zhang, Min-Ling and Noh, Yung-Kyun},
  volume = 	 {77},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {Yonsei University, Seoul, Republic of Korea},
  month = 	 {15--17 Nov},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v77/moniz17a/moniz17a.pdf},
  url = 	 {https://proceedings.mlr.press/v77/moniz17a.html},
  abstract = 	 {We propose \emph{Nested LSTMs} (NLSTM), a novel RNN architecture with multiple levels of memory. Nested LSTMs add depth to LSTMs via nesting as opposed to stacking. The value of a memory cell in an NLSTM is computed by an LSTM cell, which has its own \emph{inner} memory cell. Specifically, instead of computing the value of the (outer) memory cell as $c^{outer}_t = f_t \odot c_{t-1} + i_t \odot g_t$, NLSTM memory cells use the concatenation $(f_t \odot c_{t-1}, i_t \odot g_t)$ as input to an inner LSTM (or NLSTM) memory cell, and set $c^{outer}_t = h^{inner}_t$. Nested LSTMs outperform both stacked and single-layer LSTMs with similar numbers of parameters in our experiments on various character-level language modeling tasks, and the inner memories of an LSTM learn longer term dependencies compared with the higher-level units of a stacked LSTM.}
}

Endnote

%0 Conference Paper
%T Nested LSTMs
%A Joel Ruben Antony Moniz
%A David Krueger
%B Proceedings of the Ninth Asian Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2017
%E Min-Ling Zhang
%E Yung-Kyun Noh
%F pmlr-v77-moniz17a
%I PMLR
%P 530--544
%U https://proceedings.mlr.press/v77/moniz17a.html
%V 77
%X We propose Nested LSTMs (NLSTM), a novel RNN architecture with multiple levels of memory. Nested LSTMs add depth to LSTMs via nesting as opposed to stacking. The value of a memory cell in an NLSTM is computed by an LSTM cell, which has its own inner memory cell. Specifically, instead of computing the value of the (outer) memory cell as $c^{outer}_t = f_t \odot c_{t-1} + i_t \odot g_t$, NLSTM memory cells use the concatenation $(f_t \odot c_{t-1}, i_t \odot g_t)$ as input to an inner LSTM (or NLSTM) memory cell, and set $c^{outer}_t = h^{inner}_t$. Nested LSTMs outperform both stacked and single-layer LSTMs with similar numbers of parameters in our experiments on various character-level language modeling tasks, and the inner memories of an LSTM learn longer term dependencies compared with the higher-level units of a stacked LSTM.

APA


Moniz, J.R.A. & Krueger, D. (2017). Nested LSTMs. Proceedings of the Ninth Asian Conference on Machine Learning, in Proceedings of Machine Learning Research 77:530-544. Available from https://proceedings.mlr.press/v77/moniz17a.html.
