Recognizing Long Grammatical Sequences using Recurrent Networks Augmented with an External Differentiable Stack

Ankur Mali, Alexander Ororbia, Daniel Kifer, Lee Giles
Proceedings of the Fifteenth International Conference on Grammatical Inference, PMLR 153:130-153, 2021.

Abstract

Recurrent neural networks (RNNs) are widely used for sequence modeling, generation, and prediction. Despite success in applications such as machine translation and voice recognition, these stateful models have several critical shortcomings. Specifically, RNNs struggle to recognize very long sequences, which limits their applicability in many important temporal processing and time series forecasting problems. For example, RNNs struggle to recognize complex context-free languages (CFLs), unable to reach 100% accuracy even on the training set. One way to address these shortcomings is to couple an RNN with an external, differentiable memory structure, such as a stack. However, differentiable memories in prior work have neither been extensively studied on complex CFLs nor tested on sequences longer than those seen during training. In fact, earlier work has shown that continuous differentiable memory structures struggle to recognize complex CFLs over very long sequences. In this paper, we improve the memory-augmented RNN with new architectural and state-updating mechanisms that learn to properly balance the use of latent states with external memory. Our improved RNN models exhibit better generalization and are able to classify long strings generated by complex hierarchical context-free grammars (CFGs). We evaluate our models on CFGs, including the Dyck languages, as well as on the Penn Treebank language modelling dataset, achieving stable, robust performance on these benchmarks. Furthermore, we show that our proposed memory-augmented networks are able to retain information over long sequences, leading to improved generalization for strings up to length $160$.
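To make the idea of a stack-augmented RNN concrete, below is a minimal Python sketch of a single step of a continuous, differentiable stack coupled to a simple recurrent controller, in the spirit of earlier stack-RNN work (e.g., Joulin and Mikolov, 2015). This is an illustrative approximation, not the architecture or state-updating mechanisms proposed in the paper; all names, dimensions, and the three-action (push/pop/no-op) parameterization are assumptions made for the example.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

class StackRNNCell:
    """One step of an RNN controller reading from and writing to a depth-limited soft stack.
    Illustrative sketch only; not the architecture from the paper."""
    def __init__(self, in_dim, hid_dim, stack_depth=64, seed=0):
        rng = np.random.default_rng(seed)
        self.U = rng.normal(0.0, 0.1, (hid_dim, in_dim))   # input -> hidden
        self.R = rng.normal(0.0, 0.1, (hid_dim, hid_dim))  # hidden -> hidden
        self.P = rng.normal(0.0, 0.1, (hid_dim, 1))        # stack top -> hidden
        self.A = rng.normal(0.0, 0.1, (3, hid_dim))        # hidden -> {push, pop, no-op}
        self.D = rng.normal(0.0, 0.1, (1, hid_dim))        # hidden -> value to push
        self.depth = stack_depth

    def step(self, x, h_prev, s_prev):
        # The controller conditions on its own recurrent state *and* the stack top,
        # which is how latent state and external memory are balanced in this sketch.
        h = sigmoid(self.U @ x + self.R @ h_prev + self.P @ s_prev[:1])
        a = softmax(self.A @ h)                    # soft mixture over stack actions
        push_val = sigmoid(self.D @ h).item()      # value written by a (soft) push
        s = np.empty_like(s_prev)
        # Top cell: blend of the pushed value, the element exposed by a pop,
        # and the old top (no-op).
        s[0] = a[0] * push_val + a[1] * s_prev[1] + a[2] * s_prev[0]
        # Deeper cells shift down on push, shift up on pop, stay put on no-op.
        down = s_prev[:-1]                                  # s_prev[i-1]
        up = np.concatenate([s_prev[2:], np.zeros(1)])      # s_prev[i+1], zero below the bottom
        s[1:] = a[0] * down + a[1] * up + a[2] * s_prev[1:]
        return h, s

# Toy usage: stepping the cell over a balanced-parentheses (Dyck-1 style) string.
cell = StackRNNCell(in_dim=2, hid_dim=8)
h, s = np.zeros(8), np.zeros(64)
one_hot = {"(": np.array([1.0, 0.0]), ")": np.array([0.0, 1.0])}
for ch in "(()(()))":
    h, s = cell.step(one_hot[ch], h, s)

In this toy loop, a well-trained controller would learn to push on "(" and pop on ")", so stack contents track the nesting depth; the paper's contribution lies in how such state updates and memory usage are learned and kept stable for strings far longer than this example.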

Cite this Paper


BibTeX
@InProceedings{pmlr-v153-mali21a,
  title     = {Recognizing Long Grammatical Sequences using Recurrent Networks Augmented with an External Differentiable Stack},
  author    = {Mali, Ankur and Ororbia, Alexander and Kifer, Daniel and Giles, Lee},
  booktitle = {Proceedings of the Fifteenth International Conference on Grammatical Inference},
  pages     = {130--153},
  year      = {2021},
  editor    = {Chandlee, Jane and Eyraud, Rémi and Heinz, Jeff and Jardine, Adam and van Zaanen, Menno},
  volume    = {153},
  series    = {Proceedings of Machine Learning Research},
  month     = {23--27 Aug},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v153/mali21a/mali21a.pdf},
  url       = {https://proceedings.mlr.press/v153/mali21a.html},
  abstract  = {Recurrent neural networks (RNNs) are widely used for sequence modeling, generation, and prediction. Despite success in applications such as machine translation and voice recognition, these stateful models have several critical shortcomings. Specifically, RNNs struggle to recognize very long sequences, which limits their applicability in many important temporal processing and time series forecasting problems. For example, RNNs struggle to recognize complex context-free languages (CFLs), unable to reach 100% accuracy even on the training set. One way to address these shortcomings is to couple an RNN with an external, differentiable memory structure, such as a stack. However, differentiable memories in prior work have neither been extensively studied on complex CFLs nor tested on sequences longer than those seen during training. In fact, earlier work has shown that continuous differentiable memory structures struggle to recognize complex CFLs over very long sequences. In this paper, we improve the memory-augmented RNN with new architectural and state-updating mechanisms that learn to properly balance the use of latent states with external memory. Our improved RNN models exhibit better generalization and are able to classify long strings generated by complex hierarchical context-free grammars (CFGs). We evaluate our models on CFGs, including the Dyck languages, as well as on the Penn Treebank language modelling dataset, achieving stable, robust performance on these benchmarks. Furthermore, we show that our proposed memory-augmented networks are able to retain information over long sequences, leading to improved generalization for strings up to length $160$.}
}
Endnote
%0 Conference Paper
%T Recognizing Long Grammatical Sequences using Recurrent Networks Augmented with an External Differentiable Stack
%A Ankur Mali
%A Alexander Ororbia
%A Daniel Kifer
%A Lee Giles
%B Proceedings of the Fifteenth International Conference on Grammatical Inference
%C Proceedings of Machine Learning Research
%D 2021
%E Jane Chandlee
%E Rémi Eyraud
%E Jeff Heinz
%E Adam Jardine
%E Menno van Zaanen
%F pmlr-v153-mali21a
%I PMLR
%P 130--153
%U https://proceedings.mlr.press/v153/mali21a.html
%V 153
%X Recurrent neural networks (RNNs) are widely used for sequence modeling, generation, and prediction. Despite success in applications such as machine translation and voice recognition, these stateful models have several critical shortcomings. Specifically, RNNs struggle to recognize very long sequences, which limits their applicability in many important temporal processing and time series forecasting problems. For example, RNNs struggle to recognize complex context-free languages (CFLs), unable to reach 100% accuracy even on the training set. One way to address these shortcomings is to couple an RNN with an external, differentiable memory structure, such as a stack. However, differentiable memories in prior work have neither been extensively studied on complex CFLs nor tested on sequences longer than those seen during training. In fact, earlier work has shown that continuous differentiable memory structures struggle to recognize complex CFLs over very long sequences. In this paper, we improve the memory-augmented RNN with new architectural and state-updating mechanisms that learn to properly balance the use of latent states with external memory. Our improved RNN models exhibit better generalization and are able to classify long strings generated by complex hierarchical context-free grammars (CFGs). We evaluate our models on CFGs, including the Dyck languages, as well as on the Penn Treebank language modelling dataset, achieving stable, robust performance on these benchmarks. Furthermore, we show that our proposed memory-augmented networks are able to retain information over long sequences, leading to improved generalization for strings up to length $160$.
APA
Mali, A., Ororbia, A., Kifer, D. & Giles, L. (2021). Recognizing Long Grammatical Sequences using Recurrent Networks Augmented with an External Differentiable Stack. Proceedings of the Fifteenth International Conference on Grammatical Inference, in Proceedings of Machine Learning Research 153:130-153. Available from https://proceedings.mlr.press/v153/mali21a.html.

Related Material

Download PDF: https://proceedings.mlr.press/v153/mali21a/mali21a.pdf