Recognizing Long Grammatical Sequences using Recurrent Networks Augmented with an External Differentiable Stack
Proceedings of the Fifteenth International Conference on Grammatical Inference, PMLR 153:130-153, 2021.
Recurrent neural networks (RNNs) are widely used for sequence modeling, generation, and prediction. Despite their success in applications such as machine translation and voice recognition, these stateful models have several critical shortcomings. Specifically, RNNs struggle to recognize very long sequences, which limits their applicability in many important temporal processing and time series forecasting problems. For example, RNNs struggle to recognize complex context-free languages (CFLs) and fail to reach 100% accuracy even on the training set. One way to address these shortcomings is to couple an RNN with an external, differentiable memory structure, such as a stack. However, differentiable memories in prior work have neither been extensively studied on complex CFLs nor tested on sequences much longer than those seen during training. In fact, earlier work has shown that continuous differentiable memory structures struggle to recognize complex CFLs over very long sequences. In this paper, we improve the memory-augmented RNN with new architectural and state-updating mechanisms that learn to properly balance the use of latent states with external memory. The resulting models generalize better and are able to classify long strings generated by complex hierarchical context-free grammars (CFGs). We evaluate our models on CFGs, including the Dyck languages, as well as on the Penn Treebank language modelling dataset, achieving stable, robust performance on these benchmarks. Furthermore, we show that our proposed memory-augmented networks are able to retain information over long sequences, leading to improved generalization for strings up to length 160.
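To make the idea of a "continuous differentiable" stack concrete, here is a minimal sketch of one soft stack update. It is an assumption-laden illustration, not the method proposed in this paper (whose architectural and state-updating mechanisms are not detailed in the abstract). It follows the push/pop superposition in the style of Joulin and Mikolov's Stack-RNN (2015), with an added no-op action. Each stack cell holds a single scalar, the stack has a fixed depth, and the function name `soft_stack_update` is hypothetical. In a real model the action probabilities and the pushed value would be computed from the RNN hidden state.

```python
from typing import List, Sequence, Tuple


def soft_stack_update(
    stack: Sequence[float],
    actions: Tuple[float, float, float],
    push_value: float,
) -> List[float]:
    """One continuous (differentiable) stack step.

    Hypothetical sketch, not the paper's exact update rule.
    stack: cell values, index 0 is the top; cells past the end count as empty (0.0).
    actions: (p_push, p_pop, p_noop), non-negative and summing to 1.
    push_value: value written onto the top if the stack pushes.
    """
    p_push, p_pop, p_noop = actions
    if min(actions) < 0.0 or abs(p_push + p_pop + p_noop - 1.0) > 1e-6:
        raise ValueError("actions must be non-negative and sum to 1")

    depth = len(stack)

    def cell(i: int) -> float:
        # Positions outside the fixed-depth stack are treated as empty.
        return stack[i] if 0 <= i < depth else 0.0

    new_stack = []
    for i in range(depth):
        pushed = push_value if i == 0 else cell(i - 1)  # push shifts cells down
        popped = cell(i + 1)                            # pop shifts cells up
        kept = cell(i)                                  # no-op leaves the cell alone
        # Each new cell is a convex mix of the three discrete outcomes,
        # so the whole update is differentiable in the action probabilities.
        new_stack.append(p_push * pushed + p_pop * popped + p_noop * kept)
    return new_stack
```

With one-hot action probabilities the update behaves like an ordinary discrete stack. With mixed probabilities the stack holds a weighted blend of the outcomes. For example, applying `(0.5, 0.5, 0.0)` with push value `3.0` to `[2.0, 1.0, 0.0, 0.0]` yields `[2.0, 1.0, 0.5, 0.0]`. This blending is what lets gradients flow through stack operations, and it is also one reason such stacks can become blurred over very long sequences.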