Investigating Backpropagation Alternatives when Learning to Dynamically Count with Recurrent Neural Networks

Ankur Mali, Alexander Ororbia, Daniel Kifer, Lee Giles
Proceedings of the Fifteenth International Conference on Grammatical Inference, PMLR 153:154-175, 2021.

Abstract

Artificial neural networks, such as recurrent neural networks (RNNs) and transformers, have empirically demonstrated impressive performance across many natural language processing tasks. However, automated text processing at a deeper and more interpretable level arguably requires extracting intricate patterns such as underlying grammatical structures. As a result, correctly interpreting a neural language model would require an understanding of linguistic structure through formal language theory. Nonetheless, there is often a discrepancy between theoretical and practical findings that restricts models informed by formal language theory in real-life scenarios. For instance, when learning context-free grammars (CFGs), existing neural models fall short because they lack appropriate memory structures. In this work, we investigate how learning algorithms affect the generalization ability of RNNs that are designed to learn context-free languages, as well as their ability to encode hierarchical representations. To do so, we evaluate a range of learning algorithms on complex context-free languages such as the Dyck languages, with a focus on the RNN’s ability to generalize to longer sequences when processing a CFG. Our results demonstrate that a Long Short-Term Memory (LSTM) RNN equipped with second-order connections, trained with the sparse attentive backtracking (SAB) algorithm, performs stably across various Dyck languages and successfully emulates real-time counter machines. We empirically show that RNNs without external memory are incapable of recognizing Dyck-2 languages, which require a stack-like structure. Finally, we investigate each learning algorithm’s performance on real-world language modeling tasks using the Penn Treebank and text8 benchmarks, and further examine how an increase in model parameters affects each RNN’s stability and grammar recognition performance under the different learning algorithms.
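As a rough intuition for the memory argument in the abstract, the following minimal Python sketch (ours, not from the paper) contrasts Dyck-1, which a single integer counter suffices to recognize (the capability of a real-time counter machine), with Dyck-2, which additionally requires remembering the identity of each open bracket, i.e. a stack.

# Illustrative sketch only: why Dyck-1 needs a counter but Dyck-2 needs a stack.

def is_dyck1(s: str) -> bool:
    """Recognize Dyck-1 over '(' and ')' using a single integer counter."""
    count = 0
    for ch in s:
        count += 1 if ch == '(' else -1
        if count < 0:          # a ')' appeared with no matching '('
            return False
    return count == 0

def is_dyck2(s: str) -> bool:
    """Recognize Dyck-2 over '()' and '[]'.
    Counting alone is insufficient: we must remember *which* bracket
    was opened, so opening brackets are pushed onto a stack."""
    pairs = {')': '(', ']': '['}
    stack = []
    for ch in s:
        if ch in '([':
            stack.append(ch)
        elif not stack or stack.pop() != pairs[ch]:
            return False
    return not stack

print(is_dyck1("(()())"))   # True
print(is_dyck2("([()])"))   # True
print(is_dyck2("([)]"))     # False -- per-symbol counting alone would accept this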

Cite this Paper


BibTeX
@InProceedings{pmlr-v153-mali21b,
  title     = {Investigating Backpropagation Alternatives when Learning to Dynamically Count with Recurrent Neural Networks},
  author    = {Mali, Ankur and Ororbia, Alexander and Kifer, Daniel and Giles, Lee},
  booktitle = {Proceedings of the Fifteenth International Conference on Grammatical Inference},
  pages     = {154--175},
  year      = {2021},
  editor    = {Chandlee, Jane and Eyraud, Rémi and Heinz, Jeff and Jardine, Adam and van Zaanen, Menno},
  volume    = {153},
  series    = {Proceedings of Machine Learning Research},
  month     = {23--27 Aug},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v153/mali21b/mali21b.pdf},
  url       = {https://proceedings.mlr.press/v153/mali21b.html}
}
Endnote
%0 Conference Paper
%T Investigating Backpropagation Alternatives when Learning to Dynamically Count with Recurrent Neural Networks
%A Ankur Mali
%A Alexander Ororbia
%A Daniel Kifer
%A Lee Giles
%B Proceedings of the Fifteenth International Conference on Grammatical Inference
%C Proceedings of Machine Learning Research
%D 2021
%E Jane Chandlee
%E Rémi Eyraud
%E Jeff Heinz
%E Adam Jardine
%E Menno van Zaanen
%F pmlr-v153-mali21b
%I PMLR
%P 154--175
%U https://proceedings.mlr.press/v153/mali21b.html
%V 153
APA
Mali, A., Ororbia, A., Kifer, D. & Giles, L. (2021). Investigating Backpropagation Alternatives when Learning to Dynamically Count with Recurrent Neural Networks. Proceedings of the Fifteenth International Conference on Grammatical Inference, in Proceedings of Machine Learning Research 153:154-175. Available from https://proceedings.mlr.press/v153/mali21b.html.
