Learning Transductions and Alignments with RNN Seq2seq Models

Zhengxiang Wang
Proceedings of 16th edition of the International Conference on Grammatical Inference, PMLR 217:223-249, 2023.

Abstract

The paper studies the capabilities of Recurrent Neural Network sequence-to-sequence (RNN seq2seq) models in learning four transduction tasks: identity, reversal, total reduplication, and quadratic copying. These transductions are traditionally well studied under finite-state transducers and are associated with increasing complexity in that order. We find that RNN seq2seq models are only able to approximate a mapping that fits the training or in-distribution data, rather than learning the underlying functions. Although attention makes learning more efficient and robust, it does not overcome the limitation in out-of-distribution generalization. We establish a novel complexity hierarchy for attention-less RNN seq2seq models learning the four tasks, which may be understood in terms of the complexity hierarchy of formal languages rather than that of string transductions. RNN variants also play a role in the results. In particular, we show that Simple RNN seq2seq models cannot count the input length.
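For concreteness, here is a minimal Python sketch of the four transductions on an input string w (the function names are illustrative choices, not the paper's notation; the paper's experiments operate over symbol sequences, for which Python strings stand in here):

# Four string transductions of increasing output length:
# identity and reversal preserve length n, total reduplication yields 2n symbols,
# and quadratic copying yields n^2 symbols.

def identity(w: str) -> str:
    return w                      # "ab" -> "ab"

def reversal(w: str) -> str:
    return w[::-1]                # "ab" -> "ba"

def total_reduplication(w: str) -> str:
    return w + w                  # "ab" -> "abab"

def quadratic_copying(w: str) -> str:
    return w * len(w)             # "ab" -> "abab"; "abc" -> "abcabcabc"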

Cite this Paper


BibTeX
@InProceedings{pmlr-v217-wang23a,
  title     = {Learning Transductions and Alignments with RNN Seq2seq Models},
  author    = {Wang, Zhengxiang},
  booktitle = {Proceedings of 16th edition of the International Conference on Grammatical Inference},
  pages     = {223--249},
  year      = {2023},
  editor    = {Coste, François and Ouardi, Faissal and Rabusseau, Guillaume},
  volume    = {217},
  series    = {Proceedings of Machine Learning Research},
  month     = {10--13 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v217/wang23a/wang23a.pdf},
  url       = {https://proceedings.mlr.press/v217/wang23a.html}
}
Endnote
%0 Conference Paper
%T Learning Transductions and Alignments with RNN Seq2seq Models
%A Zhengxiang Wang
%B Proceedings of 16th edition of the International Conference on Grammatical Inference
%C Proceedings of Machine Learning Research
%D 2023
%E François Coste
%E Faissal Ouardi
%E Guillaume Rabusseau
%F pmlr-v217-wang23a
%I PMLR
%P 223--249
%U https://proceedings.mlr.press/v217/wang23a.html
%V 217
APA
Wang, Z. (2023). Learning Transductions and Alignments with RNN Seq2seq Models. Proceedings of 16th edition of the International Conference on Grammatical Inference, in Proceedings of Machine Learning Research 217:223-249. Available from https://proceedings.mlr.press/v217/wang23a.html.