Statistical Bias Leads to Overestimated OOD Generalization in Algorithmic Tasks for Seq2Seq Transformer Models

John Kirk
Proceedings of The 4th Conference on Lifelong Learning Agents, PMLR 330:204-221, 2026.

Abstract

This study examines how statistical bias affects a model's ability to generalize to in-distribution and out-of-distribution data on algorithmic tasks. Prior research indicates that transformers may inadvertently learn to rely on spurious correlations in the training data, leading to an overestimation of their generalization capabilities. To investigate this, we evaluate seq2seq transformer models on several synthetic algorithmic tasks, systematically introducing and varying the presence of these biases. We also analyze how different architectural design choices of the transformer models affect their generalization. Our findings suggest that the presence of statistical biases can distort model performance on out-of-distribution data, leading to an overestimation of generalization capabilities. The models rely heavily on these spurious correlations for inference, as indicated by their performance on tasks that include such biases.

Cite this Paper


BibTeX
@InProceedings{pmlr-v330-kirk26a,
  title     = {Statistical Bias Leads to Overestimated OOD Generalization in Algorithmic Tasks for Seq2Seq Transformer Models},
  author    = {Kirk, John},
  booktitle = {Proceedings of The 4th Conference on Lifelong Learning Agents},
  pages     = {204--221},
  year      = {2026},
  editor    = {Chandar, Sarath and Pascanu, Razvan and Eaton, Eric and Liu, Bing and Mahmood, Rupam and Rannen-Triki, Amal},
  volume    = {330},
  series    = {Proceedings of Machine Learning Research},
  month     = {11--14 Aug},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v330/main/assets/kirk26a/kirk26a.pdf},
  url       = {https://proceedings.mlr.press/v330/kirk26a.html},
  abstract  = {This study aims to understand how statistical bias affects the model’s ability to generalize to in-distribution and out-of-distribution data on algorithmic tasks. Prior research indicates that transformers may inadvertently learn to rely on these spurious correlations, leading to an overestimation of their generalization capabilities. To investigate this, we evaluated the seq2seq transformer models in several synthetic algorithmic tasks, systematically introducing and varying the presence of these biases. We also analyze how different architectural design choices of the transformer models affect their generalization. Our findings suggest that the presence of statistical biases can affect model performance in out-of-distribution data, leading to an overestimation of its generalization capabilities. The models rely heavily on these spurious correlations for inference, as indicated by their performance on tasks that include such biases.}
}
Endnote
%0 Conference Paper
%T Statistical Bias Leads to Overestimated OOD Generalization in Algorithmic Tasks for Seq2Seq Transformer Models
%A John Kirk
%B Proceedings of The 4th Conference on Lifelong Learning Agents
%C Proceedings of Machine Learning Research
%D 2026
%E Sarath Chandar
%E Razvan Pascanu
%E Eric Eaton
%E Bing Liu
%E Rupam Mahmood
%E Amal Rannen-Triki
%F pmlr-v330-kirk26a
%I PMLR
%P 204--221
%U https://proceedings.mlr.press/v330/kirk26a.html
%V 330
%X This study aims to understand how statistical bias affects the model’s ability to generalize to in-distribution and out-of-distribution data on algorithmic tasks. Prior research indicates that transformers may inadvertently learn to rely on these spurious correlations, leading to an overestimation of their generalization capabilities. To investigate this, we evaluated the seq2seq transformer models in several synthetic algorithmic tasks, systematically introducing and varying the presence of these biases. We also analyze how different architectural design choices of the transformer models affect their generalization. Our findings suggest that the presence of statistical biases can affect model performance in out-of-distribution data, leading to an overestimation of its generalization capabilities. The models rely heavily on these spurious correlations for inference, as indicated by their performance on tasks that include such biases.
APA
Kirk, J. (2026). Statistical Bias Leads to Overestimated OOD Generalization in Algorithmic Tasks for Seq2Seq Transformer Models. Proceedings of The 4th Conference on Lifelong Learning Agents, in Proceedings of Machine Learning Research 330:204-221. Available from https://proceedings.mlr.press/v330/kirk26a.html.