Provable Length Generalization in Sequence Prediction via Spectral Filtering

Annie Marsden, Evan Dogariu, Naman Agarwal, Xinyi Chen, Daniel Suo, Elad Hazan
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:43200-43224, 2025.

Abstract

We consider the problem of length generalization in sequence prediction. We define a new metric of performance in this setting, the Asymmetric-Regret, which measures regret against a benchmark predictor with longer context length than available to the learner. We continue by studying this concept through the lens of the spectral filtering algorithm. We present a gradient-based learning algorithm that provably achieves length generalization for linear dynamical systems. We conclude with proof-of-concept experiments which are consistent with our theory.

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-marsden25a,
  title = {Provable Length Generalization in Sequence Prediction via Spectral Filtering},
  author = {Marsden, Annie and Dogariu, Evan and Agarwal, Naman and Chen, Xinyi and Suo, Daniel and Hazan, Elad},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages = {43200--43224},
  year = {2025},
  editor = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume = {267},
  series = {Proceedings of Machine Learning Research},
  month = {13--19 Jul},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/marsden25a/marsden25a.pdf},
  url = {https://proceedings.mlr.press/v267/marsden25a.html},
  abstract = {We consider the problem of length generalization in sequence prediction. We define a new metric of performance in this setting, the Asymmetric-Regret, which measures regret against a benchmark predictor with longer context length than available to the learner. We continue by studying this concept through the lens of the spectral filtering algorithm. We present a gradient-based learning algorithm that provably achieves length generalization for linear dynamical systems. We conclude with proof-of-concept experiments which are consistent with our theory.}
}
Endnote
%0 Conference Paper
%T Provable Length Generalization in Sequence Prediction via Spectral Filtering
%A Annie Marsden
%A Evan Dogariu
%A Naman Agarwal
%A Xinyi Chen
%A Daniel Suo
%A Elad Hazan
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-marsden25a
%I PMLR
%P 43200--43224
%U https://proceedings.mlr.press/v267/marsden25a.html
%V 267
%X We consider the problem of length generalization in sequence prediction. We define a new metric of performance in this setting, the Asymmetric-Regret, which measures regret against a benchmark predictor with longer context length than available to the learner. We continue by studying this concept through the lens of the spectral filtering algorithm. We present a gradient-based learning algorithm that provably achieves length generalization for linear dynamical systems. We conclude with proof-of-concept experiments which are consistent with our theory.
APA
Marsden, A., Dogariu, E., Agarwal, N., Chen, X., Suo, D., & Hazan, E. (2025). Provable Length Generalization in Sequence Prediction via Spectral Filtering. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:43200-43224. Available from https://proceedings.mlr.press/v267/marsden25a.html.
