Provable Length Generalization in Sequence Prediction via Spectral Filtering
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:43200-43224, 2025.
Abstract
We consider the problem of length generalization in sequence prediction. We define a new performance metric for this setting, the Asymmetric-Regret, which measures regret against a benchmark predictor with a longer context length than is available to the learner. We then study this concept through the lens of the spectral filtering algorithm. We present a gradient-based learning algorithm that provably achieves length generalization for linear dynamical systems. We conclude with proof-of-concept experiments that are consistent with our theory.
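
For readers unfamiliar with spectral filtering, the following is a minimal sketch of the feature construction that the technique is usually built on, not the algorithm of this paper. It assumes the standard construction from earlier spectral filtering work, in which the filters are the top eigenvectors of a fixed Hankel matrix with entries 2 / ((i + j)^3 - (i + j)). The function names `spectral_filters` and `filter_features`, the scalar-input setting, and the parameter choices are illustrative assumptions.

```python
import numpy as np


def spectral_filters(context_len, num_filters):
    """Top eigenpairs of the Hankel matrix Z[i, j] = 2 / ((i+j)^3 - (i+j)).

    Indices i, j run from 1 to context_len (assumed standard construction).
    Returns eigenvalues in descending order and the matching eigenvectors
    as columns of an array of shape (context_len, num_filters).
    """
    idx = np.arange(1, context_len + 1)
    s = idx[:, None] + idx[None, :]          # s[i, j] = i + j, always >= 2
    hankel = 2.0 / (s**3 - s)
    eigvals, eigvecs = np.linalg.eigh(hankel)  # ascending eigenvalues
    top = np.argsort(eigvals)[::-1][:num_filters]
    return eigvals[top], eigvecs[:, top]


def filter_features(inputs, filters):
    """Project the most recent context_len inputs onto each filter.

    inputs:  array of shape (T,), a scalar input sequence (assumption).
    filters: array of shape (context_len, num_filters).
    Row t of the result uses inputs[t-1], ..., inputs[t-context_len],
    with zeros standing in for inputs before time 0.
    """
    context_len, num_filters = filters.shape
    padded = np.concatenate([np.zeros(context_len), inputs])
    feats = np.empty((len(inputs), num_filters))
    for t in range(len(inputs)):
        window = padded[t:t + context_len][::-1]  # most recent input first
        feats[t] = window @ filters
    return feats


if __name__ == "__main__":
    vals, filt = spectral_filters(context_len=16, num_filters=4)
    feats = filter_features(np.ones(5), filt)
    print(vals)
    print(feats.shape)
```

In this sketch `context_len` plays the role of the learner's available context. A predictor would fit a linear map on top of these features, and the asymmetric setting described above compares such a learner against a benchmark whose filters span a longer context.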