Efficient Time Series Processing for Transformers and State-Space Models through Token Merging

Leon Götz, Marcel Kollovieh, Stephan Günnemann, Leo Schwinn
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:20226-20246, 2025.

Abstract

Despite recent advances in subquadratic attention mechanisms or state-space models, processing long token sequences still imposes significant computational requirements. Token merging has emerged as a solution to increase computational efficiency in computer vision architectures. In this work, we perform the first investigations of token merging in time series analysis on both transformers and state-space models. We further introduce local merging, a domain-specific token merging algorithm that selectively combines tokens within a local neighborhood, achieving two major benefits: a) Local merging can adjust its computational complexity from quadratic to linear based on the neighborhood size to effectively scale to long sequences; b) Local merging is the first causal merging scheme enabling token merging in transformer decoders. Further, we identify spectral properties of the input data that reliably predict the potential benefits of local merging without requiring evaluation on downstream tasks. Our comprehensive empirical evaluation demonstrates that local merging offers substantial efficiency gains with minimal impact on accuracy, achieving up to 5400% acceleration on the recently proposed Chronos foundation model.
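To make the mechanism described in the abstract concrete, the following is a minimal PyTorch sketch of a causal local merge with the smallest neighborhood (window size 1), written purely for illustration: the function name local_merge, the bipartite even/odd split, and the plain averaging rule are our assumptions here, not the authors' released implementation.

import torch
import torch.nn.functional as F


def local_merge(x: torch.Tensor, r: int) -> torch.Tensor:
    # x: (n, d) sequence of token embeddings; r tokens are removed per call.
    n, _ = x.shape
    src = x[1::2]                      # odd positions: merge candidates
    dst = x[0::2][: src.shape[0]]      # each candidate's even predecessor

    # Score each candidate against its predecessor only; with a fixed
    # neighborhood this costs O(n), while widening the neighborhood
    # toward n recovers the O(n^2) cost of global matching.
    sim = F.cosine_similarity(src, dst, dim=-1)

    # Merge the r most similar pairs by averaging each pair into the
    # earlier token, so no information flows backward in time (causal).
    merged = sim.topk(min(r, sim.numel())).indices
    dst = dst.clone()
    dst[merged] = 0.5 * (dst[merged] + src[merged])

    keep = torch.ones(src.shape[0], dtype=torch.bool)
    keep[merged] = False

    # Re-interleave the surviving tokens in their original temporal order.
    out = []
    for i in range(src.shape[0]):
        out.append(dst[i])
        if keep[i]:
            out.append(src[i])
    if n % 2 == 1:                     # trailing even token had no candidate
        out.append(x[-1])
    return torch.stack(out)

For example, local_merge(torch.randn(16, 64), r=4) returns a (12, 64) sequence. The even/odd split guarantees merge pairs never overlap, and merging only into earlier positions is what allows the scheme to be applied inside a transformer decoder.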

Cite this Paper

BibTeX
@InProceedings{pmlr-v267-gotz25a,
  title     = {Efficient Time Series Processing for Transformers and State-Space Models through Token Merging},
  author    = {G\"{o}tz, Leon and Kollovieh, Marcel and G\"{u}nnemann, Stephan and Schwinn, Leo},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {20226--20246},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/gotz25a/gotz25a.pdf},
  url       = {https://proceedings.mlr.press/v267/gotz25a.html},
  abstract  = {Despite recent advances in subquadratic attention mechanisms or state-space models, processing long token sequences still imposes significant computational requirements. Token merging has emerged as a solution to increase computational efficiency in computer vision architectures. In this work, we perform the first investigations of token merging in time series analysis on both transformers and state-space models. We further introduce local merging, a domain-specific token merging algorithm that selectively combines tokens within a local neighborhood, achieving two major benefits: a) Local merging can adjust its computational complexity from quadratic to linear based on the neighborhood size to effectively scale to long sequences; b) Local merging is the first causal merging scheme enabling token merging in transformer decoders. Further, we identify spectral properties of the input data that reliably predict the potential benefits of local merging without requiring evaluation on downstream tasks. Our comprehensive empirical evaluation demonstrates that local merging offers substantial efficiency gains with minimal impact on accuracy, achieving up to 5400% acceleration on the recently proposed Chronos foundation model.}
}
Endnote
%0 Conference Paper
%T Efficient Time Series Processing for Transformers and State-Space Models through Token Merging
%A Leon Götz
%A Marcel Kollovieh
%A Stephan Günnemann
%A Leo Schwinn
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-gotz25a
%I PMLR
%P 20226--20246
%U https://proceedings.mlr.press/v267/gotz25a.html
%V 267
%X Despite recent advances in subquadratic attention mechanisms or state-space models, processing long token sequences still imposes significant computational requirements. Token merging has emerged as a solution to increase computational efficiency in computer vision architectures. In this work, we perform the first investigations of token merging in time series analysis on both transformers and state-space models. We further introduce local merging, a domain-specific token merging algorithm that selectively combines tokens within a local neighborhood, achieving two major benefits: a) Local merging can adjust its computational complexity from quadratic to linear based on the neighborhood size to effectively scale to long sequences; b) Local merging is the first causal merging scheme enabling token merging in transformer decoders. Further, we identify spectral properties of the input data that reliably predict the potential benefits of local merging without requiring evaluation on downstream tasks. Our comprehensive empirical evaluation demonstrates that local merging offers substantial efficiency gains with minimal impact on accuracy, achieving up to 5400% acceleration on the recently proposed Chronos foundation model.
APA
Götz, L., Kollovieh, M., Günnemann, S., & Schwinn, L. (2025). Efficient Time Series Processing for Transformers and State-Space Models through Token Merging. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:20226-20246. Available from https://proceedings.mlr.press/v267/gotz25a.html.