On Expressive Power of Looped Transformers: Theoretical Analysis and Enhancement via Timestep Encoding

Kevin Xu, Issei Sato
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:69613-69646, 2025.

Abstract

Looped Transformers provide advantages in parameter efficiency, computational capabilities, and generalization for reasoning tasks. However, their expressive power regarding function approximation remains underexplored. In this paper, we establish the approximation rate of Looped Transformers by defining the modulus of continuity for sequence-to-sequence functions. This reveals a limitation specific to the looped architecture, and the analysis motivates incorporating scaling parameters for each loop, conditioned on a timestep encoding. Experiments validate the theoretical results, showing that increasing the number of loops enhances performance, with further gains achieved through timestep encoding.
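
The abstract describes augmenting a Looped Transformer with scaling parameters for each loop, conditioned on a timestep encoding. The PyTorch sketch below illustrates one plausible form of such a mechanism; it is not the authors' implementation, and the names (LoopedTransformer, timestep_mlp, num_loops) as well as the choice of a sinusoidal encoding and a scaled residual update are illustrative assumptions.

# Minimal sketch (assumption, not the paper's code): a single Transformer block applied
# repeatedly, with per-loop, per-channel scaling produced from a timestep encoding.
import math
import torch
import torch.nn as nn


def sinusoidal_timestep_encoding(t: int, dim: int) -> torch.Tensor:
    # Standard sinusoidal encoding of the loop index t (any timestep encoding could be used).
    half = dim // 2
    freqs = torch.exp(-math.log(10000.0) * torch.arange(half) / half)
    angles = t * freqs
    return torch.cat([torch.sin(angles), torch.cos(angles)])


class LoopedTransformer(nn.Module):
    def __init__(self, d_model: int = 64, nhead: int = 4, num_loops: int = 8):
        super().__init__()
        # One shared block applied num_loops times (the "looped" architecture).
        self.block = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=nhead,
            dim_feedforward=4 * d_model, batch_first=True,
        )
        # Maps the timestep encoding to loop-dependent scaling parameters.
        self.timestep_mlp = nn.Sequential(
            nn.Linear(d_model, d_model), nn.SiLU(), nn.Linear(d_model, d_model)
        )
        self.num_loops = num_loops
        self.d_model = d_model

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for t in range(self.num_loops):
            enc = sinusoidal_timestep_encoding(t, self.d_model).to(x.device)
            scale = self.timestep_mlp(enc)      # scaling parameters for loop t
            x = x + scale * self.block(x)       # scaled residual update per loop
        return x


if __name__ == "__main__":
    model = LoopedTransformer()
    tokens = torch.randn(2, 16, 64)             # (batch, sequence length, d_model)
    print(model(tokens).shape)                  # torch.Size([2, 16, 64])

Because the block weights are shared across loops, the timestep-conditioned scale is the only loop-dependent component in this sketch, which mirrors the enhancement the abstract attributes to timestep encoding.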

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-xu25x,
  title     = {On Expressive Power of Looped Transformers: Theoretical Analysis and Enhancement via Timestep Encoding},
  author    = {Xu, Kevin and Sato, Issei},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {69613--69646},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/xu25x/xu25x.pdf},
  url       = {https://proceedings.mlr.press/v267/xu25x.html},
  abstract  = {Looped Transformers provide advantages in parameter efficiency, computational capabilities, and generalization for reasoning tasks. However, their expressive power regarding function approximation remains underexplored. In this paper, we establish the approximation rate of Looped Transformers by defining the modulus of continuity for sequence-to-sequence functions. This reveals a limitation specific to the looped architecture. That is, the analysis prompts the incorporation of scaling parameters for each loop, conditioned on timestep encoding. Experiments validate the theoretical results, showing that increasing the number of loops enhances performance, with further gains achieved through the timestep encoding.}
}
Endnote
%0 Conference Paper
%T On Expressive Power of Looped Transformers: Theoretical Analysis and Enhancement via Timestep Encoding
%A Kevin Xu
%A Issei Sato
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-xu25x
%I PMLR
%P 69613--69646
%U https://proceedings.mlr.press/v267/xu25x.html
%V 267
%X Looped Transformers provide advantages in parameter efficiency, computational capabilities, and generalization for reasoning tasks. However, their expressive power regarding function approximation remains underexplored. In this paper, we establish the approximation rate of Looped Transformers by defining the modulus of continuity for sequence-to-sequence functions. This reveals a limitation specific to the looped architecture. That is, the analysis prompts the incorporation of scaling parameters for each loop, conditioned on timestep encoding. Experiments validate the theoretical results, showing that increasing the number of loops enhances performance, with further gains achieved through the timestep encoding.
APA
Xu, K. & Sato, I. (2025). On Expressive Power of Looped Transformers: Theoretical Analysis and Enhancement via Timestep Encoding. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:69613-69646. Available from https://proceedings.mlr.press/v267/xu25x.html.