Lower Bounds for Chain-of-Thought Reasoning in Hard-Attention Transformers

Alireza Amiri Bavandpour, Xinting Huang, Mark Rofin, Michael Hahn
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:3274-3306, 2025.

Abstract

Chain-of-thought reasoning and scratchpads have emerged as critical tools for enhancing the computational capabilities of transformers. While theoretical results show that polynomial-length scratchpads can extend transformers’ expressivity from $TC^0$ to $PTIME$, the scratchpad length actually required remains poorly understood. Empirical evidence suggests that transformers need scratchpads even for many problems in $TC^0$, such as Parity or Multiplication, challenging optimistic bounds derived from circuit complexity. In this work, we initiate the study of systematic lower bounds on the number of CoT steps across different algorithmic problems in the hard-attention regime. We study a variety of algorithmic problems and provide bounds that are tight up to logarithmic factors. Overall, these results contribute to an emerging understanding of the power and limitations of chain-of-thought reasoning.
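As a purely illustrative sketch (not taken from the paper), the Python snippet below shows what a step-by-step scratchpad for Parity might look like: the model emits the running parity after each input bit, so the trace uses a number of CoT steps that grows linearly with the input length, which is the kind of step count against which such bounds are stated.

# Hypothetical illustration (not from the paper): a step-by-step
# scratchpad for Parity in which each CoT token records the running
# parity of the prefix read so far, giving a trace of length linear
# in the input size n.

def parity_cot_trace(bits: str) -> list[str]:
    """Return a chain-of-thought trace computing the parity of a bit string."""
    steps = []
    running = 0
    for i, b in enumerate(bits):
        running ^= int(b)                       # update the prefix parity
        steps.append(f"after bit {i + 1}: parity={running}")
    steps.append(f"answer: {running}")          # final answer token
    return steps

if __name__ == "__main__":
    for step in parity_cot_trace("10110"):
        print(step)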

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-bavandpour25a,
  title     = {Lower Bounds for Chain-of-Thought Reasoning in Hard-Attention Transformers},
  author    = {Bavandpour, Alireza Amiri and Huang, Xinting and Rofin, Mark and Hahn, Michael},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {3274--3306},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/bavandpour25a/bavandpour25a.pdf},
  url       = {https://proceedings.mlr.press/v267/bavandpour25a.html},
  abstract  = {Chain-of-thought reasoning and scratchpads have emerged as critical tools for enhancing the computational capabilities of transformers. While theoretical results show that polynomial-length scratchpads can extend transformers’ expressivity from $TC^0$ to $PTIME$, their required length remains poorly understood. Empirical evidence even suggests that transformers need scratchpads even for many problems in $TC^0$, such as Parity or Multiplication, challenging optimistic bounds derived from circuit complexity. In this work, we initiate the study of systematic lower bounds for the number of CoT steps across different algorithmic problems, in the hard-attention regime. We study a variety of algorithmic problems, and provide bounds that are tight up to logarithmic factors. Overall, these results contribute to emerging understanding of the power and limitations of chain-of-thought reasoning.}
}
Endnote
%0 Conference Paper
%T Lower Bounds for Chain-of-Thought Reasoning in Hard-Attention Transformers
%A Alireza Amiri Bavandpour
%A Xinting Huang
%A Mark Rofin
%A Michael Hahn
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-bavandpour25a
%I PMLR
%P 3274--3306
%U https://proceedings.mlr.press/v267/bavandpour25a.html
%V 267
%X Chain-of-thought reasoning and scratchpads have emerged as critical tools for enhancing the computational capabilities of transformers. While theoretical results show that polynomial-length scratchpads can extend transformers’ expressivity from $TC^0$ to $PTIME$, their required length remains poorly understood. Empirical evidence even suggests that transformers need scratchpads even for many problems in $TC^0$, such as Parity or Multiplication, challenging optimistic bounds derived from circuit complexity. In this work, we initiate the study of systematic lower bounds for the number of CoT steps across different algorithmic problems, in the hard-attention regime. We study a variety of algorithmic problems, and provide bounds that are tight up to logarithmic factors. Overall, these results contribute to emerging understanding of the power and limitations of chain-of-thought reasoning.
APA
Bavandpour, A.A., Huang, X., Rofin, M. & Hahn, M. (2025). Lower Bounds for Chain-of-Thought Reasoning in Hard-Attention Transformers. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:3274-3306. Available from https://proceedings.mlr.press/v267/bavandpour25a.html.
