Can Transformers Reason Logically? A Study in SAT Solving

Leyan Pan, Vijay Ganesh, Jacob Abernethy, Chris Esposo, Wenke Lee
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:47632-47671, 2025.

Abstract

We formally study the logical reasoning capabilities of decoder-only Transformers in the context of the Boolean satisfiability (SAT) problem. First, we prove by construction that decoder-only Transformers can decide 3-SAT, in a non-uniform model of computation, using backtracking and deduction via Chain-of-Thought (CoT). Second, we implement our construction as a PyTorch model with a tool (PARAT) that we designed to empirically demonstrate its correctness and investigate its properties. Third, rather than programming a transformer to reason, we evaluate empirically whether it can be trained to do so by learning directly from algorithmic traces (“reasoning paths”) from our theoretical construction. The trained models demonstrate strong out-of-distribution generalization on problem sizes seen during training, but have limited length generalization, which is consistent with the implications of our theoretical result.
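To make the abstract's notion of "backtracking and deduction via Chain-of-Thought" concrete, the following is a minimal, hypothetical sketch: a DPLL-style solver with unit propagation (deduction) and branching with backtracking that emits a textual trace of the kind a CoT "reasoning path" could mirror. The function names, trace vocabulary (Decide/Deduce/Backtrack/Conflict), and example formula are illustrative assumptions, not the paper's actual trace format or the PARAT tool.

from typing import List, Optional

Clause = List[int]  # e.g. [1, -2, 3] encodes (x1 OR NOT x2 OR x3)

def simplify(clauses: List[Clause], lit: int) -> Optional[List[Clause]]:
    """Assign `lit` true: drop satisfied clauses, remove the negated literal.
    Returns None if an empty clause (a conflict) is produced."""
    out = []
    for c in clauses:
        if lit in c:
            continue                      # clause satisfied
        reduced = [l for l in c if l != -lit]
        if not reduced:
            return None                   # conflict
        out.append(reduced)
    return out

def dpll(clauses: List[Clause], trace: List[str]) -> bool:
    # Deduction: propagate unit clauses to a fixpoint.
    while True:
        units = [c[0] for c in clauses if len(c) == 1]
        if not units:
            break
        lit = units[0]
        trace.append(f"Deduce {lit}")
        clauses = simplify(clauses, lit)
        if clauses is None:
            trace.append("Conflict")
            return False
    if not clauses:
        trace.append("SAT")
        return True
    # Backtracking search: branch on the first literal of the first clause.
    lit = clauses[0][0]
    for choice in (lit, -lit):
        trace.append(f"Decide {choice}")
        reduced = simplify(clauses, choice)
        if reduced is not None and dpll(reduced, trace):
            return True
        trace.append(f"Backtrack {choice}")
    return False

if __name__ == "__main__":
    # (x1 or x2 or x3) and (not x1 or x2) and (not x2 or x3) and (not x3 or not x1)
    formula = [[1, 2, 3], [-1, 2], [-2, 3], [-3, -1]]
    trace: List[str] = []
    print("SAT" if dpll(formula, trace) else "UNSAT")
    print(" | ".join(trace))

Running the sketch prints a linearized trace such as "Decide 1 | Deduce 2 | Deduce 3 | Conflict | Backtrack 1 | Decide -1 | ... | SAT"; traces of this general shape are the kind of supervision signal the abstract refers to when describing training on algorithmic traces.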

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-pan25d,
  title     = {Can Transformers Reason Logically? {A} Study in {SAT} Solving},
  author    = {Pan, Leyan and Ganesh, Vijay and Abernethy, Jacob and Esposo, Chris and Lee, Wenke},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {47632--47671},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/pan25d/pan25d.pdf},
  url       = {https://proceedings.mlr.press/v267/pan25d.html},
  abstract  = {We formally study the logical reasoning capabilities of decoder-only Transformers in the context of the boolean satisfiability (SAT) problem. First, we prove by construction that decoder-only Transformers can decide 3-SAT, in a non-uniform model of computation, using backtracking and deduction via Chain-of-Thought (CoT). Second, we implement our construction as a PyTorch model with a tool (PARAT) that we designed to empirically demonstrate its correctness and investigate its properties. Third, rather than programming a transformer to reason, we evaluate empirically whether it can be trained to do so by learning directly from algorithmic traces (“reasoning paths”) from our theoretical construction. The trained models demonstrate strong out-of-distribution generalization on problem sizes seen during training but has limited length generalization, which is consistent with the implications of our theoretical result.}
}
Endnote
%0 Conference Paper
%T Can Transformers Reason Logically? A Study in SAT Solving
%A Leyan Pan
%A Vijay Ganesh
%A Jacob Abernethy
%A Chris Esposo
%A Wenke Lee
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-pan25d
%I PMLR
%P 47632--47671
%U https://proceedings.mlr.press/v267/pan25d.html
%V 267
%X We formally study the logical reasoning capabilities of decoder-only Transformers in the context of the boolean satisfiability (SAT) problem. First, we prove by construction that decoder-only Transformers can decide 3-SAT, in a non-uniform model of computation, using backtracking and deduction via Chain-of-Thought (CoT). Second, we implement our construction as a PyTorch model with a tool (PARAT) that we designed to empirically demonstrate its correctness and investigate its properties. Third, rather than programming a transformer to reason, we evaluate empirically whether it can be trained to do so by learning directly from algorithmic traces (“reasoning paths”) from our theoretical construction. The trained models demonstrate strong out-of-distribution generalization on problem sizes seen during training but has limited length generalization, which is consistent with the implications of our theoretical result.
APA
Pan, L., Ganesh, V., Abernethy, J., Esposo, C., & Lee, W. (2025). Can Transformers Reason Logically? A Study in SAT Solving. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:47632-47671. Available from https://proceedings.mlr.press/v267/pan25d.html.