Constrained Belief Updates Explain Geometric Structures in Transformer Representations

Mateusz Piotrowski, Paul M. Riechers, Daniel Filan, Adam Shai
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:49399-49419, 2025.

Abstract

What computational structures emerge in transformers trained on next-token prediction? In this work, we provide evidence that transformers implement constrained Bayesian belief updating—a parallelized version of partial Bayesian inference shaped by architectural constraints. We integrate the model-agnostic theory of optimal prediction with mechanistic interpretability to analyze transformers trained on a tractable family of hidden Markov models that generate rich geometric patterns in neural activations. Our primary analysis focuses on single-layer transformers, revealing how the first attention layer implements these constrained updates, with extensions to multi-layer architectures demonstrating how subsequent layers refine these representations. We find that attention carries out an algorithm with a natural interpretation in the probability simplex, and creates representations with distinctive geometric structure. We show how both the algorithmic behavior and the underlying geometry of these representations can be theoretically predicted in detail—including the attention pattern, OV-vectors, and embedding vectors—by modifying the equations for optimal future token predictions to account for the architectural constraints of attention. Our approach provides a principled lens on how architectural constraints shape the implementation of optimal prediction, revealing why transformers develop specific intermediate geometric structures.
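
To make the "optimal prediction" baseline in the abstract concrete, the following is a minimal Python sketch (not from the paper) of exact Bayesian belief updating over a hidden Markov model: the belief over hidden states is pushed through the token-labeled transition matrix of each observed symbol and renormalized, tracing a trajectory in the probability simplex. The 3-state process and matrices below are illustrative placeholders, not the paper's HMM family, and this is the unconstrained update whose attention-shaped "constrained" variant the paper derives.

import numpy as np

# Illustrative token-labeled transition matrices (assumed, not the paper's process):
# T[x][i, j] = P(next hidden state = j, emitted token = x | current hidden state = i).
T = {
    0: np.array([[0.6, 0.1, 0.0],
                 [0.0, 0.3, 0.2],
                 [0.2, 0.0, 0.4]]),
    1: np.array([[0.1, 0.2, 0.0],
                 [0.3, 0.0, 0.2],
                 [0.0, 0.3, 0.1]]),
}

def update_belief(belief, token):
    # One step of exact Bayesian belief updating: eta' is proportional to eta @ T[token].
    unnormalized = belief @ T[token]
    return unnormalized / unnormalized.sum()

def belief_trajectory(tokens, prior):
    # Beliefs over hidden states after each prefix of the token sequence;
    # these points live in the probability simplex whose geometry the paper studies.
    beliefs, belief = [], prior
    for x in tokens:
        belief = update_belief(belief, x)
        beliefs.append(belief)
    return np.array(beliefs)

prior = np.array([1.0, 1.0, 1.0]) / 3          # uniform prior over the three hidden states
print(belief_trajectory([0, 1, 1, 0], prior))  # one row of belief coordinates per prefix

In this standard setup, the optimal next-token probabilities follow directly from the belief state (P(x | history) is the total mass of belief @ T[x]); the paper's question is how attention, which must process all positions in parallel, approximates and reshapes this sequential update.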

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-piotrowski25a,
  title     = {Constrained Belief Updates Explain Geometric Structures in Transformer Representations},
  author    = {Piotrowski, Mateusz and Riechers, Paul M. and Filan, Daniel and Shai, Adam},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {49399--49419},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/piotrowski25a/piotrowski25a.pdf},
  url       = {https://proceedings.mlr.press/v267/piotrowski25a.html},
  abstract  = {What computational structures emerge in transformers trained on next-token prediction? In this work, we provide evidence that transformers implement constrained Bayesian belief updating—a parallelized version of partial Bayesian inference shaped by architectural constraints. We integrate the model-agnostic theory of optimal prediction with mechanistic interpretability to analyze transformers trained on a tractable family of hidden Markov models that generate rich geometric patterns in neural activations. Our primary analysis focuses on single-layer transformers, revealing how the first attention layer implements these constrained updates, with extensions to multi-layer architectures demonstrating how subsequent layers refine these representations. We find that attention carries out an algorithm with a natural interpretation in the probability simplex, and creates representations with distinctive geometric structure. We show how both the algorithmic behavior and the underlying geometry of these representations can be theoretically predicted in detail—including the attention pattern, OV-vectors, and embedding vectors—by modifying the equations for optimal future token predictions to account for the architectural constraints of attention. Our approach provides a principled lens on how architectural constraints shape the implementation of optimal prediction, revealing why transformers develop specific intermediate geometric structures.}
}
Endnote
%0 Conference Paper
%T Constrained Belief Updates Explain Geometric Structures in Transformer Representations
%A Mateusz Piotrowski
%A Paul M. Riechers
%A Daniel Filan
%A Adam Shai
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-piotrowski25a
%I PMLR
%P 49399--49419
%U https://proceedings.mlr.press/v267/piotrowski25a.html
%V 267
%X What computational structures emerge in transformers trained on next-token prediction? In this work, we provide evidence that transformers implement constrained Bayesian belief updating—a parallelized version of partial Bayesian inference shaped by architectural constraints. We integrate the model-agnostic theory of optimal prediction with mechanistic interpretability to analyze transformers trained on a tractable family of hidden Markov models that generate rich geometric patterns in neural activations. Our primary analysis focuses on single-layer transformers, revealing how the first attention layer implements these constrained updates, with extensions to multi-layer architectures demonstrating how subsequent layers refine these representations. We find that attention carries out an algorithm with a natural interpretation in the probability simplex, and creates representations with distinctive geometric structure. We show how both the algorithmic behavior and the underlying geometry of these representations can be theoretically predicted in detail—including the attention pattern, OV-vectors, and embedding vectors—by modifying the equations for optimal future token predictions to account for the architectural constraints of attention. Our approach provides a principled lens on how architectural constraints shape the implementation of optimal prediction, revealing why transformers develop specific intermediate geometric structures.
APA
Piotrowski, M., Riechers, P. M., Filan, D. & Shai, A. (2025). Constrained Belief Updates Explain Geometric Structures in Transformer Representations. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:49399-49419. Available from https://proceedings.mlr.press/v267/piotrowski25a.html.
