Transformers as Unrolled Inference in Probabilistic Laplacian Eigenmaps

Aditya Ravuri, Neil D Lawrence
Proceedings of UniReps: the Third Edition of the Workshop on Unifying Representations in Neural Models, PMLR 322:159-167, 2026.

Abstract

We propose a probabilistic interpretation of transformers as unrolled inference steps assuming a probabilistic Laplacian Eigenmaps model from the ProbDR framework. Our derivation shows that at initialisation, transformers perform “linear” dimensionality reduction. We also show that within the transformer block, a graph Laplacian term arises from our arguments, rather than an attention matrix (which we interpret as an adjacency matrix). We demonstrate that simply subtracting the identity from the attention matrix (and thereby taking a graph diffusion step) improves validation performance on a language model and a simple vision transformer.
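The modification the abstract describes, subtracting the identity from the (row-stochastic) attention matrix, can be sketched in a few lines of NumPy. This is an illustrative reconstruction, not the authors' code: since each row of the attention matrix A sums to 1, A - I equals the negative random-walk graph Laplacian -(I - A), so applying it to the values is one graph diffusion step.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention_diffusion(Q, K, V):
    """Attention with the identity subtracted: out = (A - I) V.

    Rows of A sum to 1, so A - I = -(I - A) is the negative
    random-walk graph Laplacian; the output is a graph diffusion
    step rather than a plain weighted average of the values.
    """
    d = Q.shape[-1]
    A = softmax(Q @ K.T / np.sqrt(d))       # attention matrix, read as adjacency
    return (A - np.eye(A.shape[0])) @ V     # (A - I) V: one Laplacian/diffusion step

# Toy usage: 8 tokens, dimension 4
rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(3, 8, 4))
out = attention_diffusion(Q, K, V)
```

A quick sanity check of the Laplacian reading: because each row of (A - I) sums to zero, a constant value matrix V is mapped to zero, exactly as a graph Laplacian annihilates constant signals.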

Cite this Paper


BibTeX
@InProceedings{pmlr-v322-ravuri26a,
  title     = {Transformers as Unrolled Inference in Probabilistic Laplacian Eigenmaps},
  author    = {Ravuri, Aditya and Lawrence, Neil D},
  booktitle = {Proceedings of UniReps: the Third Edition of the Workshop on Unifying Representations in Neural Models},
  pages     = {159--167},
  year      = {2026},
  editor    = {Fumero, Marco and Domine, Clementine and L\"ahner, Zorah and Cannistraci, Irene and Zhao, Bo and Williams, Alex},
  volume    = {322},
  series    = {Proceedings of Machine Learning Research},
  month     = {06 Dec},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v322/main/assets/ravuri26a/ravuri26a.pdf},
  url       = {https://proceedings.mlr.press/v322/ravuri26a.html},
  abstract  = {We propose a probabilistic interpretation of transformers as unrolled inference steps assuming a probabilistic Laplacian Eigenmaps model from the ProbDR framework. Our derivation shows that at initialisation, transformers perform ``linear'' dimensionality reduction. We also show that within the transformer block, a graph Laplacian term arises from our arguments, rather than an attention matrix (which we interpret as an adjacency matrix). We demonstrate that simply subtracting the identity from the attention matrix (and thereby taking a graph diffusion step) improves validation performance on a language model and a simple vision transformer.}
}
Endnote
%0 Conference Paper
%T Transformers as Unrolled Inference in Probabilistic Laplacian Eigenmaps
%A Aditya Ravuri
%A Neil D Lawrence
%B Proceedings of UniReps: the Third Edition of the Workshop on Unifying Representations in Neural Models
%C Proceedings of Machine Learning Research
%D 2026
%E Marco Fumero
%E Clementine Domine
%E Zorah Lähner
%E Irene Cannistraci
%E Bo Zhao
%E Alex Williams
%F pmlr-v322-ravuri26a
%I PMLR
%P 159--167
%U https://proceedings.mlr.press/v322/ravuri26a.html
%V 322
%X We propose a probabilistic interpretation of transformers as unrolled inference steps assuming a probabilistic Laplacian Eigenmaps model from the ProbDR framework. Our derivation shows that at initialisation, transformers perform “linear” dimensionality reduction. We also show that within the transformer block, a graph Laplacian term arises from our arguments, rather than an attention matrix (which we interpret as an adjacency matrix). We demonstrate that simply subtracting the identity from the attention matrix (and thereby taking a graph diffusion step) improves validation performance on a language model and a simple vision transformer.
APA
Ravuri, A. & Lawrence, N.D. (2026). Transformers as Unrolled Inference in Probabilistic Laplacian Eigenmaps. Proceedings of UniReps: the Third Edition of the Workshop on Unifying Representations in Neural Models, in Proceedings of Machine Learning Research 322:159-167. Available from https://proceedings.mlr.press/v322/ravuri26a.html.