Linearly Structured World Representations in Maze-Solving Transformers

Michael Ivanitskiy; Alexander F. Spies; Tilman Räuker; Guillaume Corlouer; Christopher Mathwin; Lucia Quirke; Can Rager; Rusheb Shah; Dan Valentine; Cecilia Diniz Behn; Katsumi Inoue; Samy Wu Fung

Proceedings of UniReps: the First Workshop on Unifying Representations in Neural Models, PMLR 243:133-143, 2024.

Abstract

The emergence of seemingly similar representations across tasks and neural architectures suggests that convergent properties may underlie sophisticated behavior. One form of representation that seems particularly fundamental to reasoning in many artificial (and perhaps natural) networks is the formation of world models, which decompose observed task structures into re-usable perceptual primitives and task-relevant relations. In this work, we show that auto-regressive transformers tasked with solving mazes learn to linearly represent the structure of mazes, and that the formation of these representations coincides with a sharp increase in generalization performance. Furthermore, we find preliminary evidence for Adjacency Heads which may play a role in computing valid paths through mazes.

Cite this Paper

BibTeX


@InProceedings{pmlr-v243-ivanitskiy24a,
  title = 	 {Linearly Structured World Representations in Maze-Solving Transformers},
  author =       {Ivanitskiy, Michael and Spies, Alexander F. and R\"auker, Tilman and Corlouer, Guillaume and Mathwin, Christopher and Quirke, Lucia and Rager, Can and Shah, Rusheb and Valentine, Dan and Behn, Cecilia Diniz and Inoue, Katsumi and Fung, Samy Wu},
  booktitle = 	 {Proceedings of UniReps: the First Workshop on Unifying Representations in Neural Models},
  pages = 	 {133--143},
  year = 	 {2024},
  editor = 	 {Fumero, Marco and Rodol\'a, Emanuele and Domine, Clementine and Locatello, Francesco and Dziugaite, Karolina and Caron, Mathilde},
  volume = 	 {243},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {15 Dec},
  publisher =    {PMLR},
  pdf = 	 {https://proceedings.mlr.press/v243/ivanitskiy24a/ivanitskiy24a.pdf},
  url = 	 {https://proceedings.mlr.press/v243/ivanitskiy24a.html},
  abstract = 	 {The emergence of seemingly similar representations across tasks and neural architectures suggests that convergent properties may underlie sophisticated behavior. One form of representation that seems particularly fundamental to reasoning in many artificial (and perhaps natural) networks is the formation of world models, which decompose observed task structures into re-usable perceptual primitives and task-relevant relations. In this work, we show that auto-regressive transformers tasked with solving mazes learn to linearly represent the structure of mazes, and that the formation of these representations coincides with a sharp increase in generalization performance. Furthermore, we find preliminary evidence for Adjacency Heads which may play a role in computing valid paths through mazes.}
}

EndNote

%0 Conference Paper
%T Linearly Structured World Representations in Maze-Solving Transformers
%A Michael Ivanitskiy
%A Alexander F. Spies
%A Tilman Räuker
%A Guillaume Corlouer
%A Christopher Mathwin
%A Lucia Quirke
%A Can Rager
%A Rusheb Shah
%A Dan Valentine
%A Cecilia Diniz Behn
%A Katsumi Inoue
%A Samy Wu Fung
%B Proceedings of UniReps: the First Workshop on Unifying Representations in Neural Models
%C Proceedings of Machine Learning Research
%D 2024
%E Marco Fumero
%E Emanuele Rodolá
%E Clementine Domine
%E Francesco Locatello
%E Karolina Dziugaite
%E Mathilde Caron
%F pmlr-v243-ivanitskiy24a
%I PMLR
%P 133--143
%U https://proceedings.mlr.press/v243/ivanitskiy24a.html
%V 243
%X The emergence of seemingly similar representations across tasks and neural architectures suggests that convergent properties may underlie sophisticated behavior. One form of representation that seems particularly fundamental to reasoning in many artificial (and perhaps natural) networks is the formation of world models, which decompose observed task structures into re-usable perceptual primitives and task-relevant relations. In this work, we show that auto-regressive transformers tasked with solving mazes learn to linearly represent the structure of mazes, and that the formation of these representations coincides with a sharp increase in generalization performance. Furthermore, we find preliminary evidence for Adjacency Heads which may play a role in computing valid paths through mazes.

APA


Ivanitskiy, M., Spies, A. F., Räuker, T., Corlouer, G., Mathwin, C., Quirke, L., Rager, C., Shah, R., Valentine, D., Behn, C. D., Inoue, K., & Fung, S. W. (2024). Linearly Structured World Representations in Maze-Solving Transformers. Proceedings of UniReps: the First Workshop on Unifying Representations in Neural Models, in Proceedings of Machine Learning Research 243:133-143. Available from https://proceedings.mlr.press/v243/ivanitskiy24a.html.
