A Causal World Model Underlying Next Token Prediction: Exploring GPT in a Controlled Environment

Raanan Yehezkel Rohekar, Yaniv Gurwicz, Sungduk Yu, Estelle Aflalo, Vasudev Lal
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:72196-72209, 2025.

Abstract

Are generative pre-trained transformer (GPT) models, trained only to predict the next token, implicitly learning a world model from which sequences are generated one token at a time? We address this question by deriving a causal interpretation of the attention mechanism in GPT and presenting a causal world model that arises from this interpretation. Furthermore, we propose that GPT models, at inference time, can be utilized for zero-shot causal structure learning for input sequences, and introduce a corresponding confidence score. Empirical tests were conducted in controlled environments using the setups of the Othello and Chess strategy games. A GPT, pre-trained on real-world games played with the intention of winning, was tested on out-of-distribution synthetic data consisting of sequences of random legal moves. We find that the GPT model is likely to generate legal next moves for out-of-distribution sequences for which a causal structure is encoded in the attention mechanism with high confidence. In cases where it generates illegal moves, it also fails to capture a causal structure.

Cite this Paper
BibTeX
@InProceedings{pmlr-v267-yehezkel-rohekar25a,
  title     = {A Causal World Model Underlying Next Token Prediction: Exploring {GPT} in a Controlled Environment},
  author    = {Yehezkel Rohekar, Raanan and Gurwicz, Yaniv and Yu, Sungduk and Aflalo, Estelle and Lal, Vasudev},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {72196--72209},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/yehezkel-rohekar25a/yehezkel-rohekar25a.pdf},
  url       = {https://proceedings.mlr.press/v267/yehezkel-rohekar25a.html},
  abstract  = {Are generative pre-trained transformer (GPT) models, trained only to predict the next token, implicitly learning a world model from which sequences are generated one token at a time? We address this question by deriving a causal interpretation of the attention mechanism in GPT and presenting a causal world model that arises from this interpretation. Furthermore, we propose that GPT models, at inference time, can be utilized for zero-shot causal structure learning for input sequences, and introduce a corresponding confidence score. Empirical tests were conducted in controlled environments using the setups of the Othello and Chess strategy games. A GPT, pre-trained on real-world games played with the intention of winning, was tested on out-of-distribution synthetic data consisting of sequences of random legal moves. We find that the GPT model is likely to generate legal next moves for out-of-distribution sequences for which a causal structure is encoded in the attention mechanism with high confidence. In cases where it generates illegal moves, it also fails to capture a causal structure.}
}
Endnote
%0 Conference Paper
%T A Causal World Model Underlying Next Token Prediction: Exploring GPT in a Controlled Environment
%A Raanan Yehezkel Rohekar
%A Yaniv Gurwicz
%A Sungduk Yu
%A Estelle Aflalo
%A Vasudev Lal
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-yehezkel-rohekar25a
%I PMLR
%P 72196--72209
%U https://proceedings.mlr.press/v267/yehezkel-rohekar25a.html
%V 267
%X Are generative pre-trained transformer (GPT) models, trained only to predict the next token, implicitly learning a world model from which sequences are generated one token at a time? We address this question by deriving a causal interpretation of the attention mechanism in GPT and presenting a causal world model that arises from this interpretation. Furthermore, we propose that GPT models, at inference time, can be utilized for zero-shot causal structure learning for input sequences, and introduce a corresponding confidence score. Empirical tests were conducted in controlled environments using the setups of the Othello and Chess strategy games. A GPT, pre-trained on real-world games played with the intention of winning, was tested on out-of-distribution synthetic data consisting of sequences of random legal moves. We find that the GPT model is likely to generate legal next moves for out-of-distribution sequences for which a causal structure is encoded in the attention mechanism with high confidence. In cases where it generates illegal moves, it also fails to capture a causal structure.
APA
Yehezkel Rohekar, R., Gurwicz, Y., Yu, S., Aflalo, E., & Lal, V. (2025). A Causal World Model Underlying Next Token Prediction: Exploring GPT in a Controlled Environment. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:72196-72209. Available from https://proceedings.mlr.press/v267/yehezkel-rohekar25a.html.