On the Origins of Linear Representations in Large Language Models

Yibo Jiang, Goutham Rajendran, Pradeep Kumar Ravikumar, Bryon Aragam, Victor Veitch
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:21879-21911, 2024.

Abstract

An array of recent works have argued that high-level semantic concepts are encoded "linearly" in the representation space of large language models. In this work, we study the origins of such linear representations. To that end, we introduce a latent variable model to abstract and formalize the concept dynamics of the next token prediction. We use this formalism to prove that linearity arises as a consequence of the loss function and the implicit bias of gradient descent. The theory is further substantiated empirically via experiments.
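To make the claim concrete, here is a minimal toy sketch (not from the paper; all names and numbers are illustrative assumptions) of what it means for a concept to be encoded "linearly": if a binary concept corresponds to a single direction in representation space, then counterfactual pairs of representations that differ only in that concept should differ by (approximately) the same vector.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 64        # toy embedding dimension (hypothetical)
n_pairs = 20  # counterfactual pairs differing only in the concept

# Hypothetical setup: a single unit "concept direction" w. Each pair of
# representations differs by w plus small noise, i.e. the concept is
# encoded linearly -- the structure the paper argues emerges from the
# loss function and the implicit bias of gradient descent.
w = rng.normal(size=d)
w /= np.linalg.norm(w)

base = rng.normal(size=(n_pairs, d))
noise = 0.01 * rng.normal(size=(n_pairs, d))
pos = base + w + noise  # concept "on"
neg = base              # concept "off"

# Normalize the per-pair difference vectors.
diffs = pos - neg
diffs /= np.linalg.norm(diffs, axis=1, keepdims=True)

# If the concept is linearly represented, every difference vector points
# in nearly the same direction: cosine similarity with w is near 1.
cos = diffs @ w
print(cos.min())
```

In a real model one would replace the synthetic `pos`/`neg` pairs with actual hidden representations of counterfactual inputs; the test for linearity is the same alignment of difference vectors.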

Cite this Paper


BibTeX
@InProceedings{pmlr-v235-jiang24d,
  title     = {On the Origins of Linear Representations in Large Language Models},
  author    = {Jiang, Yibo and Rajendran, Goutham and Ravikumar, Pradeep Kumar and Aragam, Bryon and Veitch, Victor},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages     = {21879--21911},
  year      = {2024},
  editor    = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume    = {235},
  series    = {Proceedings of Machine Learning Research},
  month     = {21--27 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/jiang24d/jiang24d.pdf},
  url       = {https://proceedings.mlr.press/v235/jiang24d.html},
  abstract  = {An array of recent works have argued that high-level semantic concepts are encoded "linearly" in the representation space of large language models. In this work, we study the origins of such linear representations. To that end, we introduce a latent variable model to abstract and formalize the concept dynamics of the next token prediction. We use this formalism to prove that linearity arises as a consequence of the loss function and the implicit bias of gradient descent. The theory is further substantiated empirically via experiments.}
}
Endnote
%0 Conference Paper
%T On the Origins of Linear Representations in Large Language Models
%A Yibo Jiang
%A Goutham Rajendran
%A Pradeep Kumar Ravikumar
%A Bryon Aragam
%A Victor Veitch
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-jiang24d
%I PMLR
%P 21879--21911
%U https://proceedings.mlr.press/v235/jiang24d.html
%V 235
%X An array of recent works have argued that high-level semantic concepts are encoded "linearly" in the representation space of large language models. In this work, we study the origins of such linear representations. To that end, we introduce a latent variable model to abstract and formalize the concept dynamics of the next token prediction. We use this formalism to prove that linearity arises as a consequence of the loss function and the implicit bias of gradient descent. The theory is further substantiated empirically via experiments.
APA
Jiang, Y., Rajendran, G., Ravikumar, P. K., Aragam, B., & Veitch, V. (2024). On the Origins of Linear Representations in Large Language Models. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:21879-21911. Available from https://proceedings.mlr.press/v235/jiang24d.html.

Related Material