Looped Transformers as Programmable Computers

Angeliki Giannou, Shashank Rajput, Jy-Yong Sohn, Kangwook Lee, Jason D. Lee, Dimitris Papailiopoulos
Proceedings of the 40th International Conference on Machine Learning, PMLR 202:11398-11442, 2023.

Abstract

We present a framework for using transformer networks as universal computers by programming them with specific weights and placing them in a loop. Our input sequence acts as a punchcard, consisting of instructions and memory for data read/writes. We demonstrate that a constant number of encoder layers can emulate basic computing blocks, including lexicographic operations, non-linear functions, function calls, program counters, and conditional branches. Using this framework, we emulate a computer using a simple instruction-set architecture, which allows us to map iterative algorithms to programs that can be executed by a constant depth looped transformer network. We show how a single frozen transformer, instructed by its input, can emulate a basic calculator, a basic linear algebra library, and even a full backpropagation, in-context learning algorithm. Our findings reveal the potential of transformer networks as programmable compute units and offer insight into the mechanics of attention.
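For intuition about the looped execution model described in the abstract, the following minimal sketch (not the paper's actual construction; all weights, dimensions, and function names are hypothetical placeholders) applies a single frozen encoder block repeatedly to its own output, with the input sequence playing the role of the punchcard of instructions and scratchpad memory.

```python
# Minimal sketch of a "looped transformer": a frozen encoder block applied
# repeatedly to the same sequence. Weights and dimensions are illustrative
# placeholders, not the construction from the paper.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def transformer_block(X, Wq, Wk, Wv, W1, W2):
    """One encoder layer: single-head self-attention plus a ReLU MLP, with residuals."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    A = softmax(Q @ K.T / np.sqrt(K.shape[-1]))   # attention weights
    X = X + A @ V                                  # attention + residual
    X = X + np.maximum(X @ W1, 0) @ W2             # MLP + residual
    return X

def looped_transformer(X, weights, num_loops):
    """Feed the output back in as the next input: the sequence X acts as the
    'punchcard' (instructions + scratchpad memory) that the frozen block rewrites."""
    for _ in range(num_loops):
        X = transformer_block(X, *weights)
    return X

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d = 8, 16                                   # sequence length, embedding dimension
    X = rng.normal(size=(n, d))                    # toy "punchcard" input
    weights = [rng.normal(scale=0.1, size=(d, d)) for _ in range(5)]
    print(looped_transformer(X, weights, num_loops=10).shape)  # (8, 16)
```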

Cite this Paper


BibTeX
@InProceedings{pmlr-v202-giannou23a,
  title     = {Looped Transformers as Programmable Computers},
  author    = {Giannou, Angeliki and Rajput, Shashank and Sohn, Jy-Yong and Lee, Kangwook and Lee, Jason D. and Papailiopoulos, Dimitris},
  booktitle = {Proceedings of the 40th International Conference on Machine Learning},
  pages     = {11398--11442},
  year      = {2023},
  editor    = {Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan},
  volume    = {202},
  series    = {Proceedings of Machine Learning Research},
  month     = {23--29 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v202/giannou23a/giannou23a.pdf},
  url       = {https://proceedings.mlr.press/v202/giannou23a.html},
  abstract  = {We present a framework for using transformer networks as universal computers by programming them with specific weights and placing them in a loop. Our input sequence acts as a punchcard, consisting of instructions and memory for data read/writes. We demonstrate that a constant number of encoder layers can emulate basic computing blocks, including lexicographic operations, non-linear functions, function calls, program counters, and conditional branches. Using this framework, we emulate a computer using a simple instruction-set architecture, which allows us to map iterative algorithms to programs that can be executed by a constant depth looped transformer network. We show how a single frozen transformer, instructed by its input, can emulate a basic calculator, a basic linear algebra library, and even a full backpropagation, in-context learning algorithm. Our findings reveal the potential of transformer networks as programmable compute units and offer insight into the mechanics of attention.}
}
Endnote
%0 Conference Paper
%T Looped Transformers as Programmable Computers
%A Angeliki Giannou
%A Shashank Rajput
%A Jy-Yong Sohn
%A Kangwook Lee
%A Jason D. Lee
%A Dimitris Papailiopoulos
%B Proceedings of the 40th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Andreas Krause
%E Emma Brunskill
%E Kyunghyun Cho
%E Barbara Engelhardt
%E Sivan Sabato
%E Jonathan Scarlett
%F pmlr-v202-giannou23a
%I PMLR
%P 11398--11442
%U https://proceedings.mlr.press/v202/giannou23a.html
%V 202
%X We present a framework for using transformer networks as universal computers by programming them with specific weights and placing them in a loop. Our input sequence acts as a punchcard, consisting of instructions and memory for data read/writes. We demonstrate that a constant number of encoder layers can emulate basic computing blocks, including lexicographic operations, non-linear functions, function calls, program counters, and conditional branches. Using this framework, we emulate a computer using a simple instruction-set architecture, which allows us to map iterative algorithms to programs that can be executed by a constant depth looped transformer network. We show how a single frozen transformer, instructed by its input, can emulate a basic calculator, a basic linear algebra library, and even a full backpropagation, in-context learning algorithm. Our findings reveal the potential of transformer networks as programmable compute units and offer insight into the mechanics of attention.
APA
Giannou, A., Rajput, S., Sohn, J., Lee, K., Lee, J.D. & Papailiopoulos, D. (2023). Looped Transformers as Programmable Computers. Proceedings of the 40th International Conference on Machine Learning, in Proceedings of Machine Learning Research 202:11398-11442. Available from https://proceedings.mlr.press/v202/giannou23a.html.
