Compositional Capabilities of Autoregressive Transformers: A Study on Synthetic, Interpretable Tasks

Rahul Ramesh, Ekdeep Singh Lubana, Mikail Khona, Robert P. Dick, Hidenori Tanaka
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:42074-42103, 2024.

Abstract

Transformers trained on huge text corpora exhibit a remarkable set of capabilities, e.g., performing simple logical operations. Given the inherent compositional nature of language, one can expect the model to learn to compose these capabilities, potentially yielding a combinatorial explosion of the operations it can perform on an input. Motivated by the above, we aim in this paper to assess “how capable can a transformer become?”. Specifically, we train autoregressive Transformer models on a data-generating process that involves compositions of a set of well-defined monolithic capabilities. Through a series of extensive and systematic experiments on this data-generating process, we show that: (1) autoregressive Transformers can learn compositional structures from small amounts of training data and generalize to exponentially or even combinatorially many functions; (2) composing functions by generating intermediate outputs is more effective at generalizing to unseen compositions than generating no intermediate outputs; (3) biases in the order of compositions in the training data result in Transformers that fail to compose some combinations of functions; and (4) the attention layers seem to select the capability to apply while the feed-forward layers execute the capability.
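
To make the setup concrete, below is a minimal, hypothetical Python sketch of a compositional data-generating process in the spirit the abstract describes: a small set of "monolithic capabilities" (here assumed to be token-wise bijections over a toy vocabulary) is composed in a prompted order, and the target either lists every intermediate output (step-by-step) or only the final output (direct). The vocabulary size, separator tokens, and choice of bijections are illustrative assumptions, not the paper's exact construction.

import random

# Hypothetical sketch of a compositional data-generating process (not the
# authors' exact setup). Each "capability" is a fixed token-wise bijection
# over a small vocabulary; a prompt names the capabilities to compose and
# gives an input sequence.

VOCAB = list(range(10))   # toy token vocabulary (assumption)
SEQ_LEN = 6               # length of the input sequence (assumption)
NUM_FUNCS = 5             # number of monolithic capabilities (assumption)

random.seed(0)
# Each capability: a random permutation of the vocabulary, applied token-wise.
FUNCS = [dict(zip(VOCAB, random.sample(VOCAB, len(VOCAB)))) for _ in range(NUM_FUNCS)]

def apply_capability(f_idx, seq):
    # Apply one capability (a fixed bijection) to every token in the sequence.
    return [FUNCS[f_idx][t] for t in seq]

def sample_example(depth=3, step_by_step=True):
    # Draw a random input sequence and a random order of capabilities to compose.
    seq = [random.choice(VOCAB) for _ in range(SEQ_LEN)]
    order = random.sample(range(NUM_FUNCS), depth)
    prompt = [f"f{i}" for i in order] + ["|"] + [str(t) for t in seq]
    target = []
    for i in order:
        seq = apply_capability(i, seq)
        if step_by_step:
            # Step-by-step format: every intermediate output appears in the target.
            target += [str(t) for t in seq] + [";"]
    if not step_by_step:
        # Direct format: only the final composed output appears in the target.
        target = [str(t) for t in seq]
    return " ".join(prompt), " ".join(target)

print(sample_example(step_by_step=True))
print(sample_example(step_by_step=False))

Training an autoregressive model on the step-by-step format versus the direct format corresponds to the comparison behind finding (2) above: generating intermediate outputs versus generating only the final result.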

Cite this Paper


BibTeX
@InProceedings{pmlr-v235-ramesh24a,
  title     = {Compositional Capabilities of Autoregressive Transformers: A Study on Synthetic, Interpretable Tasks},
  author    = {Ramesh, Rahul and Lubana, Ekdeep Singh and Khona, Mikail and Dick, Robert P. and Tanaka, Hidenori},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages     = {42074--42103},
  year      = {2024},
  editor    = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume    = {235},
  series    = {Proceedings of Machine Learning Research},
  month     = {21--27 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/ramesh24a/ramesh24a.pdf},
  url       = {https://proceedings.mlr.press/v235/ramesh24a.html},
  abstract  = {Transformers trained on huge text corpora exhibit a remarkable set of capabilities, e.g., performing simple logical operations. Given the inherent compositional nature of language, one can expect the model to learn to compose these capabilities, potentially yielding a combinatorial explosion of what operations it can perform on an input. Motivated by the above, we aim to assess in this paper “how capable can a transformer become?”. Specifically, we train autoregressive Transformer models on a data-generating process that involves compositions of a set of well-defined monolithic capabilities. Through a series of extensive and systematic experiments on this data-generating process, we show that: (1) autoregressive Transformers can learn compositional structures from small amounts of training data and generalize to exponentially or even combinatorially many functions; (2) composing functions by generating intermediate outputs is more effective at generalizing to unseen compositions, compared to generating no intermediate outputs; (3) biases in the order of the compositions in the training data, results in Transformers that fail to compose some combinations of functions; and (4) the attention layers seem to select the capability to apply while the feed-forward layers execute the capability.}
}
Endnote
%0 Conference Paper
%T Compositional Capabilities of Autoregressive Transformers: A Study on Synthetic, Interpretable Tasks
%A Rahul Ramesh
%A Ekdeep Singh Lubana
%A Mikail Khona
%A Robert P. Dick
%A Hidenori Tanaka
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-ramesh24a
%I PMLR
%P 42074--42103
%U https://proceedings.mlr.press/v235/ramesh24a.html
%V 235
%X Transformers trained on huge text corpora exhibit a remarkable set of capabilities, e.g., performing simple logical operations. Given the inherent compositional nature of language, one can expect the model to learn to compose these capabilities, potentially yielding a combinatorial explosion of what operations it can perform on an input. Motivated by the above, we aim to assess in this paper “how capable can a transformer become?”. Specifically, we train autoregressive Transformer models on a data-generating process that involves compositions of a set of well-defined monolithic capabilities. Through a series of extensive and systematic experiments on this data-generating process, we show that: (1) autoregressive Transformers can learn compositional structures from small amounts of training data and generalize to exponentially or even combinatorially many functions; (2) composing functions by generating intermediate outputs is more effective at generalizing to unseen compositions, compared to generating no intermediate outputs; (3) biases in the order of the compositions in the training data, results in Transformers that fail to compose some combinations of functions; and (4) the attention layers seem to select the capability to apply while the feed-forward layers execute the capability.
APA
Ramesh, R., Lubana, E.S., Khona, M., Dick, R.P. & Tanaka, H. (2024). Compositional Capabilities of Autoregressive Transformers: A Study on Synthetic, Interpretable Tasks. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:42074-42103. Available from https://proceedings.mlr.press/v235/ramesh24a.html.