Tighter Bounds on the Expressivity of Transformer Encoders

David Chiang, Peter Cholak, Anand Pillay
Proceedings of the 40th International Conference on Machine Learning, PMLR 202:5544-5562, 2023.

Abstract

Characterizing neural networks in terms of better-understood formal systems has the potential to yield new insights into the power and limitations of these networks. Doing so for transformers remains an active area of research. Bhattamishra and others have shown that transformer encoders are at least as expressive as a certain kind of counter machine, while Merrill and Sabharwal have shown that fixed-precision transformer encoders recognize only languages in uniform $TC^0$. We connect and strengthen these results by identifying a variant of first-order logic with counting quantifiers that is simultaneously an upper bound for fixed-precision transformer encoders and a lower bound for transformer encoders. This brings us much closer than before to an exact characterization of the languages that transformer encoders recognize.
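For intuition about the counting quantifiers mentioned above: such quantifiers extend first-order logic with expressions like $\exists^{=x} i.\,\varphi(i)$, read "there are exactly $x$ positions $i$ satisfying $\varphi$". As a hedged illustration only (the precise logic used in the paper, including its arithmetic over counts and modular predicates, is defined in the paper itself), a language such as "strings containing equally many $a$'s and $b$'s" can be written in this general style as

$$\exists x.\ \bigl(\exists^{=x} i.\ Q_a(i)\bigr) \wedge \bigl(\exists^{=x} j.\ Q_b(j)\bigr),$$

where $Q_a(i)$ holds when position $i$ carries the symbol $a$.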

Cite this Paper


BibTeX
@InProceedings{pmlr-v202-chiang23a,
  title     = {Tighter Bounds on the Expressivity of Transformer Encoders},
  author    = {Chiang, David and Cholak, Peter and Pillay, Anand},
  booktitle = {Proceedings of the 40th International Conference on Machine Learning},
  pages     = {5544--5562},
  year      = {2023},
  editor    = {Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan},
  volume    = {202},
  series    = {Proceedings of Machine Learning Research},
  month     = {23--29 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v202/chiang23a/chiang23a.pdf},
  url       = {https://proceedings.mlr.press/v202/chiang23a.html}
}