Prompting a Pretrained Transformer Can Be a Universal Approximator

Aleksandar Petrov; Philip Torr; Adel Bibi

Proceedings of the 41st International Conference on Machine Learning, PMLR 235:40523-40550, 2024.

Abstract

Despite the widespread adoption of prompting, prompt tuning and prefix-tuning of transformer models, our theoretical understanding of these fine-tuning methods remains limited. A key question is whether one can arbitrarily modify the behavior of a pretrained model by prompting or prefix-tuning it. Formally, the question is whether prompting and prefix-tuning a pretrained model can universally approximate sequence-to-sequence functions. This paper answers in the affirmative and demonstrates that much smaller pretrained models than previously thought can be universal approximators when prefixed. In fact, prefix-tuning a single attention head is sufficient to approximate any continuous function, making the attention mechanism uniquely suited for universal approximation. Moreover, any sequence-to-sequence function can be approximated by prefixing a transformer with depth linear in the sequence length. Beyond these density-type results, we also offer Jackson-type bounds on the length of the prefix needed to approximate a function to a desired precision.
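
To make the term "prefix-tuning a single attention head" concrete, the following is a minimal illustrative sketch and not the construction used in the paper. It assumes a plain softmax attention head with frozen weight matrices, and it assumes the prefix is a list of extra input vectors prepended to the context; all function and variable names here are hypothetical.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def matvec(W, x):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def attention_head(x_query, context, W_q, W_k, W_v):
    # One query vector attends over every vector in `context`.
    q = matvec(W_q, x_query)
    keys = [matvec(W_k, c) for c in context]
    values = [matvec(W_v, c) for c in context]
    scale = math.sqrt(len(q))
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]

def prefixed_head(x_query, inputs, prefix, W_q, W_k, W_v):
    # Assumption: prefix-tuning is modelled as prepending extra vectors
    # to the context. W_q, W_k and W_v are never modified; only the
    # prefix vectors would be trained.
    return attention_head(x_query, prefix + inputs, W_q, W_k, W_v)

# Toy example: identity weights, one input token, one prefix vector.
identity = [[1.0, 0.0], [0.0, 1.0]]
inputs = [[1.0, 0.0]]
prefix = [[0.0, 3.0]]

no_prefix = attention_head(inputs[0], inputs, identity, identity, identity)
with_prefix = prefixed_head(inputs[0], inputs, prefix, identity, identity, identity)
print(no_prefix, with_prefix)
```

The head's output for the same input changes once the prefix is present, even though the weights are untouched. This is the degree of freedom the abstract refers to: the pretrained parameters are fixed and only the prefix is chosen.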

Cite this Paper

BibTeX


@InProceedings{pmlr-v235-petrov24a,
  title = 	 {Prompting a Pretrained Transformer Can Be a Universal Approximator},
  author =       {Petrov, Aleksandar and Torr, Philip and Bibi, Adel},
  booktitle = 	 {Proceedings of the 41st International Conference on Machine Learning},
  pages = 	 {40523--40550},
  year = 	 {2024},
  editor = 	 {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume = 	 {235},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {21--27 Jul},
  publisher =    {PMLR},
  pdf = 	 {https://raw.githubusercontent.com/mlresearch/v235/main/assets/petrov24a/petrov24a.pdf},
  url = 	 {https://proceedings.mlr.press/v235/petrov24a.html},
  abstract = 	 {Despite the widespread adoption of prompting, prompt tuning and prefix-tuning of transformer models, our theoretical understanding of these fine-tuning methods remains limited. A key question is whether one can arbitrarily modify the behavior of a pretrained model by prompting or prefix-tuning it. Formally, whether prompting and prefix-tuning a pretrained model can universally approximate sequence-to-sequence functions. This paper answers in the affirmative and demonstrates that much smaller pretrained models than previously thought can be universal approximators when prefixed. In fact, prefix-tuning a single attention head is sufficient to approximate any continuous function making the attention mechanism uniquely suited for universal approximation. Moreover, any sequence-to-sequence function can be approximated by prefixing a transformer with depth linear in the sequence length. Beyond these density-type results, we also offer Jackson-type bounds on the length of the prefix needed to approximate a function to a desired precision.}
}

Endnote

%0 Conference Paper
%T Prompting a Pretrained Transformer Can Be a Universal Approximator
%A Aleksandar Petrov
%A Philip Torr
%A Adel Bibi
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-petrov24a
%I PMLR
%P 40523--40550
%U https://proceedings.mlr.press/v235/petrov24a.html
%V 235
%X Despite the widespread adoption of prompting, prompt tuning and prefix-tuning of transformer models, our theoretical understanding of these fine-tuning methods remains limited. A key question is whether one can arbitrarily modify the behavior of a pretrained model by prompting or prefix-tuning it. Formally, whether prompting and prefix-tuning a pretrained model can universally approximate sequence-to-sequence functions. This paper answers in the affirmative and demonstrates that much smaller pretrained models than previously thought can be universal approximators when prefixed. In fact, prefix-tuning a single attention head is sufficient to approximate any continuous function making the attention mechanism uniquely suited for universal approximation. Moreover, any sequence-to-sequence function can be approximated by prefixing a transformer with depth linear in the sequence length. Beyond these density-type results, we also offer Jackson-type bounds on the length of the prefix needed to approximate a function to a desired precision.

APA


Petrov, A., Torr, P. & Bibi, A. (2024). Prompting a Pretrained Transformer Can Be a Universal Approximator. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:40523-40550. Available from https://proceedings.mlr.press/v235/petrov24a.html.
