Transformers as Algorithms: Generalization and Stability in In-context Learning

Yingcong Li; Muhammed Emrullah Ildiz; Dimitris Papailiopoulos; Samet Oymak

Transformers as Algorithms: Generalization and Stability in In-context Learning

Yingcong Li, Muhammed Emrullah Ildiz, Dimitris Papailiopoulos, Samet Oymak

Proceedings of the 40th International Conference on Machine Learning, PMLR 202:19565-19594, 2023.

Abstract

In-context learning (ICL) is a type of prompting where a transformer model operates on a sequence of (input, output) examples and performs inference on-the-fly. In this work, we formalize in-context learning as an algorithm learning problem where a transformer model implicitly constructs a hypothesis function at inference-time. We first explore the statistical aspects of this abstraction through the lens of multitask learning: We obtain generalization bounds for ICL when the input prompt is (1) a sequence of i.i.d. (input, label) pairs or (2) a trajectory arising from a dynamical system. The crux of our analysis is relating the excess risk to the stability of the algorithm implemented by the transformer. We characterize when transformer/attention architecture provably obeys the stability condition and also provide empirical verification. For generalization on unseen tasks, we identify an inductive bias phenomenon in which the transfer learning risk is governed by the task complexity and the number of MTL tasks in a highly predictable manner. Finally, we provide numerical evaluations that (1) demonstrate transformers can indeed implement near-optimal algorithms on classical regression problems with i.i.d. and dynamic data, (2) provide insights on stability, and (3) verify our theoretical predictions.

Cite this Paper

BibTeX


@InProceedings{pmlr-v202-li23l,
  title = 	 {Transformers as Algorithms: Generalization and Stability in In-context Learning},
  author =       {Li, Yingcong and Ildiz, Muhammed Emrullah and Papailiopoulos, Dimitris and Oymak, Samet},
  booktitle = 	 {Proceedings of the 40th International Conference on Machine Learning},
  pages = 	 {19565--19594},
  year = 	 {2023},
  editor = 	 {Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan},
  volume = 	 {202},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {23--29 Jul},
  publisher =    {PMLR},
  pdf = 	 {https://proceedings.mlr.press/v202/li23l/li23l.pdf},
  url = 	 {https://proceedings.mlr.press/v202/li23l.html},
  abstract = 	 {In-context learning (ICL) is a type of prompting where a transformer model operates on a sequence of (input, output) examples and performs inference on-the-fly. In this work, we formalize in-context learning as an algorithm learning problem where a transformer model implicitly constructs a hypothesis function at inference-time. We first explore the statistical aspects of this abstraction through the lens of multitask learning: We obtain generalization bounds for ICL when the input prompt is (1) a sequence of i.i.d. (input, label) pairs or (2) a trajectory arising from a dynamical system. The crux of our analysis is relating the excess risk to the stability of the algorithm implemented by the transformer. We characterize when transformer/attention architecture provably obeys the stability condition and also provide empirical verification. For generalization on unseen tasks, we identify an inductive bias phenomenon in which the transfer learning risk is governed by the task complexity and the number of MTL tasks in a highly predictable manner. Finally, we provide numerical evaluations that (1) demonstrate transformers can indeed implement near-optimal algorithms on classical regression problems with i.i.d. and dynamic data, (2) provide insights on stability, and (3) verify our theoretical predictions.}
}

Endnote

%0 Conference Paper
%T Transformers as Algorithms: Generalization and Stability in In-context Learning
%A Yingcong Li
%A Muhammed Emrullah Ildiz
%A Dimitris Papailiopoulos
%A Samet Oymak
%B Proceedings of the 40th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Andreas Krause
%E Emma Brunskill
%E Kyunghyun Cho
%E Barbara Engelhardt
%E Sivan Sabato
%E Jonathan Scarlett	
%F pmlr-v202-li23l
%I PMLR
%P 19565--19594
%U https://proceedings.mlr.press/v202/li23l.html
%V 202
%X In-context learning (ICL) is a type of prompting where a transformer model operates on a sequence of (input, output) examples and performs inference on-the-fly. In this work, we formalize in-context learning as an algorithm learning problem where a transformer model implicitly constructs a hypothesis function at inference-time. We first explore the statistical aspects of this abstraction through the lens of multitask learning: We obtain generalization bounds for ICL when the input prompt is (1) a sequence of i.i.d. (input, label) pairs or (2) a trajectory arising from a dynamical system. The crux of our analysis is relating the excess risk to the stability of the algorithm implemented by the transformer. We characterize when transformer/attention architecture provably obeys the stability condition and also provide empirical verification. For generalization on unseen tasks, we identify an inductive bias phenomenon in which the transfer learning risk is governed by the task complexity and the number of MTL tasks in a highly predictable manner. Finally, we provide numerical evaluations that (1) demonstrate transformers can indeed implement near-optimal algorithms on classical regression problems with i.i.d. and dynamic data, (2) provide insights on stability, and (3) verify our theoretical predictions.

APA


Li, Y., Ildiz, M.E., Papailiopoulos, D. & Oymak, S.. (2023). Transformers as Algorithms: Generalization and Stability in In-context Learning. Proceedings of the 40th International Conference on Machine Learning, in Proceedings of Machine Learning Research 202:19565-19594 Available from https://proceedings.mlr.press/v202/li23l.html.

Transformers as Algorithms: Generalization and Stability in In-context Learning

Abstract

Cite this Paper

Related Material