A Mechanism for Sample-Efficient In-Context Learning for Sparse Retrieval Tasks

Jacob Abernethy, Alekh Agarwal, Teodor Vanislavov Marinov, Manfred K. Warmuth
Proceedings of The 35th International Conference on Algorithmic Learning Theory, PMLR 237:3-46, 2024.

Abstract

We study the phenomenon of in-context learning (ICL) exhibited by large language models, where they can adapt to a new learning task, given a handful of labeled examples, without any explicit parameter optimization. Our goal is to explain how a pre-trained transformer model is able to perform ICL under reasonable assumptions on the pre-training process and the downstream tasks. We posit a mechanism whereby a transformer can achieve the following: (a) receive an i.i.d. sequence of examples which have been converted into a prompt using potentially-ambiguous delimiters, (b) correctly segment the prompt into examples and labels, (c) infer from the data a sparse linear regressor hypothesis, and finally (d) apply this hypothesis on the given test example and return a predicted label. We establish that this entire procedure is implementable using the transformer mechanism, and we give sample complexity guarantees for this learning framework. Our empirical findings validate the challenge of segmentation, and we show a correspondence between our posited mechanisms and observed attention maps for step (c).
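To make steps (a)–(d) concrete, below is a minimal NumPy sketch of the learning pipeline the abstract describes. It illustrates the sparse-regression ICL task only and is not the transformer construction analyzed in the paper: the fixed delimiter token (deliberately unambiguous here, unlike the potentially-ambiguous delimiters the paper considers), the problem dimensions, and the use of hard-thresholded least squares as the sparse regression step are all assumptions made for this demo.

```python
# Sketch of the four-step ICL pipeline from the abstract, as a plain
# learning problem (NOT the paper's transformer mechanism).
import numpy as np

rng = np.random.default_rng(0)
d, k, n = 20, 3, 40            # input dimension, sparsity, in-context examples
DELIM = np.full(d + 1, -1.0)   # assumed delimiter "token" separating examples

# Ground-truth k-sparse regressor.
w_star = np.zeros(d)
w_star[rng.choice(d, size=k, replace=False)] = rng.normal(size=k)

# (a) i.i.d. examples (x_i, y_i) flattened into one prompt with delimiters.
X = rng.normal(size=(n, d))
y = X @ w_star
prompt = np.concatenate(
    [np.concatenate([np.append(x, yi), DELIM]) for x, yi in zip(X, y)]
)

# (b) Segment the prompt back into (example, label) pairs using the delimiter.
tokens = prompt.reshape(-1, d + 1)
pairs = [t for t in tokens if not np.allclose(t, DELIM)]
X_seg = np.stack([t[:d] for t in pairs])
y_seg = np.array([t[d] for t in pairs])

# (c) Infer a sparse linear hypothesis: least squares, then keep the k
#     largest-magnitude coordinates and refit on that support
#     (a crude stand-in for the sparse regression step).
w_ls, *_ = np.linalg.lstsq(X_seg, y_seg, rcond=None)
support = np.argsort(np.abs(w_ls))[-k:]
coef, *_ = np.linalg.lstsq(X_seg[:, support], y_seg, rcond=None)
w_hat = np.zeros(d)
w_hat[support] = coef

# (d) Apply the inferred hypothesis to a fresh test example.
x_test = rng.normal(size=d)
print("predicted:", x_test @ w_hat, " true:", x_test @ w_star)
```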

Cite this Paper


BibTeX
@InProceedings{pmlr-v237-abernethy24a,
  title     = {A Mechanism for Sample-Efficient In-Context Learning for Sparse Retrieval Tasks},
  author    = {Abernethy, Jacob and Agarwal, Alekh and Marinov, Teodor Vanislavov and Warmuth, Manfred K.},
  booktitle = {Proceedings of The 35th International Conference on Algorithmic Learning Theory},
  pages     = {3--46},
  year      = {2024},
  editor    = {Vernade, Claire and Hsu, Daniel},
  volume    = {237},
  series    = {Proceedings of Machine Learning Research},
  month     = {25--28 Feb},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v237/abernethy24a/abernethy24a.pdf},
  url       = {https://proceedings.mlr.press/v237/abernethy24a.html},
  abstract  = {We study the phenomenon of in-context learning (ICL) exhibited by large language models, where they can adapt to a new learning task, given a handful of labeled examples, without any explicit parameter optimization. Our goal is to explain how a pre-trained transformer model is able to perform ICL under reasonable assumptions on the pre-training process and the downstream tasks. We posit a mechanism whereby a transformer can achieve the following: (a) receive an i.i.d. sequence of examples which have been converted into a prompt using potentially-ambiguous delimiters, (b) correctly segment the prompt into examples and labels, (c) infer from the data a sparse linear regressor hypothesis, and finally (d) apply this hypothesis on the given test example and return a predicted label. We establish that this entire procedure is implementable using the transformer mechanism, and we give sample complexity guarantees for this learning framework. Our empirical findings validate the challenge of segmentation, and we show a correspondence between our posited mechanisms and observed attention maps for step (c).}
}
Endnote
%0 Conference Paper
%T A Mechanism for Sample-Efficient In-Context Learning for Sparse Retrieval Tasks
%A Jacob Abernethy
%A Alekh Agarwal
%A Teodor Vanislavov Marinov
%A Manfred K. Warmuth
%B Proceedings of The 35th International Conference on Algorithmic Learning Theory
%C Proceedings of Machine Learning Research
%D 2024
%E Claire Vernade
%E Daniel Hsu
%F pmlr-v237-abernethy24a
%I PMLR
%P 3--46
%U https://proceedings.mlr.press/v237/abernethy24a.html
%V 237
%X We study the phenomenon of in-context learning (ICL) exhibited by large language models, where they can adapt to a new learning task, given a handful of labeled examples, without any explicit parameter optimization. Our goal is to explain how a pre-trained transformer model is able to perform ICL under reasonable assumptions on the pre-training process and the downstream tasks. We posit a mechanism whereby a transformer can achieve the following: (a) receive an i.i.d. sequence of examples which have been converted into a prompt using potentially-ambiguous delimiters, (b) correctly segment the prompt into examples and labels, (c) infer from the data a sparse linear regressor hypothesis, and finally (d) apply this hypothesis on the given test example and return a predicted label. We establish that this entire procedure is implementable using the transformer mechanism, and we give sample complexity guarantees for this learning framework. Our empirical findings validate the challenge of segmentation, and we show a correspondence between our posited mechanisms and observed attention maps for step (c).
APA
Abernethy, J., Agarwal, A., Marinov, T.V. & Warmuth, M.K. (2024). A Mechanism for Sample-Efficient In-Context Learning for Sparse Retrieval Tasks. Proceedings of The 35th International Conference on Algorithmic Learning Theory, in Proceedings of Machine Learning Research 237:3-46. Available from https://proceedings.mlr.press/v237/abernethy24a.html.