Provable Benefits of Task-Specific Prompts for In-context Learning

Xiangyu Chang, Yingcong Li, Muti Kara, Samet Oymak, Amit Roy-Chowdhury
Proceedings of The 28th International Conference on Artificial Intelligence and Statistics, PMLR 258:1558-1566, 2025.

Abstract

The in-context learning capabilities of modern language models have motivated a deeper mathematical understanding of sequence models. A line of recent work has shown that linear attention models can emulate projected gradient descent iterations to implicitly learn the task vector from the data provided in the context window. In this work, we consider a novel setting where the global task distribution can be partitioned into a union of conditional task distributions. We then examine the use of task-specific prompts and prediction heads for learning the prior information associated with the conditional task distribution using a one-layer attention model. Our results on the loss landscape show that task-specific prompts facilitate a covariance-mean decoupling, where prompt-tuning explains the conditional mean of the distribution whereas the variance is learned through in-context learning. Incorporating a task-specific head further aids this process by entirely decoupling the estimation of the mean and variance components. This covariance-mean perspective similarly explains how jointly training prompt and attention weights can provably help over fine-tuning after pretraining.
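The equivalence the abstract builds on can be illustrated numerically. The sketch below (not the paper's code; the preconditioner choice and noiseless setup are illustrative assumptions) shows that a one-step linear attention prediction on an in-context linear regression prompt coincides with the prediction of a single preconditioned gradient descent step on the least-squares loss from a zero initialization:

```python
import numpy as np

# Hedged sketch: linear attention as one preconditioned GD step.
rng = np.random.default_rng(0)
d, n = 5, 200
w_star = rng.normal(size=d)          # unknown task vector
X = rng.normal(size=(n, d))          # in-context examples
y = X @ w_star                       # noiseless labels (assumption)
x_query = rng.normal(size=d)

# Linear attention prediction: y_hat = (1/n) * sum_i y_i x_i^T W x_query.
# Choosing W as the empirical precision matrix is one illustrative option.
W = np.linalg.inv(X.T @ X / n)
y_attn = (y @ X / n) @ W @ x_query

# Equivalent view: one GD step w1 = w0 + W X^T (y - X w0)/n with w0 = 0,
# then predict with w1.
w1 = W @ (X.T @ y / n)
y_gd = w1 @ x_query

print(np.allclose(y_attn, y_gd))               # the two views agree
print(np.allclose(y_attn, w_star @ x_query))   # and recover the task here
```

With this preconditioner and noiseless labels, the single step lands exactly on the least-squares solution, so the attention output matches the ground-truth prediction; with generic attention weights the step is only a partial move toward it.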

Cite this Paper


BibTeX
@InProceedings{pmlr-v258-chang25b,
  title     = {Provable Benefits of Task-Specific Prompts for In-context Learning},
  author    = {Chang, Xiangyu and Li, Yingcong and Kara, Muti and Oymak, Samet and Roy-Chowdhury, Amit},
  booktitle = {Proceedings of The 28th International Conference on Artificial Intelligence and Statistics},
  pages     = {1558--1566},
  year      = {2025},
  editor    = {Li, Yingzhen and Mandt, Stephan and Agrawal, Shipra and Khan, Emtiyaz},
  volume    = {258},
  series    = {Proceedings of Machine Learning Research},
  month     = {03--05 May},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v258/main/assets/chang25b/chang25b.pdf},
  url       = {https://proceedings.mlr.press/v258/chang25b.html},
  abstract  = {The in-context learning capabilities of modern language models have motivated a deeper mathematical understanding of sequence models. A line of recent work has shown that linear attention models can emulate projected gradient descent iterations to implicitly learn the task vector from the data provided in the context window. In this work, we consider a novel setting where the global task distribution can be partitioned into a union of conditional task distributions. We then examine the use of task-specific prompts and prediction heads for learning the prior information associated with the conditional task distribution using a one-layer attention model. Our results on loss landscape show that task-specific prompts facilitate a covariance-mean decoupling where prompt-tuning explains the conditional mean of the distribution whereas the variance is learned/explained through in-context learning. Incorporating task-specific head further aids this process by entirely decoupling estimation of mean and variance components. This covariance-mean perspective similarly explains how jointly training prompt and attention weights can provably help over fine-tuning after pretraining.}
}
Endnote
%0 Conference Paper
%T Provable Benefits of Task-Specific Prompts for In-context Learning
%A Xiangyu Chang
%A Yingcong Li
%A Muti Kara
%A Samet Oymak
%A Amit Roy-Chowdhury
%B Proceedings of The 28th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2025
%E Yingzhen Li
%E Stephan Mandt
%E Shipra Agrawal
%E Emtiyaz Khan
%F pmlr-v258-chang25b
%I PMLR
%P 1558--1566
%U https://proceedings.mlr.press/v258/chang25b.html
%V 258
%X The in-context learning capabilities of modern language models have motivated a deeper mathematical understanding of sequence models. A line of recent work has shown that linear attention models can emulate projected gradient descent iterations to implicitly learn the task vector from the data provided in the context window. In this work, we consider a novel setting where the global task distribution can be partitioned into a union of conditional task distributions. We then examine the use of task-specific prompts and prediction heads for learning the prior information associated with the conditional task distribution using a one-layer attention model. Our results on loss landscape show that task-specific prompts facilitate a covariance-mean decoupling where prompt-tuning explains the conditional mean of the distribution whereas the variance is learned/explained through in-context learning. Incorporating task-specific head further aids this process by entirely decoupling estimation of mean and variance components. This covariance-mean perspective similarly explains how jointly training prompt and attention weights can provably help over fine-tuning after pretraining.
APA
Chang, X., Li, Y., Kara, M., Oymak, S. & Roy-Chowdhury, A. (2025). Provable Benefits of Task-Specific Prompts for In-context Learning. Proceedings of The 28th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 258:1558-1566. Available from https://proceedings.mlr.press/v258/chang25b.html.