Which Attention Heads Matter for In-Context Learning?

Kayo Yin, Jacob Steinhardt
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:72428-72461, 2025.

Abstract

Large language models (LLMs) exhibit impressive in-context learning (ICL) capability, enabling them to generate relevant responses from a handful of task demonstrations in the prompt. Prior studies have suggested two different explanations for the mechanisms behind ICL: induction heads that find and copy relevant tokens, and function vector (FV) heads whose activations compute a latent encoding of the ICL task. To better understand which of the two distinct mechanisms drives ICL, we study and compare induction heads and FV heads in 12 language models. Through detailed ablations, we find that few-shot ICL is driven primarily by FV heads, especially in larger models. We also find that FV and induction heads are connected: many FV heads start as induction heads during training before transitioning to the FV mechanism. This leads us to speculate that induction facilitates learning the more complex FV mechanism for ICL.
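As a rough illustration of the two head types contrasted in the abstract, the sketch below shows how one might (a) score attention heads for the induction pattern on a prompt made of a random token sequence repeated twice, and (b) mean-ablate a single head to probe its contribution to few-shot ICL. This is a minimal sketch, not the authors' exact protocol: the function names, tensor layouts, and the assumption that per-head attention patterns and outputs are available from a hooked forward pass are all illustrative assumptions.

import torch

def induction_scores(pattern: torch.Tensor, half: int) -> torch.Tensor:
    """Score each head's prefix-match-and-copy behavior (illustrative).

    pattern: [n_heads, seq, seq] attention patterns for one prompt built as a
             random token sequence of length `half` repeated twice (seq = 2*half).
    Returns: [n_heads] mean attention each second-half token pays to the token
             immediately after its earlier occurrence, i.e. the token an
             induction head would copy next.
    """
    n_heads, seq, _ = pattern.shape
    assert seq == 2 * half, "prompt must be an exact repetition"
    dst = torch.arange(half, seq)   # query positions in the second copy
    src = dst - half + 1            # position right after the first occurrence
    return pattern[:, dst, src].mean(dim=-1)

def mean_ablate_head(attn_out: torch.Tensor, head: int,
                     baseline_mean: torch.Tensor) -> torch.Tensor:
    """Replace one head's output with its mean activation over a baseline corpus.

    attn_out:      [batch, seq, n_heads, d_head] per-head attention outputs.
    baseline_mean: [d_head] mean activation of `head` on baseline prompts.
    Comparing few-shot accuracy before and after such ablations is one way to
    compare the importance of induction heads versus FV heads.
    """
    ablated = attn_out.clone()
    ablated[:, :, head, :] = baseline_mean
    return ablated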

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-yin25e,
  title     = {Which Attention Heads Matter for In-Context Learning?},
  author    = {Yin, Kayo and Steinhardt, Jacob},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {72428--72461},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/yin25e/yin25e.pdf},
  url       = {https://proceedings.mlr.press/v267/yin25e.html}
}
APA
Yin, K., & Steinhardt, J. (2025). Which Attention Heads Matter for In-Context Learning? Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:72428-72461. Available from https://proceedings.mlr.press/v267/yin25e.html.
