On Understanding Attention-Based In-Context Learning for Categorical Data

Aaron T Wang, William Convertino, Xiang Cheng, Ricardo Henao, Lawrence Carin
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:62701-62728, 2025.

Abstract

In-context learning based on attention models is examined for data with categorical outcomes, with inference in such models viewed from the perspective of functional gradient descent (GD). We develop a network composed of attention blocks, each employing a self-attention layer followed by a cross-attention layer, with associated skip connections. This model can exactly perform multi-step functional GD for in-context inference with categorical observations. We perform a theoretical analysis of this setup, generalizing many prior assumptions in this line of work, including the class of attention mechanisms for which it is appropriate. We demonstrate the framework empirically on synthetic data, image classification, and language generation.
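
To make the block structure concrete, below is a minimal PyTorch sketch of a stack of such attention blocks. This is a hypothetical illustration: the class name AttentionBlock, the dimensions, the use of single-head softmax attention, and the residual wiring are assumptions for exposition, not the paper's exact construction.

import torch
import torch.nn as nn

class AttentionBlock(nn.Module):
    """One block as described in the abstract: a self-attention layer
    followed by a cross-attention layer, each with a skip connection.
    Hypothetical sketch; not the paper's exact parameterization."""

    def __init__(self, d_model, n_heads=1):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, query, context):
        # Self-attention over the in-context (x, y) example tokens.
        ctx_update, _ = self.self_attn(context, context, context)
        context = context + ctx_update           # skip connection
        # The query token cross-attends to the updated context; stacking
        # blocks corresponds to multiple functional-GD update steps.
        q_update, _ = self.cross_attn(query, context, context)
        query = query + q_update                 # skip connection
        return query, context

# Usage: refine a test-point representation over three GD-like steps.
d_model, n_ctx = 32, 16
blocks = nn.ModuleList(AttentionBlock(d_model) for _ in range(3))
query = torch.randn(1, 1, d_model)        # token for the test input
context = torch.randn(1, n_ctx, d_model)  # tokens for labeled examples
for blk in blocks:
    query, context = blk(query, context)
# A final linear readout (not shown) would map `query` to class logits,
# matching the categorical-outcome setting.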

Cite this Paper

BibTeX
@InProceedings{pmlr-v267-wang25u,
  title = {On Understanding Attention-Based In-Context Learning for Categorical Data},
  author = {Wang, Aaron T and Convertino, William and Cheng, Xiang and Henao, Ricardo and Carin, Lawrence},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages = {62701--62728},
  year = {2025},
  editor = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume = {267},
  series = {Proceedings of Machine Learning Research},
  month = {13--19 Jul},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/wang25u/wang25u.pdf},
  url = {https://proceedings.mlr.press/v267/wang25u.html},
  abstract = {In-context learning based on attention models is examined for data with categorical outcomes, with inference in such models viewed from the perspective of functional gradient descent (GD). We develop a network composed of attention blocks, each employing a self-attention layer followed by a cross-attention layer, with associated skip connections. This model can exactly perform multi-step functional GD for in-context inference with categorical observations. We perform a theoretical analysis of this setup, generalizing many prior assumptions in this line of work, including the class of attention mechanisms for which it is appropriate. We demonstrate the framework empirically on synthetic data, image classification, and language generation.}
}
Endnote
%0 Conference Paper
%T On Understanding Attention-Based In-Context Learning for Categorical Data
%A Aaron T Wang
%A William Convertino
%A Xiang Cheng
%A Ricardo Henao
%A Lawrence Carin
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-wang25u
%I PMLR
%P 62701--62728
%U https://proceedings.mlr.press/v267/wang25u.html
%V 267
%X In-context learning based on attention models is examined for data with categorical outcomes, with inference in such models viewed from the perspective of functional gradient descent (GD). We develop a network composed of attention blocks, each employing a self-attention layer followed by a cross-attention layer, with associated skip connections. This model can exactly perform multi-step functional GD for in-context inference with categorical observations. We perform a theoretical analysis of this setup, generalizing many prior assumptions in this line of work, including the class of attention mechanisms for which it is appropriate. We demonstrate the framework empirically on synthetic data, image classification, and language generation.
APA
Wang, A. T., Convertino, W., Cheng, X., Henao, R., & Carin, L. (2025). On Understanding Attention-Based In-Context Learning for Categorical Data. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:62701-62728. Available from https://proceedings.mlr.press/v267/wang25u.html.
