Vintix: Action Model via In-Context Reinforcement Learning

Andrei Polubarov, Nikita Lyubaykin, Alexander Derevyagin, Ilya Zisman, Denis Tarasov, Alexander Nikulin, Vladislav Kurenkov
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:49569-49602, 2025.

Abstract

In-Context Reinforcement Learning (ICRL) represents a promising paradigm for developing generalist agents that learn at inference time through trial-and-error interactions, analogous to how large language models adapt contextually, but with a focus on reward maximization. However, the scalability of ICRL beyond toy tasks and single-domain settings remains an open challenge. In this work, we present the first steps toward scaling ICRL by introducing a fixed, cross-domain model capable of learning behaviors through in-context reinforcement learning. Our results demonstrate that Algorithm Distillation, a framework designed to facilitate ICRL, offers a compelling and competitive alternative to expert distillation to construct versatile action models. These findings highlight the potential of ICRL as a scalable approach for generalist decision-making systems.

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-polubarov25a,
  title     = {Vintix: Action Model via In-Context Reinforcement Learning},
  author    = {Polubarov, Andrei and Lyubaykin, Nikita and Derevyagin, Alexander and Zisman, Ilya and Tarasov, Denis and Nikulin, Alexander and Kurenkov, Vladislav},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {49569--49602},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/polubarov25a/polubarov25a.pdf},
  url       = {https://proceedings.mlr.press/v267/polubarov25a.html},
  abstract  = {In-Context Reinforcement Learning (ICRL) represents a promising paradigm for developing generalist agents that learn at inference time through trial-and-error interactions, analogous to how large language models adapt contextually, but with a focus on reward maximization. However, the scalability of ICRL beyond toy tasks and single-domain settings remains an open challenge. In this work, we present the first steps toward scaling ICRL by introducing a fixed, cross-domain model capable of learning behaviors through in-context reinforcement learning. Our results demonstrate that Algorithm Distillation, a framework designed to facilitate ICRL, offers a compelling and competitive alternative to expert distillation to construct versatile action models. These findings highlight the potential of ICRL as a scalable approach for generalist decision-making systems.}
}
Endnote
%0 Conference Paper
%T Vintix: Action Model via In-Context Reinforcement Learning
%A Andrei Polubarov
%A Nikita Lyubaykin
%A Alexander Derevyagin
%A Ilya Zisman
%A Denis Tarasov
%A Alexander Nikulin
%A Vladislav Kurenkov
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-polubarov25a
%I PMLR
%P 49569--49602
%U https://proceedings.mlr.press/v267/polubarov25a.html
%V 267
%X In-Context Reinforcement Learning (ICRL) represents a promising paradigm for developing generalist agents that learn at inference time through trial-and-error interactions, analogous to how large language models adapt contextually, but with a focus on reward maximization. However, the scalability of ICRL beyond toy tasks and single-domain settings remains an open challenge. In this work, we present the first steps toward scaling ICRL by introducing a fixed, cross-domain model capable of learning behaviors through in-context reinforcement learning. Our results demonstrate that Algorithm Distillation, a framework designed to facilitate ICRL, offers a compelling and competitive alternative to expert distillation to construct versatile action models. These findings highlight the potential of ICRL as a scalable approach for generalist decision-making systems.
APA
Polubarov, A., Lyubaykin, N., Derevyagin, A., Zisman, I., Tarasov, D., Nikulin, A. & Kurenkov, V. (2025). Vintix: Action Model via In-Context Reinforcement Learning. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:49569-49602. Available from https://proceedings.mlr.press/v267/polubarov25a.html.
