Emergence of In-Context Reinforcement Learning from Noise Distillation

Ilya Zisman, Vladislav Kurenkov, Alexander Nikulin, Viacheslav Sinii, Sergey Kolesnikov
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:62832-62846, 2024.

Abstract

Recently, extensive studies in Reinforcement Learning have been carried out on the ability of transformers to adapt in-context to various environments and tasks. Current in-context RL methods are limited by their strict requirements for data, which needs to be generated by RL agents or labeled with actions from an optimal policy. To address this prevalent problem, we propose AD$^\varepsilon$, a new data acquisition approach that enables in-context Reinforcement Learning from a noise-induced curriculum. We show that it is viable to construct a synthetic noise-injection curriculum that helps obtain learning histories. Moreover, we experimentally demonstrate that it is possible to alleviate the need for data generated by optimal policies, with in-context RL still able to outperform the best suboptimal policy in the learning dataset by a 2x margin.
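As an illustration of the idea in the abstract, below is a minimal sketch of how a noise-injection curriculum could be assembled: a single (possibly suboptimal) demonstrator policy is rolled out with an exploration-noise level annealed from fully random to noise-free, and the episodes are concatenated in order of decreasing noise so that the resulting sequence resembles the learning history of an improving agent. This is only a sketch under assumed conventions (a Gymnasium-style, discrete-action environment); the function names are illustrative and not taken from the paper.

import numpy as np

def rollout(env, policy, epsilon, n_actions, rng):
    # Collect one episode; with probability epsilon the demonstrator's action
    # is replaced by a uniformly random one (epsilon-greedy corruption).
    obs, _ = env.reset()
    episode, done = [], False
    while not done:
        action = policy(obs)
        if rng.random() < epsilon:
            action = int(rng.integers(n_actions))
        next_obs, reward, terminated, truncated, _ = env.step(action)
        episode.append((obs, action, reward))
        obs, done = next_obs, terminated or truncated
    return episode

def noise_curriculum_history(env, policy, n_actions, n_stages=50, seed=0):
    # Anneal epsilon from 1.0 (pure noise) to 0.0 (the demonstrator itself) and
    # concatenate the episodes, producing a synthetic "learning history".
    rng = np.random.default_rng(seed)
    history = []
    for stage in range(n_stages):
        epsilon = 1.0 - stage / (n_stages - 1)
        history.extend(rollout(env, policy, epsilon, n_actions, rng))
    return history

Histories generated this way could then serve as training data for an Algorithm Distillation-style transformer, in place of logged learning histories from real RL agents.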

Cite this Paper


BibTeX
@InProceedings{pmlr-v235-zisman24a,
  title     = {Emergence of In-Context Reinforcement Learning from Noise Distillation},
  author    = {Zisman, Ilya and Kurenkov, Vladislav and Nikulin, Alexander and Sinii, Viacheslav and Kolesnikov, Sergey},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages     = {62832--62846},
  year      = {2024},
  editor    = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume    = {235},
  series    = {Proceedings of Machine Learning Research},
  month     = {21--27 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/zisman24a/zisman24a.pdf},
  url       = {https://proceedings.mlr.press/v235/zisman24a.html},
  abstract  = {Recently, extensive studies in Reinforcement Learning have been carried out on the ability of transformers to adapt in-context to various environments and tasks. Current in-context RL methods are limited by their strict requirements for data, which needs to be generated by RL agents or labeled with actions from an optimal policy. In order to address this prevalent problem, we propose AD$^\varepsilon$, a new data acquisition approach that enables in-context Reinforcement Learning from noise-induced curriculum. We show that it is viable to construct a synthetic noise injection curriculum which helps to obtain learning histories. Moreover, we experimentally demonstrate that it is possible to alleviate the need for generation using optimal policies, with in-context RL still able to outperform the best suboptimal policy in a learning dataset by a 2x margin.}
}
Endnote
%0 Conference Paper
%T Emergence of In-Context Reinforcement Learning from Noise Distillation
%A Ilya Zisman
%A Vladislav Kurenkov
%A Alexander Nikulin
%A Viacheslav Sinii
%A Sergey Kolesnikov
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-zisman24a
%I PMLR
%P 62832--62846
%U https://proceedings.mlr.press/v235/zisman24a.html
%V 235
%X Recently, extensive studies in Reinforcement Learning have been carried out on the ability of transformers to adapt in-context to various environments and tasks. Current in-context RL methods are limited by their strict requirements for data, which needs to be generated by RL agents or labeled with actions from an optimal policy. In order to address this prevalent problem, we propose AD$^\varepsilon$, a new data acquisition approach that enables in-context Reinforcement Learning from noise-induced curriculum. We show that it is viable to construct a synthetic noise injection curriculum which helps to obtain learning histories. Moreover, we experimentally demonstrate that it is possible to alleviate the need for generation using optimal policies, with in-context RL still able to outperform the best suboptimal policy in a learning dataset by a 2x margin.
APA
Zisman, I., Kurenkov, V., Nikulin, A., Sinii, V. & Kolesnikov, S. (2024). Emergence of In-Context Reinforcement Learning from Noise Distillation. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:62832-62846. Available from https://proceedings.mlr.press/v235/zisman24a.html.
