Meta-Reinforcement Learning Robust to Distributional Shift Via Performing Lifelong In-Context Learning

Tengye Xu, Zihao Li, Qinyuan Ren
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:55112-55125, 2024.

Abstract

A key challenge in Meta-Reinforcement Learning (meta-RL) is task distribution shift, since the generalization of most current meta-RL methods is limited to tasks sampled from the training distribution. In this paper, we propose Posterior Sampling Bayesian Lifelong In-Context Reinforcement Learning (PSBL), which is robust to task distribution shift. PSBL meta-trains a transformer variant to directly perform amortized inference over the Predictive Posterior Distribution (PPD) of the optimal policy. Once trained, the network infers the PPD online with frozen parameters. The agent then samples actions from the approximate PPD for online exploration, which progressively reduces uncertainty and improves performance as it interacts with the environment. This property is known as in-context learning. Experimental results demonstrate that PSBL significantly outperforms standard meta-RL methods on tasks with both sparse and dense rewards when the test task distribution is strictly shifted from the training distribution.
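
To make the test-time behavior described above concrete, the following is a minimal, hypothetical sketch of the lifelong in-context loop: a frozen model maps the growing interaction history (the context) to an approximate PPD over actions, the agent samples from that distribution, and the resulting transition is appended to the context without any gradient updates. The meta-trained transformer, the 5-armed bandit environment, and all names (approx_ppd, step, TRUE_MEANS) are illustrative stand-ins, not the paper's implementation; a simple Beta-Bernoulli posterior plays the role of the network so the example runs end to end.

import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in task (not from the paper): a 5-armed Bernoulli bandit.
TRUE_MEANS = rng.uniform(0.0, 1.0, size=5)

def step(action):
    # Bernoulli reward for the chosen arm.
    return float(rng.random() < TRUE_MEANS[action])

def approx_ppd(context, n_actions=5, prior=1.0):
    # Stand-in for the frozen network's approximate PPD over actions.
    # In PSBL this distribution would come from a meta-trained transformer
    # conditioned on the full interaction history; here a Beta-Bernoulli
    # posterior is used so the sketch is self-contained.
    alpha = np.full(n_actions, prior)
    beta = np.full(n_actions, prior)
    for a, r in context:
        alpha[a] += r
        beta[a] += 1.0 - r
    # Posterior sampling: draw one plausible mean per arm and act greedily;
    # the induced action distribution concentrates as the context grows.
    samples = rng.beta(alpha, beta)
    probs = np.zeros(n_actions)
    probs[np.argmax(samples)] = 1.0
    return probs

# Lifelong in-context interaction with frozen parameters:
# uncertainty shrinks as the context grows, with no gradient updates.
context = []
for t in range(200):
    probs = approx_ppd(context)
    action = int(rng.choice(len(probs), p=probs))
    reward = step(action)
    context.append((action, reward))

print("best arm:", int(np.argmax(TRUE_MEANS)),
      "| most-played arm:", int(np.bincount([a for a, _ in context]).argmax()))
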

Cite this Paper


BibTeX
@InProceedings{pmlr-v235-xu24o,
  title     = {Meta-Reinforcement Learning Robust to Distributional Shift Via Performing Lifelong In-Context Learning},
  author    = {Xu, Tengye and Li, Zihao and Ren, Qinyuan},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages     = {55112--55125},
  year      = {2024},
  editor    = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume    = {235},
  series    = {Proceedings of Machine Learning Research},
  month     = {21--27 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/xu24o/xu24o.pdf},
  url       = {https://proceedings.mlr.press/v235/xu24o.html},
  abstract  = {A key challenge in Meta-Reinforcement Learning (meta-RL) is the task distribution shift, since the generalization ability of most current meta-RL methods is limited to tasks sampled from the training distribution. In this paper, we propose Posterior Sampling Bayesian Lifelong In-Context Reinforcement Learning (PSBL), which is robust to task distribution shift. PSBL meta-trains a variant of transformer to directly perform amortized inference about the Predictive Posterior Distribution (PPD) of the optimal policy. Once trained, the network can infer the PPD online with frozen parameters. The agent then samples actions from the approximate PPD to perform online exploration, which progressively reduces uncertainty and enhances performance in the interaction with the environment. This property is known as in-context learning. Experimental results demonstrate that PSBL significantly outperforms standard Meta RL methods both in tasks with sparse rewards and dense rewards when the test task distribution is strictly shifted from the training distribution.}
}
Endnote
%0 Conference Paper
%T Meta-Reinforcement Learning Robust to Distributional Shift Via Performing Lifelong In-Context Learning
%A Tengye Xu
%A Zihao Li
%A Qinyuan Ren
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-xu24o
%I PMLR
%P 55112--55125
%U https://proceedings.mlr.press/v235/xu24o.html
%V 235
%X A key challenge in Meta-Reinforcement Learning (meta-RL) is the task distribution shift, since the generalization ability of most current meta-RL methods is limited to tasks sampled from the training distribution. In this paper, we propose Posterior Sampling Bayesian Lifelong In-Context Reinforcement Learning (PSBL), which is robust to task distribution shift. PSBL meta-trains a variant of transformer to directly perform amortized inference about the Predictive Posterior Distribution (PPD) of the optimal policy. Once trained, the network can infer the PPD online with frozen parameters. The agent then samples actions from the approximate PPD to perform online exploration, which progressively reduces uncertainty and enhances performance in the interaction with the environment. This property is known as in-context learning. Experimental results demonstrate that PSBL significantly outperforms standard Meta RL methods both in tasks with sparse rewards and dense rewards when the test task distribution is strictly shifted from the training distribution.
APA
Xu, T., Li, Z. & Ren, Q. (2024). Meta-Reinforcement Learning Robust to Distributional Shift Via Performing Lifelong In-Context Learning. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:55112-55125. Available from https://proceedings.mlr.press/v235/xu24o.html.