Prompting Decision Transformer for Few-Shot Policy Generalization

Mengdi Xu, Yikang Shen, Shun Zhang, Yuchen Lu, Ding Zhao, Joshua Tenenbaum, Chuang Gan
Proceedings of the 39th International Conference on Machine Learning, PMLR 162:24631-24645, 2022.

Abstract

Humans can leverage prior experience and learn novel tasks from a handful of demonstrations. In contrast to offline meta-reinforcement learning, which aims to achieve quick adaptation through better algorithm design, we investigate the effect of architecture inductive bias on the few-shot learning capability. We propose a Prompt-based Decision Transformer (Prompt-DT), which leverages the sequential modeling ability of the Transformer architecture and the prompt framework to achieve few-shot adaptation in offline RL. We design the trajectory prompt, which contains segments of the few-shot demonstrations, and encodes task-specific information to guide policy generation. Our experiments in five MuJoCo control benchmarks show that Prompt-DT is a strong few-shot learner without any extra finetuning on unseen target tasks. Prompt-DT outperforms its variants and strong meta offline RL baselines by a large margin with a trajectory prompt containing only a few timesteps. Prompt-DT is also robust to prompt length changes and can generalize to out-of-distribution (OOD) environments. Project page: https://mxu34.github.io/PromptDT/.
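
As a rough illustration of the trajectory prompt described in the abstract, the sketch below cuts one short segment out of each few-shot demonstration and prepends the result to the recent interaction history. It is not the authors' implementation: the (return-to-go, state, action) token layout follows the usual Decision Transformer convention and is an assumption here, and the names `returns_to_go`, `build_trajectory_prompt`, `model_input`, and `segment_len` are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

# One timestep of the sequence: (return-to-go, state, action).
# This layout is an assumption borrowed from Decision Transformer.
Step = Tuple[float, Sequence[float], Sequence[float]]
# One demonstration: (states, actions, rewards), all of equal length.
Demo = Tuple[Sequence[Sequence[float]], Sequence[Sequence[float]], Sequence[float]]


def returns_to_go(rewards: Sequence[float]) -> List[float]:
    """Suffix sums of the rewards: rtg[t] = rewards[t] + rewards[t+1] + ..."""
    rtg: List[float] = []
    running = 0.0
    for r in reversed(rewards):
        running += r
        rtg.append(running)
    rtg.reverse()
    return rtg


def build_trajectory_prompt(
    demos: Sequence[Demo], segment_len: int, rng: random.Random
) -> List[Step]:
    """Take one random contiguous segment per demonstration and concatenate them."""
    prompt: List[Step] = []
    for states, actions, rewards in demos:
        rtg = returns_to_go(rewards)
        # Pick a start index so the segment fits; short demos are used whole.
        start = rng.randrange(0, max(1, len(rewards) - segment_len + 1))
        end = min(start + segment_len, len(rewards))
        for t in range(start, end):
            prompt.append((rtg[t], states[t], actions[t]))
    return prompt


def model_input(prompt: Sequence[Step], recent_context: Sequence[Step]) -> List[Step]:
    """Prepend the task prompt to the most recent history before calling the model."""
    return list(prompt) + list(recent_context)
```

In this sketch the prompt is the only place task identity enters the sequence, which is consistent with the abstract's claim that adaptation needs no finetuning on the target task; how the actual model tokenizes and embeds these steps is not specified on this page.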

Cite this Paper


BibTeX
@InProceedings{pmlr-v162-xu22g,
  title = {Prompting Decision Transformer for Few-Shot Policy Generalization},
  author = {Xu, Mengdi and Shen, Yikang and Zhang, Shun and Lu, Yuchen and Zhao, Ding and Tenenbaum, Joshua and Gan, Chuang},
  booktitle = {Proceedings of the 39th International Conference on Machine Learning},
  pages = {24631--24645},
  year = {2022},
  editor = {Chaudhuri, Kamalika and Jegelka, Stefanie and Song, Le and Szepesvari, Csaba and Niu, Gang and Sabato, Sivan},
  volume = {162},
  series = {Proceedings of Machine Learning Research},
  month = {17--23 Jul},
  publisher = {PMLR},
  pdf = {https://proceedings.mlr.press/v162/xu22g/xu22g.pdf},
  url = {https://proceedings.mlr.press/v162/xu22g.html},
  abstract = {Humans can leverage prior experience and learn novel tasks from a handful of demonstrations. In contrast to offline meta-reinforcement learning, which aims to achieve quick adaptation through better algorithm design, we investigate the effect of architecture inductive bias on the few-shot learning capability. We propose a Prompt-based Decision Transformer (Prompt-DT), which leverages the sequential modeling ability of the Transformer architecture and the prompt framework to achieve few-shot adaptation in offline RL. We design the trajectory prompt, which contains segments of the few-shot demonstrations, and encodes task-specific information to guide policy generation. Our experiments in five MuJoCo control benchmarks show that Prompt-DT is a strong few-shot learner without any extra finetuning on unseen target tasks. Prompt-DT outperforms its variants and strong meta offline RL baselines by a large margin with a trajectory prompt containing only a few timesteps. Prompt-DT is also robust to prompt length changes and can generalize to out-of-distribution (OOD) environments. Project page: \href{https://mxu34.github.io/PromptDT/}{https://mxu34.github.io/PromptDT/}.}
}
Endnote
%0 Conference Paper
%T Prompting Decision Transformer for Few-Shot Policy Generalization
%A Mengdi Xu
%A Yikang Shen
%A Shun Zhang
%A Yuchen Lu
%A Ding Zhao
%A Joshua Tenenbaum
%A Chuang Gan
%B Proceedings of the 39th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2022
%E Kamalika Chaudhuri
%E Stefanie Jegelka
%E Le Song
%E Csaba Szepesvari
%E Gang Niu
%E Sivan Sabato
%F pmlr-v162-xu22g
%I PMLR
%P 24631--24645
%U https://proceedings.mlr.press/v162/xu22g.html
%V 162
%X Humans can leverage prior experience and learn novel tasks from a handful of demonstrations. In contrast to offline meta-reinforcement learning, which aims to achieve quick adaptation through better algorithm design, we investigate the effect of architecture inductive bias on the few-shot learning capability. We propose a Prompt-based Decision Transformer (Prompt-DT), which leverages the sequential modeling ability of the Transformer architecture and the prompt framework to achieve few-shot adaptation in offline RL. We design the trajectory prompt, which contains segments of the few-shot demonstrations, and encodes task-specific information to guide policy generation. Our experiments in five MuJoCo control benchmarks show that Prompt-DT is a strong few-shot learner without any extra finetuning on unseen target tasks. Prompt-DT outperforms its variants and strong meta offline RL baselines by a large margin with a trajectory prompt containing only a few timesteps. Prompt-DT is also robust to prompt length changes and can generalize to out-of-distribution (OOD) environments. Project page: https://mxu34.github.io/PromptDT/.
APA
Xu, M., Shen, Y., Zhang, S., Lu, Y., Zhao, D., Tenenbaum, J. & Gan, C. (2022). Prompting Decision Transformer for Few-Shot Policy Generalization. Proceedings of the 39th International Conference on Machine Learning, in Proceedings of Machine Learning Research 162:24631-24645. Available from https://proceedings.mlr.press/v162/xu22g.html.

Related Material