A trajectory is worth three sentences: multimodal transformer for offline reinforcement learning

Yiqi Wang, Mengdi Xu, Laixi Shi, Yuejie Chi
Proceedings of the Thirty-Ninth Conference on Uncertainty in Artificial Intelligence, PMLR 216:2226-2236, 2023.

Abstract

Transformers hold tremendous promise in solving offline reinforcement learning (RL) by formulating it as a sequence modeling problem inspired by language modeling (LM). Prior works using transformers model a sample (trajectory) of RL as one sequence analogous to a sequence of words (one sentence) in LM, despite the fact that each trajectory includes tokens from three diverse modalities: state, action, and reward, while a sentence contains words only. Rather than taking a modality-agnostic approach which uniformly models the tokens from different modalities as one sequence, we propose a multimodal sequence modeling approach in which a trajectory (one “sentence”) of three modalities (state, action, reward) is disentangled into three unimodal ones (three “sentences”). We investigate the correlation of different modalities during sequential decision-making and use the insights to design a multimodal transformer, named Decision Transducer (DTd). DTd outperforms prior art in offline RL on D4RL benchmarks and enjoys better sample efficiency and algorithm flexibility. Our code is made publicly available.
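The core idea in the abstract, disentangling one trajectory "sentence" into three unimodal "sentences", can be sketched in a few lines. This is a minimal illustration of the data-layout step only, not the paper's released code; all names below are hypothetical.

```python
# Minimal sketch: splitting an offline-RL trajectory of (state, action, reward)
# steps into three unimodal sequences, as described in the abstract.
# Function and variable names are illustrative, not from the paper's code.

def disentangle(trajectory):
    """Split a list of (state, action, reward) steps into three sequences."""
    states = [s for s, _, _ in trajectory]
    actions = [a for _, a, _ in trajectory]
    rewards = [r for _, _, r in trajectory]
    return states, actions, rewards

# A toy 3-step trajectory with scalar tokens.
traj = [(0.1, 1, 0.0), (0.2, 0, 1.0), (0.3, 1, 0.5)]
states, actions, rewards = disentangle(traj)
# states  -> [0.1, 0.2, 0.3]
# actions -> [1, 0, 1]
# rewards -> [0.0, 1.0, 0.5]
```

Each of the three sequences would then be embedded and modeled by its own stream of the multimodal transformer, rather than interleaving all tokens into one modality-agnostic sequence.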

Cite this Paper


BibTeX
@InProceedings{pmlr-v216-wang23d,
  title     = {A trajectory is worth three sentences: multimodal transformer for offline reinforcement learning},
  author    = {Wang, Yiqi and Xu, Mengdi and Shi, Laixi and Chi, Yuejie},
  booktitle = {Proceedings of the Thirty-Ninth Conference on Uncertainty in Artificial Intelligence},
  pages     = {2226--2236},
  year      = {2023},
  editor    = {Evans, Robin J. and Shpitser, Ilya},
  volume    = {216},
  series    = {Proceedings of Machine Learning Research},
  month     = {31 Jul--04 Aug},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v216/wang23d/wang23d.pdf},
  url       = {https://proceedings.mlr.press/v216/wang23d.html},
  abstract  = {Transformers hold tremendous promise in solving offline reinforcement learning (RL) by formulating it as a sequence modeling problem inspired by language modeling (LM). Prior works using transformers model a sample (trajectory) of RL as one sequence analogous to a sequence of words (one sentence) in LM, despite the fact that each trajectory includes tokens from three diverse modalities: state, action, and reward, while a sentence contains words only. Rather than taking a modality-agnostic approach which uniformly models the tokens from different modalities as one sequence, we propose a multimodal sequence modeling approach in which a trajectory (one “sentence”) of three modalities (state, action, reward) is disentangled into three unimodal ones (three “sentences”). We investigate the correlation of different modalities during sequential decision-making and use the insights to design a multimodal transformer, named Decision Transducer (DTd). DTd outperforms prior art in offline RL on the conducted D4RL benchmarks and enjoys better sample efficiency and algorithm flexibility. Our code is made publicly here.}
}
Endnote
%0 Conference Paper
%T A trajectory is worth three sentences: multimodal transformer for offline reinforcement learning
%A Yiqi Wang
%A Mengdi Xu
%A Laixi Shi
%A Yuejie Chi
%B Proceedings of the Thirty-Ninth Conference on Uncertainty in Artificial Intelligence
%C Proceedings of Machine Learning Research
%D 2023
%E Robin J. Evans
%E Ilya Shpitser
%F pmlr-v216-wang23d
%I PMLR
%P 2226--2236
%U https://proceedings.mlr.press/v216/wang23d.html
%V 216
%X Transformers hold tremendous promise in solving offline reinforcement learning (RL) by formulating it as a sequence modeling problem inspired by language modeling (LM). Prior works using transformers model a sample (trajectory) of RL as one sequence analogous to a sequence of words (one sentence) in LM, despite the fact that each trajectory includes tokens from three diverse modalities: state, action, and reward, while a sentence contains words only. Rather than taking a modality-agnostic approach which uniformly models the tokens from different modalities as one sequence, we propose a multimodal sequence modeling approach in which a trajectory (one “sentence”) of three modalities (state, action, reward) is disentangled into three unimodal ones (three “sentences”). We investigate the correlation of different modalities during sequential decision-making and use the insights to design a multimodal transformer, named Decision Transducer (DTd). DTd outperforms prior art in offline RL on the conducted D4RL benchmarks and enjoys better sample efficiency and algorithm flexibility. Our code is made publicly here.
APA
Wang, Y., Xu, M., Shi, L., & Chi, Y. (2023). A trajectory is worth three sentences: multimodal transformer for offline reinforcement learning. Proceedings of the Thirty-Ninth Conference on Uncertainty in Artificial Intelligence, in Proceedings of Machine Learning Research 216:2226-2236. Available from https://proceedings.mlr.press/v216/wang23d.html.