Contrastive Decision Transformers

Sachin G. Konan, Esmaeil Seraj, Matthew Gombolay
Proceedings of The 6th Conference on Robot Learning, PMLR 205:2159-2169, 2023.

Abstract

Decision Transformers (DT) have drawn upon the success of Transformers by abstracting Reinforcement Learning as a target-return-conditioned sequence modeling problem. In our work, we claim that the distribution of DT’s target-returns represents a series of different tasks that agents must learn to handle. Work in multi-task learning has shown that separating the representations of input data belonging to different tasks can improve performance. We draw from this approach to construct ConDT (Contrastive Decision Transformer). ConDT leverages an enhanced contrastive loss to train a return-dependent transformation of the input embeddings, which we empirically show clusters these embeddings by their return. We find that ConDT significantly outperforms DT, improving performance by 10% in OpenAI Gym domains and by 39% in visually challenging Atari domains.
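
For intuition, the two ingredients the abstract names (a return-dependent transformation of input embeddings, trained with a contrastive loss that clusters embeddings by return) can be sketched as below. This is a minimal PyTorch illustration under assumed design choices, not the authors' implementation: the scale-and-shift conditioning, the supervised-contrastive (InfoNCE-style) loss with a return-similarity threshold, and all names here (ReturnConditionedProjection, return_contrastive_loss, sim_threshold) are hypothetical, introduced only for exposition.

```python
# Illustrative sketch only (assumed design, not the paper's reference code):
# a return-dependent transformation of input embeddings plus an
# InfoNCE-style contrastive loss that treats embeddings with nearby
# target returns as positives, pulling them together in embedding space.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ReturnConditionedProjection(nn.Module):
    """Scale-and-shift the input embedding as a function of the target
    return (one plausible form of a "return-dependent transformation")."""

    def __init__(self, embed_dim: int):
        super().__init__()
        self.to_scale = nn.Linear(1, embed_dim)
        self.to_shift = nn.Linear(1, embed_dim)

    def forward(self, emb: torch.Tensor, rtg: torch.Tensor) -> torch.Tensor:
        # emb: (batch, embed_dim); rtg: (batch, 1) target returns-to-go
        return self.to_scale(rtg) * emb + self.to_shift(rtg)


def return_contrastive_loss(emb: torch.Tensor, rtg: torch.Tensor,
                            temperature: float = 0.1,
                            sim_threshold: float = 5.0) -> torch.Tensor:
    """Supervised-contrastive-style loss: pairs whose target returns differ
    by less than sim_threshold count as positives; all others repel."""
    z = F.normalize(emb, dim=-1)
    logits = z @ z.T / temperature                       # (batch, batch)
    eye = torch.eye(len(z), dtype=torch.bool, device=z.device)
    logits = logits.masked_fill(eye, -1e9)               # exclude self-pairs
    positives = (torch.cdist(rtg, rtg) < sim_threshold).float()
    positives.masked_fill_(eye, 0.0)
    log_prob = logits - torch.logsumexp(logits, dim=1, keepdim=True)
    pos_per_anchor = positives.sum(dim=1)
    loss = -(positives * log_prob).sum(dim=1) / pos_per_anchor.clamp(min=1.0)
    return loss[pos_per_anchor > 0].mean()   # anchors with >= 1 positive


# Usage: the contrastive term would be added to the usual
# action-prediction objective during training.
proj = ReturnConditionedProjection(embed_dim=64)
emb = torch.randn(32, 64)           # token embeddings from the DT encoder
rtg = torch.rand(32, 1) * 100.0     # target returns-to-go
loss = return_contrastive_loss(proj(emb, rtg), rtg)
loss.backward()
```

Under this reading, minimizing the loss pushes embeddings of trajectories with similar target returns toward each other and separates the rest, which matches the clustering-by-return behavior the abstract reports empirically.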

Cite this Paper


BibTeX
@InProceedings{pmlr-v205-konan23a,
  title     = {Contrastive Decision Transformers},
  author    = {Konan, Sachin G. and Seraj, Esmaeil and Gombolay, Matthew},
  booktitle = {Proceedings of The 6th Conference on Robot Learning},
  pages     = {2159--2169},
  year      = {2023},
  editor    = {Liu, Karen and Kulic, Dana and Ichnowski, Jeff},
  volume    = {205},
  series    = {Proceedings of Machine Learning Research},
  month     = {14--18 Dec},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v205/konan23a/konan23a.pdf},
  url       = {https://proceedings.mlr.press/v205/konan23a.html},
  abstract  = {Decision Transformers (DT) have drawn upon the success of Transformers by abstracting Reinforcement Learning as a target-return-conditioned sequence modeling problem. In our work, we claim that the distribution of DT’s target-returns represents a series of different tasks that agents must learn to handle. Work in multi-task learning has shown that separating the representations of input data belonging to different tasks can improve performance. We draw from this approach to construct ConDT (Contrastive Decision Transformer). ConDT leverages an enhanced contrastive loss to train a return-dependent transformation of the input embeddings, which we empirically show clusters these embeddings by their return. We find that ConDT significantly outperforms DT, improving performance by 10% in OpenAI Gym domains and by 39% in visually challenging Atari domains.}
}
Endnote
%0 Conference Paper
%T Contrastive Decision Transformers
%A Sachin G. Konan
%A Esmaeil Seraj
%A Matthew Gombolay
%B Proceedings of The 6th Conference on Robot Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Karen Liu
%E Dana Kulic
%E Jeff Ichnowski
%F pmlr-v205-konan23a
%I PMLR
%P 2159--2169
%U https://proceedings.mlr.press/v205/konan23a.html
%V 205
%X Decision Transformers (DT) have drawn upon the success of Transformers by abstracting Reinforcement Learning as a target-return-conditioned sequence modeling problem. In our work, we claim that the distribution of DT’s target-returns represents a series of different tasks that agents must learn to handle. Work in multi-task learning has shown that separating the representations of input data belonging to different tasks can improve performance. We draw from this approach to construct ConDT (Contrastive Decision Transformer). ConDT leverages an enhanced contrastive loss to train a return-dependent transformation of the input embeddings, which we empirically show clusters these embeddings by their return. We find that ConDT significantly outperforms DT, improving performance by 10% in OpenAI Gym domains and by 39% in visually challenging Atari domains.
APA
Konan, S.G., Seraj, E. & Gombolay, M. (2023). Contrastive Decision Transformers. Proceedings of The 6th Conference on Robot Learning, in Proceedings of Machine Learning Research 205:2159-2169. Available from https://proceedings.mlr.press/v205/konan23a.html.