Online Decision Transformer

Qinqing Zheng; Amy Zhang; Aditya Grover

Proceedings of the 39th International Conference on Machine Learning, PMLR 162:27042-27059, 2022.

Abstract

Recent work has shown that offline reinforcement learning (RL) can be formulated as a sequence modeling problem (Chen et al., 2021; Janner et al., 2021) and solved via approaches similar to large-scale language modeling. However, any practical instantiation of RL also involves an online component, where policies pretrained on passive offline datasets are finetuned via task-specific interactions with the environment. We propose Online Decision Transformers (ODT), an RL algorithm based on sequence modeling that blends offline pretraining with online finetuning in a unified framework. Our framework uses sequence-level entropy regularizers in conjunction with autoregressive modeling objectives for sample-efficient exploration and finetuning. Empirically, we show that ODT is competitive with the state-of-the-art in absolute performance on the D4RL benchmark but shows much more significant gains during the finetuning procedure.

Cite this Paper

BibTeX


@InProceedings{pmlr-v162-zheng22c,
  title     = {Online Decision Transformer},
  author    = {Zheng, Qinqing and Zhang, Amy and Grover, Aditya},
  booktitle = {Proceedings of the 39th International Conference on Machine Learning},
  pages     = {27042--27059},
  year      = {2022},
  editor    = {Chaudhuri, Kamalika and Jegelka, Stefanie and Song, Le and Szepesvari, Csaba and Niu, Gang and Sabato, Sivan},
  volume    = {162},
  series    = {Proceedings of Machine Learning Research},
  month     = {17--23 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v162/zheng22c/zheng22c.pdf},
  url       = {https://proceedings.mlr.press/v162/zheng22c.html},
  abstract  = {Recent work has shown that offline reinforcement learning (RL) can be formulated as a sequence modeling problem (Chen et al., 2021; Janner et al., 2021) and solved via approaches similar to large-scale language modeling. However, any practical instantiation of RL also involves an online component, where policies pretrained on passive offline datasets are finetuned via task-specific interactions with the environment. We propose Online Decision Transformers (ODT), an RL algorithm based on sequence modeling that blends offline pretraining with online finetuning in a unified framework. Our framework uses sequence-level entropy regularizers in conjunction with autoregressive modeling objectives for sample-efficient exploration and finetuning. Empirically, we show that ODT is competitive with the state-of-the-art in absolute performance on the D4RL benchmark but shows much more significant gains during the finetuning procedure.}
}

Endnote

%0 Conference Paper
%T Online Decision Transformer
%A Qinqing Zheng
%A Amy Zhang
%A Aditya Grover
%B Proceedings of the 39th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2022
%E Kamalika Chaudhuri
%E Stefanie Jegelka
%E Le Song
%E Csaba Szepesvari
%E Gang Niu
%E Sivan Sabato
%F pmlr-v162-zheng22c
%I PMLR
%P 27042--27059
%U https://proceedings.mlr.press/v162/zheng22c.html
%V 162
%X Recent work has shown that offline reinforcement learning (RL) can be formulated as a sequence modeling problem (Chen et al., 2021; Janner et al., 2021) and solved via approaches similar to large-scale language modeling. However, any practical instantiation of RL also involves an online component, where policies pretrained on passive offline datasets are finetuned via task-specific interactions with the environment. We propose Online Decision Transformers (ODT), an RL algorithm based on sequence modeling that blends offline pretraining with online finetuning in a unified framework. Our framework uses sequence-level entropy regularizers in conjunction with autoregressive modeling objectives for sample-efficient exploration and finetuning. Empirically, we show that ODT is competitive with the state-of-the-art in absolute performance on the D4RL benchmark but shows much more significant gains during the finetuning procedure.

APA


Zheng, Q., Zhang, A., & Grover, A. (2022). Online Decision Transformer. Proceedings of the 39th International Conference on Machine Learning, in Proceedings of Machine Learning Research 162:27042-27059. Available from https://proceedings.mlr.press/v162/zheng22c.html.