In-Context Decision Transformer: Reinforcement Learning via Hierarchical Chain-of-Thought

Sili Huang; Jifeng Hu; Hechang Chen; Lichao Sun; Bo Yang


Proceedings of the 41st International Conference on Machine Learning, PMLR 235:19871-19885, 2024.

Abstract

In-context learning, achieved by providing task prompts, is a promising approach for enabling offline reinforcement learning (RL) to handle online tasks. Recent work has demonstrated that in-context RL can emerge, with self-improvement in a trial-and-error manner, when RL tasks are treated as an across-episodic sequential prediction problem. Although this self-improvement requires no gradient updates, current methods still incur high computational costs because the across-episodic sequence grows with the task horizon. To address this, we propose an In-context Decision Transformer (IDT) that achieves self-improvement in a high-level trial-and-error manner. Inspired by the efficient hierarchical structure of human decision-making, IDT reconstructs the sequence from high-level decisions rather than from the low-level actions that interact with the environment. Because one high-level decision can guide multiple steps of low-level actions, IDT naturally avoids excessively long sequences and solves online tasks more efficiently. Experimental results show that IDT achieves state-of-the-art performance on long-horizon tasks compared with current in-context RL methods. In particular, IDT's online evaluation is 36× faster than the baselines on the D4RL benchmark and 27× faster on the Grid World benchmark.
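The efficiency argument above can be illustrated with a back-of-the-envelope sketch. It assumes, hypothetically, that each low-level environment step contributes three tokens (return-to-go, state, action, as in the original Decision Transformer), that one high-level decision is issued every fixed `interval` steps and also occupies three tokens, and that self-attention cost grows quadratically with context length. The function names and numbers are illustrative only; this is not the authors' implementation, and the computed ratio is not meant to reproduce the reported 36× or 27× speedups.

```python
import math


def segment_episode(actions: list, interval: int) -> list:
    """Hypothetical helper: group low-level actions into consecutive chunks,
    one chunk per high-level decision (the last chunk may be shorter)."""
    return [actions[i:i + interval] for i in range(0, len(actions), interval)]


def low_level_context_length(num_episodes: int, horizon: int,
                             tokens_per_step: int = 3) -> int:
    """Tokens in an across-episodic context that stores every environment step."""
    return num_episodes * horizon * tokens_per_step


def high_level_context_length(num_episodes: int, horizon: int, interval: int,
                              tokens_per_decision: int = 3) -> int:
    """Tokens in an across-episodic context that stores one entry per
    high-level decision, each covering `interval` low-level steps (assumed)."""
    return num_episodes * math.ceil(horizon / interval) * tokens_per_decision


def relative_attention_cost(low_len: int, high_len: int) -> float:
    """Ratio of quadratic self-attention cost between the two context lengths."""
    return (low_len / high_len) ** 2


if __name__ == "__main__":
    episodes, horizon, interval = 4, 1000, 10  # illustrative values only
    low = low_level_context_length(episodes, horizon)
    high = high_level_context_length(episodes, horizon, interval)
    print(low, high, relative_attention_cost(low, high))
```

With four episodes of 1,000 steps and an assumed interval of 10, the context shrinks from 12,000 to 1,200 tokens, which under the quadratic-attention assumption is a 100× reduction in attention cost. Real wall-clock gains also depend on model size, environment simulation time, and how the high-level decisions are decoded into actions.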

Cite this Paper

BibTeX


@InProceedings{pmlr-v235-huang24j,
  title = 	 {In-Context Decision Transformer: Reinforcement Learning via Hierarchical Chain-of-Thought},
  author =       {Huang, Sili and Hu, Jifeng and Chen, Hechang and Sun, Lichao and Yang, Bo},
  booktitle = 	 {Proceedings of the 41st International Conference on Machine Learning},
  pages = 	 {19871--19885},
  year = 	 {2024},
  editor = 	 {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume = 	 {235},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {21--27 Jul},
  publisher =    {PMLR},
  pdf = 	 {https://raw.githubusercontent.com/mlresearch/v235/main/assets/huang24j/huang24j.pdf},
  url = 	 {https://proceedings.mlr.press/v235/huang24j.html},
  abstract = 	 {In-context learning is a promising approach for offline reinforcement learning (RL) to handle online tasks, which can be achieved by providing task prompts. Recent works demonstrated that in-context RL could emerge with self-improvement in a trial-and-error manner when treating RL tasks as an across-episodic sequential prediction problem. Despite the self-improvement not requiring gradient updates, current works still suffer from high computational costs when the across-episodic sequence increases with task horizons. To this end, we propose an In-context Decision Transformer (IDT) to achieve self-improvement in a high-level trial-and-error manner. Specifically, IDT is inspired by the efficient hierarchical structure of human decision-making and thus reconstructs the sequence to consist of high-level decisions instead of low-level actions that interact with environments. As one high-level decision can guide multi-step low-level actions, IDT naturally avoids excessively long sequences and solves online tasks more efficiently. Experimental results show that IDT achieves state-of-the-art in long-horizon tasks over current in-context RL methods. In particular, the online evaluation time of our IDT is 36$\times$ faster than baselines in the D4RL benchmark and 27$\times$ faster in the Grid World benchmark.}
}

Endnote

%0 Conference Paper
%T In-Context Decision Transformer: Reinforcement Learning via Hierarchical Chain-of-Thought
%A Sili Huang
%A Jifeng Hu
%A Hechang Chen
%A Lichao Sun
%A Bo Yang
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-huang24j
%I PMLR
%P 19871--19885
%U https://proceedings.mlr.press/v235/huang24j.html
%V 235
%X In-context learning is a promising approach for offline reinforcement learning (RL) to handle online tasks, which can be achieved by providing task prompts. Recent works demonstrated that in-context RL could emerge with self-improvement in a trial-and-error manner when treating RL tasks as an across-episodic sequential prediction problem. Despite the self-improvement not requiring gradient updates, current works still suffer from high computational costs when the across-episodic sequence increases with task horizons. To this end, we propose an In-context Decision Transformer (IDT) to achieve self-improvement in a high-level trial-and-error manner. Specifically, IDT is inspired by the efficient hierarchical structure of human decision-making and thus reconstructs the sequence to consist of high-level decisions instead of low-level actions that interact with environments. As one high-level decision can guide multi-step low-level actions, IDT naturally avoids excessively long sequences and solves online tasks more efficiently. Experimental results show that IDT achieves state-of-the-art in long-horizon tasks over current in-context RL methods. In particular, the online evaluation time of our IDT is 36$\times$ faster than baselines in the D4RL benchmark and 27$\times$ faster in the Grid World benchmark.

APA


Huang, S., Hu, J., Chen, H., Sun, L. & Yang, B. (2024). In-Context Decision Transformer: Reinforcement Learning via Hierarchical Chain-of-Thought. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:19871-19885. Available from https://proceedings.mlr.press/v235/huang24j.html.
