Ad Hoc Teamwork via Offline Goal-Based Decision Transformers

Xinzhi Zhang, Hohei Chan, Deheng Ye, Yi Cai, Mengchen Zhao
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:74602-74616, 2025.

Abstract

The ability of agents to collaborate with previously unknown teammates on the fly, known as ad hoc teamwork (AHT), is crucial in many real-world applications. Existing approaches to AHT require online interactions with the environment and some carefully designed teammates. However, these prerequisites can be infeasible in practice. In this work, we extend the AHT problem to the offline setting, where the policy of the ego agent is directly learned from a multi-agent interaction dataset. We propose a hierarchical sequence modeling framework called TAGET that addresses critical challenges in the offline setting, including limited data, partial observability and online adaptation. The core idea of TAGET is to dynamically predict teammate-aware rewards-to-go and sub-goals, so that the ego agent can adapt to the changes of teammates’ behaviors in real time. Extensive experimental results show that TAGET significantly outperforms existing solutions to AHT in the offline setting.
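The abstract describes conditioning the ego agent on rewards-to-go and sub-goals. The sketch below shows roughly what such conditioning targets look like when read from one episode of an offline dataset. It computes standard Decision Transformer returns-to-go and fixed-horizon "hindsight" sub-goal labels. The helper names `returns_to_go` and `subgoal_labels`, and the fixed-horizon sub-goal rule, are hypothetical and chosen only for illustration. TAGET itself predicts teammate-aware rewards-to-go and sub-goals with a learned model. This sketch does not reproduce that predictor or the paper's sub-goal definition.

```python
from typing import List, Sequence, TypeVar

S = TypeVar("S")  # generic state type (e.g. an observation vector)


def returns_to_go(rewards: Sequence[float], gamma: float = 1.0) -> List[float]:
    """Return R_t = sum_{k >= t} gamma**(k - t) * r_k for each step t of one episode.

    gamma = 1.0 gives the undiscounted returns-to-go used by Decision Transformers.
    """
    rtg = [0.0] * len(rewards)
    running = 0.0
    # Sweep backwards so each step reuses the suffix sum computed for the step after it.
    for t in range(len(rewards) - 1, -1, -1):
        running = rewards[t] + gamma * running
        rtg[t] = running
    return rtg


def subgoal_labels(states: Sequence[S], horizon: int) -> List[S]:
    """Hypothetical sub-goal rule: label step t with the state `horizon` steps later.

    Indices past the end of the episode are clipped to the final state. This is an
    assumed hindsight-relabeling heuristic, not TAGET's own sub-goal definition.
    """
    if horizon < 1:
        raise ValueError("horizon must be >= 1")
    last = len(states) - 1
    return [states[min(t + horizon, last)] for t in range(len(states))]


if __name__ == "__main__":
    episode_rewards = [0.0, 1.0, 0.0, 2.0]
    episode_states = ["s0", "s1", "s2", "s3"]
    print(returns_to_go(episode_rewards))              # [3.0, 3.0, 2.0, 2.0]
    print(subgoal_labels(episode_states, horizon=2))   # ['s2', 's3', 's3', 's3']
```

The backward sweep computes every suffix sum in a single pass over the episode, so the tail is not re-summed at each step.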

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-zhang25h,
  title = {Ad Hoc Teamwork via Offline Goal-Based Decision Transformers},
  author = {Zhang, Xinzhi and Chan, Hohei and Ye, Deheng and Cai, Yi and Zhao, Mengchen},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages = {74602--74616},
  year = {2025},
  editor = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume = {267},
  series = {Proceedings of Machine Learning Research},
  month = {13--19 Jul},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/zhang25h/zhang25h.pdf},
  url = {https://proceedings.mlr.press/v267/zhang25h.html},
  abstract = {The ability of agents to collaborate with previously unknown teammates on the fly, known as ad hoc teamwork (AHT), is crucial in many real-world applications. Existing approaches to AHT require online interactions with the environment and some carefully designed teammates. However, these prerequisites can be infeasible in practice. In this work, we extend the AHT problem to the offline setting, where the policy of the ego agent is directly learned from a multi-agent interaction dataset. We propose a hierarchical sequence modeling framework called TAGET that addresses critical challenges in the offline setting, including limited data, partial observability and online adaptation. The core idea of TAGET is to dynamically predict teammate-aware rewards-to-go and sub-goals, so that the ego agent can adapt to the changes of teammates’ behaviors in real time. Extensive experimental results show that TAGET significantly outperforms existing solutions to AHT in the offline setting.}
}
Endnote
%0 Conference Paper
%T Ad Hoc Teamwork via Offline Goal-Based Decision Transformers
%A Xinzhi Zhang
%A Hohei Chan
%A Deheng Ye
%A Yi Cai
%A Mengchen Zhao
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-zhang25h
%I PMLR
%P 74602--74616
%U https://proceedings.mlr.press/v267/zhang25h.html
%V 267
%X The ability of agents to collaborate with previously unknown teammates on the fly, known as ad hoc teamwork (AHT), is crucial in many real-world applications. Existing approaches to AHT require online interactions with the environment and some carefully designed teammates. However, these prerequisites can be infeasible in practice. In this work, we extend the AHT problem to the offline setting, where the policy of the ego agent is directly learned from a multi-agent interaction dataset. We propose a hierarchical sequence modeling framework called TAGET that addresses critical challenges in the offline setting, including limited data, partial observability and online adaptation. The core idea of TAGET is to dynamically predict teammate-aware rewards-to-go and sub-goals, so that the ego agent can adapt to the changes of teammates’ behaviors in real time. Extensive experimental results show that TAGET significantly outperforms existing solutions to AHT in the offline setting.
APA
Zhang, X., Chan, H., Ye, D., Cai, Y., & Zhao, M. (2025). Ad Hoc Teamwork via Offline Goal-Based Decision Transformers. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:74602-74616. Available from https://proceedings.mlr.press/v267/zhang25h.html.
