Q-Transformer: Scalable Offline Reinforcement Learning via Autoregressive Q-Functions

Yevgen Chebotar, Quan Vuong, Karol Hausman, Fei Xia, Yao Lu, Alex Irpan, Aviral Kumar, Tianhe Yu, Alexander Herzog, Karl Pertsch, Keerthana Gopalakrishnan, Julian Ibarz, Ofir Nachum, Sumedh Anand Sontakke, Grecia Salazar, Huong T. Tran, Jodilyn Peralta, Clayton Tan, Deeksha Manjunath, Jaspiar Singh, Brianna Zitkovich, Tomas Jackson, Kanishka Rao, Chelsea Finn, Sergey Levine
Proceedings of The 7th Conference on Robot Learning, PMLR 229:3909-3928, 2023.

Abstract

In this work, we present a scalable reinforcement learning method for training multi-task policies from large offline datasets that can leverage both human demonstrations and autonomously collected data. Our method uses a Transformer to provide a scalable representation for Q-functions trained via offline temporal difference backups. We therefore refer to the method as Q-Transformer. By discretizing each action dimension and representing the Q-value of each action dimension as separate tokens, we can apply effective high-capacity sequence modeling techniques for Q-learning. We present several design decisions that enable good performance with offline RL training, and show that Q-Transformer outperforms prior offline RL algorithms and imitation learning techniques on a large diverse real-world robotic manipulation task suite.
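
To make the tokenization idea in the abstract concrete, below is a minimal, hypothetical sketch (not the authors' released code): each continuous action dimension is discretized into a fixed number of bins, and a greedy action is recovered by maximizing Q-values one dimension at a time, conditioning each choice on the tokens already selected. The bin count, action bounds, and the random stub standing in for the Transformer Q-function are illustrative assumptions only.

# Minimal sketch of per-dimension action tokenization and autoregressive
# greedy maximization. `q_values_for_next_dim` is a hypothetical stand-in
# for the Transformer Q-function so the example runs on its own.
import numpy as np

NUM_BINS = 256                      # assumed discretization resolution per action dimension
ACTION_LOW, ACTION_HIGH = -1.0, 1.0 # assumed action bounds

def discretize(value: float) -> int:
    """Map one continuous action dimension to a token in [0, NUM_BINS)."""
    frac = (value - ACTION_LOW) / (ACTION_HIGH - ACTION_LOW)
    return int(np.clip(frac * (NUM_BINS - 1), 0, NUM_BINS - 1))

def undiscretize(token: int) -> float:
    """Map a token back to the center value of its bin."""
    return ACTION_LOW + (token / (NUM_BINS - 1)) * (ACTION_HIGH - ACTION_LOW)

def q_values_for_next_dim(state: np.ndarray, prefix_tokens: list[int]) -> np.ndarray:
    """Placeholder for the Transformer: given the state and the tokens chosen
    for earlier action dimensions, return Q-values over all bins of the next
    dimension. A deterministic random stub keeps the sketch self-contained."""
    seed = hash((state.tobytes(), tuple(prefix_tokens))) % (2**32)
    rng = np.random.default_rng(seed)
    return rng.standard_normal(NUM_BINS)

def greedy_action(state: np.ndarray, num_action_dims: int) -> np.ndarray:
    """Maximize the Q-function one action dimension (one token) at a time."""
    tokens: list[int] = []
    for _ in range(num_action_dims):
        q = q_values_for_next_dim(state, tokens)
        tokens.append(int(np.argmax(q)))
    return np.array([undiscretize(t) for t in tokens])

if __name__ == "__main__":
    state = np.zeros(10, dtype=np.float32)          # placeholder observation
    print(greedy_action(state, num_action_dims=7))  # e.g., a 7-DoF arm action

Because each dimension's Q-values are produced as a separate token prediction, the same autoregressive machinery used for sequence modeling can score and maximize high-dimensional discrete actions without enumerating the full joint action space.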

Cite this Paper


BibTeX
@InProceedings{pmlr-v229-chebotar23a,
  title     = {Q-Transformer: Scalable Offline Reinforcement Learning via Autoregressive Q-Functions},
  author    = {Chebotar, Yevgen and Vuong, Quan and Hausman, Karol and Xia, Fei and Lu, Yao and Irpan, Alex and Kumar, Aviral and Yu, Tianhe and Herzog, Alexander and Pertsch, Karl and Gopalakrishnan, Keerthana and Ibarz, Julian and Nachum, Ofir and Sontakke, Sumedh Anand and Salazar, Grecia and Tran, Huong T. and Peralta, Jodilyn and Tan, Clayton and Manjunath, Deeksha and Singh, Jaspiar and Zitkovich, Brianna and Jackson, Tomas and Rao, Kanishka and Finn, Chelsea and Levine, Sergey},
  booktitle = {Proceedings of The 7th Conference on Robot Learning},
  pages     = {3909--3928},
  year      = {2023},
  editor    = {Tan, Jie and Toussaint, Marc and Darvish, Kourosh},
  volume    = {229},
  series    = {Proceedings of Machine Learning Research},
  month     = {06--09 Nov},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v229/chebotar23a/chebotar23a.pdf},
  url       = {https://proceedings.mlr.press/v229/chebotar23a.html}
}
APA
Chebotar, Y., Vuong, Q., Hausman, K., Xia, F., Lu, Y., Irpan, A., Kumar, A., Yu, T., Herzog, A., Pertsch, K., Gopalakrishnan, K., Ibarz, J., Nachum, O., Sontakke, S. A., Salazar, G., Tran, H. T., Peralta, J., Tan, C., Manjunath, D., Singh, J., Zitkovich, B., Jackson, T., Rao, K., Finn, C., & Levine, S. (2023). Q-Transformer: Scalable Offline Reinforcement Learning via Autoregressive Q-Functions. Proceedings of The 7th Conference on Robot Learning, in Proceedings of Machine Learning Research 229:3909-3928. Available from https://proceedings.mlr.press/v229/chebotar23a.html.
