Rethinking Transformers in Solving POMDPs

Chenhao Lu, Ruizhe Shi, Yuyao Liu, Kaizhe Hu, Simon Shaolei Du, Huazhe Xu
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:33089-33112, 2024.

Abstract

Sequential decision-making algorithms such as reinforcement learning (RL) in real-world scenarios inevitably face environments with partial observability. This paper scrutinizes the effectiveness of a popular architecture, namely Transformers, in Partially Observable Markov Decision Processes (POMDPs) and reveals its theoretical limitations. We establish that regular languages, which Transformers struggle to model, are reducible to POMDPs. This poses a significant challenge for Transformers in learning POMDP-specific inductive biases, due to their lack of inherent recurrence found in other models like RNNs. This paper casts doubt on the prevalent belief in Transformers as sequence models for RL and proposes to introduce a point-wise recurrent structure. The Deep Linear Recurrent Unit (LRU) emerges as a well-suited alternative for Partially Observable RL, with empirical results highlighting the sub-optimal performance of the Transformer and considerable strength of LRU.
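To make the theoretical claim concrete: PARITY, a canonical regular language that Transformers are known to struggle with, becomes a POMDP whose hidden state is the running parity of an observed bit stream. The sketch below is illustrative only (the environment name ParityPOMDP and its interface are assumptions, not the paper's benchmark): the reward depends on the hidden parity, so an optimal policy must integrate the entire observation history, while a one-bit recurrent memory suffices.

```python
import numpy as np

class ParityPOMDP:
    """PARITY cast as a POMDP: the agent sees one bit per step and is
    rewarded for reporting the parity of all bits revealed so far.
    The parity itself is never observed, so optimal behavior requires
    memory over the full history."""

    def __init__(self, horizon=10, seed=0):
        self.horizon = horizon
        self.rng = np.random.default_rng(seed)

    def reset(self):
        self.bits = self.rng.integers(0, 2, size=self.horizon)
        self.t = 0
        return int(self.bits[0])  # first observation

    def step(self, action):
        # Hidden state: parity of bits[0..t]; reward correct reports.
        parity = int(self.bits[: self.t + 1].sum() % 2)
        reward = float(action == parity)
        self.t += 1
        done = self.t == self.horizon
        obs = int(self.bits[self.t]) if not done else 0
        return obs, reward, done

# A constant-size recurrent policy solves it exactly.
env = ParityPOMDP()
obs, done = env.reset(), False
mem = 0
while not done:
    mem ^= obs                        # one-bit recurrent state
    obs, reward, done = env.step(mem)
    assert reward == 1.0
```

The point-wise recurrent structure the paper advocates is the Linear Recurrent Unit (Orvieto et al., 2023). The following is a minimal NumPy forward pass of a single LRU layer, following the standard diagonal complex parameterization; it is a sketch under those conventions, not the authors' released implementation.

```python
import numpy as np

def lru_forward(x, nu_log, theta_log, B, C, D):
    """Single LRU layer over a sequence x of shape (T, d_in).

    The recurrence is element-wise ("point-wise") over a complex
    diagonal state, so each step costs O(d_h) and the state size is
    constant in the sequence length, unlike Transformer attention.
    """
    # Stable eigenvalues: |lam| = exp(-exp(nu_log)) < 1.
    lam = np.exp(-np.exp(nu_log) + 1j * np.exp(theta_log))    # (d_h,)
    # gamma normalizes the state scale across timescales.
    gamma = np.sqrt(1.0 - np.abs(lam) ** 2)                   # (d_h,)
    h = np.zeros(B.shape[0], dtype=np.complex128)
    ys = []
    for x_t in x:
        h = lam * h + gamma * (B @ x_t)    # diagonal linear recurrence
        ys.append((C @ h).real + D @ x_t)  # real readout + skip path
    return np.stack(ys)

# Tiny smoke test with random parameters (checks shapes only).
rng = np.random.default_rng(0)
d_in, d_h, d_out, T = 4, 8, 4, 16
y = lru_forward(
    rng.normal(size=(T, d_in)),
    nu_log=rng.normal(size=d_h),
    theta_log=rng.normal(size=d_h),
    B=rng.normal(size=(d_h, d_in)) + 1j * rng.normal(size=(d_h, d_in)),
    C=rng.normal(size=(d_out, d_h)) + 1j * rng.normal(size=(d_out, d_h)),
    D=rng.normal(size=(d_out, d_in)),
)
print(y.shape)  # (16, 4)
```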

Cite this Paper

BibTeX
@InProceedings{pmlr-v235-lu24h,
  title     = {Rethinking Transformers in Solving {POMDP}s},
  author    = {Lu, Chenhao and Shi, Ruizhe and Liu, Yuyao and Hu, Kaizhe and Du, Simon Shaolei and Xu, Huazhe},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages     = {33089--33112},
  year      = {2024},
  editor    = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume    = {235},
  series    = {Proceedings of Machine Learning Research},
  month     = {21--27 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/lu24h/lu24h.pdf},
  url       = {https://proceedings.mlr.press/v235/lu24h.html}
}
Endnote
%0 Conference Paper
%T Rethinking Transformers in Solving POMDPs
%A Chenhao Lu
%A Ruizhe Shi
%A Yuyao Liu
%A Kaizhe Hu
%A Simon Shaolei Du
%A Huazhe Xu
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-lu24h
%I PMLR
%P 33089--33112
%U https://proceedings.mlr.press/v235/lu24h.html
%V 235
APA
Lu, C., Shi, R., Liu, Y., Hu, K., Du, S.S. & Xu, H. (2024). Rethinking Transformers in Solving POMDPs. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:33089-33112. Available from https://proceedings.mlr.press/v235/lu24h.html.
