PoliFormer: Scaling On-Policy RL with Transformers Results in Masterful Navigators

Kuo-Hao Zeng; Zichen Zhang; Kiana Ehsani; Rose Hendrix; Jordi Salvador; Alvaro Herrasti; Ross Girshick; Aniruddha Kembhavi; Luca Weihs

Proceedings of The 8th Conference on Robot Learning, PMLR 270:408-432, 2025.

Abstract

We present PoliFormer (Policy Transformer), an RGB-only indoor navigation agent trained end-to-end with reinforcement learning at scale that generalizes to the real-world without adaptation despite being trained purely in simulation. PoliFormer uses a foundational vision transformer encoder with a causal transformer decoder enabling long-term memory and reasoning. It is trained for hundreds of millions of interactions across diverse environments, leveraging parallelized, multi-machine rollouts for efficient training with high throughput. PoliFormer is a masterful navigator, producing state-of-the-art results across two distinct embodiments, the LoCoBot and Stretch RE-1 robots, and four navigation benchmarks. It breaks through the plateaus of previous work, achieving an unprecedented 85.5% success rate in object goal navigation on the CHORES-S benchmark, a 28.5% absolute improvement. PoliFormer can also be trivially extended to a variety of downstream applications such as object tracking, multi-object navigation, and open-vocabulary navigation with no finetuning.

Cite this Paper

BibTeX

@InProceedings{pmlr-v270-zeng25a,
  title = 	 {PoliFormer: Scaling On-Policy RL with Transformers Results in Masterful Navigators},
  author =       {Zeng, Kuo-Hao and Zhang, Zichen and Ehsani, Kiana and Hendrix, Rose and Salvador, Jordi and Herrasti, Alvaro and Girshick, Ross and Kembhavi, Aniruddha and Weihs, Luca},
  booktitle = 	 {Proceedings of The 8th Conference on Robot Learning},
  pages = 	 {408--432},
  year = 	 {2025},
  editor = 	 {Agrawal, Pulkit and Kroemer, Oliver and Burgard, Wolfram},
  volume = 	 {270},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {06--09 Nov},
  publisher =    {PMLR},
  pdf = 	 {https://raw.githubusercontent.com/mlresearch/v270/main/assets/zeng25a/zeng25a.pdf},
  url = 	 {https://proceedings.mlr.press/v270/zeng25a.html},
  abstract = 	 {We present PoliFormer (Policy Transformer), an RGB-only indoor navigation agent trained end-to-end with reinforcement learning at scale that generalizes to the real-world without adaptation despite being trained purely in simulation. PoliFormer uses a foundational vision transformer encoder with a causal transformer decoder enabling long-term memory and reasoning. It is trained for hundreds of millions of interactions across diverse environments, leveraging parallelized, multi-machine rollouts for efficient training with high throughput. PoliFormer is a masterful navigator, producing state-of-the-art results across two distinct embodiments, the LoCoBot and Stretch RE-1 robots, and four navigation benchmarks. It breaks through the plateaus of previous work, achieving an unprecedented 85.5% success rate in object goal navigation on the CHORES-S benchmark, a 28.5% absolute improvement. PoliFormer can also be trivially extended to a variety of downstream applications such as object tracking, multi-object navigation, and open-vocabulary navigation with no finetuning.}
}

Endnote

%0 Conference Paper
%T PoliFormer: Scaling On-Policy RL with Transformers Results in Masterful Navigators
%A Kuo-Hao Zeng
%A Zichen Zhang
%A Kiana Ehsani
%A Rose Hendrix
%A Jordi Salvador
%A Alvaro Herrasti
%A Ross Girshick
%A Aniruddha Kembhavi
%A Luca Weihs
%B Proceedings of The 8th Conference on Robot Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Pulkit Agrawal
%E Oliver Kroemer
%E Wolfram Burgard
%F pmlr-v270-zeng25a
%I PMLR
%P 408--432
%U https://proceedings.mlr.press/v270/zeng25a.html
%V 270
%X We present PoliFormer (Policy Transformer), an RGB-only indoor navigation agent trained end-to-end with reinforcement learning at scale that generalizes to the real-world without adaptation despite being trained purely in simulation. PoliFormer uses a foundational vision transformer encoder with a causal transformer decoder enabling long-term memory and reasoning. It is trained for hundreds of millions of interactions across diverse environments, leveraging parallelized, multi-machine rollouts for efficient training with high throughput. PoliFormer is a masterful navigator, producing state-of-the-art results across two distinct embodiments, the LoCoBot and Stretch RE-1 robots, and four navigation benchmarks. It breaks through the plateaus of previous work, achieving an unprecedented 85.5% success rate in object goal navigation on the CHORES-S benchmark, a 28.5% absolute improvement. PoliFormer can also be trivially extended to a variety of downstream applications such as object tracking, multi-object navigation, and open-vocabulary navigation with no finetuning.

APA

Zeng, K., Zhang, Z., Ehsani, K., Hendrix, R., Salvador, J., Herrasti, A., Girshick, R., Kembhavi, A. & Weihs, L. (2025). PoliFormer: Scaling On-Policy RL with Transformers Results in Masterful Navigators. Proceedings of The 8th Conference on Robot Learning, in Proceedings of Machine Learning Research 270:408-432. Available from https://proceedings.mlr.press/v270/zeng25a.html.