Body Transformer: Leveraging Robot Embodiment for Policy Learning

Carmelo Sferrazza, Dun-Ming Huang, Fangchen Liu, Jongmin Lee, Pieter Abbeel
Proceedings of The 8th Conference on Robot Learning, PMLR 270:3407-3424, 2025.

Abstract

In recent years, the transformer architecture has become the de-facto standard for machine learning algorithms applied to natural language processing and computer vision. Despite notable evidence of successful deployment of this architecture in the context of robot learning, we claim that vanilla transformers do not fully exploit the structure of the robot learning problem. We propose Body Transformer (BoT), an architecture that exploits the robot embodiment by providing an inductive bias that guides the learning process. We represent the robot body as a graph of sensors and actuators, and rely on masked attention to pool information through the architecture. The resulting architecture outperforms the vanilla transformer, as well as the classical multilayer perceptron, with respect to task completion, scaling properties, and computational efficiency when representing either imitation or reinforcement learning policies.
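
As a rough illustrative sketch only (not the authors' implementation), the snippet below shows one way the idea described in the abstract could look in PyTorch: each sensor/actuator node of the body graph becomes a token, and an attention mask derived from the graph restricts each token to pooling information from itself and its graph neighbours. The four-node chain graph, edge list, and layer sizes are hypothetical choices made for the example.

    # Minimal sketch of graph-masked attention in the spirit of Body Transformer.
    # This is NOT the authors' code; the body graph and dimensions are assumptions.
    import torch

    def body_attention_mask(num_nodes, edges):
        """Boolean mask, True where attention is DISALLOWED (PyTorch convention)."""
        allowed = torch.eye(num_nodes, dtype=torch.bool)  # every node attends to itself
        for i, j in edges:                                # plus its 1-hop neighbours
            allowed[i, j] = True
            allowed[j, i] = True
        return ~allowed

    # Hypothetical 4-node chain: torso - hip - knee - foot.
    edges = [(0, 1), (1, 2), (2, 3)]
    mask = body_attention_mask(4, edges)

    # One masked-attention layer over per-node embeddings (batch=1, 4 tokens, dim=16).
    x = torch.randn(1, 4, 16)
    attn = torch.nn.MultiheadAttention(embed_dim=16, num_heads=4, batch_first=True)
    out, _ = attn(x, x, x, attn_mask=mask)  # information pools along the body graph
    print(out.shape)  # torch.Size([1, 4, 16])

Stacking several such masked layers lets information propagate progressively across the body graph, which is the inductive bias the paper argues vanilla transformers lack.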

Cite this Paper


BibTeX
@InProceedings{pmlr-v270-sferrazza25a,
  title     = {Body Transformer: Leveraging Robot Embodiment for Policy Learning},
  author    = {Sferrazza, Carmelo and Huang, Dun-Ming and Liu, Fangchen and Lee, Jongmin and Abbeel, Pieter},
  booktitle = {Proceedings of The 8th Conference on Robot Learning},
  pages     = {3407--3424},
  year      = {2025},
  editor    = {Agrawal, Pulkit and Kroemer, Oliver and Burgard, Wolfram},
  volume    = {270},
  series    = {Proceedings of Machine Learning Research},
  month     = {06--09 Nov},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v270/main/assets/sferrazza25a/sferrazza25a.pdf},
  url       = {https://proceedings.mlr.press/v270/sferrazza25a.html},
  abstract  = {In recent years, the transformer architecture has become the de-facto standard for machine learning algorithms applied to natural language processing and computer vision. Despite notable evidence of successful deployment of this architecture in the context of robot learning, we claim that vanilla transformers do not fully exploit the structure of the robot learning problem. We propose Body Transformer (BoT), an architecture that exploits the robot embodiment by providing an inductive bias that guides the learning process. We represent the robot body as a graph of sensors and actuators, and rely on masked attention to pool information through the architecture. The resulting architecture outperforms the vanilla transformer, as well as the classical multilayer perceptron, with respect to task completion, scaling properties, and computational efficiency when representing either imitation or reinforcement learning policies.}
}
Endnote
%0 Conference Paper
%T Body Transformer: Leveraging Robot Embodiment for Policy Learning
%A Carmelo Sferrazza
%A Dun-Ming Huang
%A Fangchen Liu
%A Jongmin Lee
%A Pieter Abbeel
%B Proceedings of The 8th Conference on Robot Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Pulkit Agrawal
%E Oliver Kroemer
%E Wolfram Burgard
%F pmlr-v270-sferrazza25a
%I PMLR
%P 3407--3424
%U https://proceedings.mlr.press/v270/sferrazza25a.html
%V 270
%X In recent years, the transformer architecture has become the de-facto standard for machine learning algorithms applied to natural language processing and computer vision. Despite notable evidence of successful deployment of this architecture in the context of robot learning, we claim that vanilla transformers do not fully exploit the structure of the robot learning problem. We propose Body Transformer (BoT), an architecture that exploits the robot embodiment by providing an inductive bias that guides the learning process. We represent the robot body as a graph of sensors and actuators, and rely on masked attention to pool information through the architecture. The resulting architecture outperforms the vanilla transformer, as well as the classical multilayer perceptron, with respect to task completion, scaling properties, and computational efficiency when representing either imitation or reinforcement learning policies.
APA
Sferrazza, C., Huang, D., Liu, F., Lee, J., & Abbeel, P. (2025). Body Transformer: Leveraging Robot Embodiment for Policy Learning. Proceedings of The 8th Conference on Robot Learning, in Proceedings of Machine Learning Research 270:3407-3424. Available from https://proceedings.mlr.press/v270/sferrazza25a.html.
