Differentiable Robot Rendering

Ruoshi Liu; Alper Canberk; Shuran Song; Carl Vondrick

Proceedings of The 8th Conference on Robot Learning, PMLR 270:117-129, 2025.

Abstract

Vision foundation models trained on massive amounts of visual data have shown unprecedented reasoning and planning skills in open-world settings. A key challenge in applying them to robotic tasks is the modality gap between visual data and action data. We introduce differentiable robot rendering, a method allowing the visual appearance of a robot body to be directly differentiable with respect to its control parameters. Our model integrates a kinematics-aware deformable model and Gaussian Splatting and is compatible with any robot form factor and number of degrees of freedom. We demonstrate its capability and usage in applications including reconstruction of robot poses from images and controlling robots through vision-language models. Quantitative and qualitative results show that our differentiable rendering model provides effective gradients for robotic control directly from pixels, setting the foundation for future applications of vision foundation models in robotics.
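
To make the central idea concrete (pixels that are differentiable with respect to control parameters), here is a minimal toy sketch. It is not the authors' model: it assumes a hypothetical two-joint planar arm, draws one isotropic Gaussian blob at the elbow and one at the end effector in place of a full Gaussian Splatting renderer, and derives the gradient by hand with the chain rule rather than with an autodiff framework. All names and constants (`LINKS`, `SIGMA`, `GRID`, `fit_pose`, the learning rate) are assumptions chosen for illustration.

```python
import math

LINKS = (1.0, 0.8)   # link lengths of a toy 2-joint planar arm (assumed)
SIGMA = 0.35         # width of each isotropic Gaussian blob (assumed)
GRID = [-2.0 + 4.0 * i / 15 for i in range(16)]  # 16 pixel centers per axis


def forward_kinematics(q):
    """Return elbow and tip positions and their Jacobians w.r.t. joint angles q."""
    a, b = q[0], q[0] + q[1]
    l1, l2 = LINKS
    elbow = (l1 * math.cos(a), l1 * math.sin(a))
    tip = (elbow[0] + l2 * math.cos(b), elbow[1] + l2 * math.sin(b))
    # Each Jacobian is ((dx/dq1, dx/dq2), (dy/dq1, dy/dq2)).
    j_elbow = ((-l1 * math.sin(a), 0.0), (l1 * math.cos(a), 0.0))
    j_tip = ((-l1 * math.sin(a) - l2 * math.sin(b), -l2 * math.sin(b)),
             (l1 * math.cos(a) + l2 * math.cos(b), l2 * math.cos(b)))
    return (elbow, tip), (j_elbow, j_tip)


def blob(u, v, cx, cy):
    """Intensity of one Gaussian blob centered at (cx, cy), sampled at (u, v)."""
    return math.exp(-((u - cx) ** 2 + (v - cy) ** 2) / (2 * SIGMA ** 2))


def render(q):
    """Render the arm as a 16x16 image: the sum of the two blobs at each pixel."""
    centers, _ = forward_kinematics(q)
    return [[sum(blob(u, v, cx, cy) for cx, cy in centers) for u in GRID]
            for v in GRID]


def loss_and_grad(q, target):
    """Squared pixel error against `target` and its gradient w.r.t. q."""
    centers, jacobians = forward_kinematics(q)
    image = render(q)
    loss, grad = 0.0, [0.0, 0.0]
    for r, v in enumerate(GRID):
        for c, u in enumerate(GRID):
            residual = image[r][c] - target[r][c]
            loss += residual ** 2
            for (cx, cy), jac in zip(centers, jacobians):
                g = blob(u, v, cx, cy)
                # Derivative of the blob intensity w.r.t. its center.
                di_dx = g * (u - cx) / SIGMA ** 2
                di_dy = g * (v - cy) / SIGMA ** 2
                for k in range(2):
                    # Chain rule: pixel -> blob center -> joint angle k.
                    grad[k] += 2 * residual * (di_dx * jac[0][k] + di_dy * jac[1][k])
    return loss, grad


def fit_pose(target, q0, lr=0.003, steps=500):
    """Recover joint angles from an image by plain gradient descent on pixels."""
    q = list(q0)
    for _ in range(steps):
        _, grad = loss_and_grad(q, target)
        q = [qi - lr * gi for qi, gi in zip(q, grad)]
    return q


if __name__ == "__main__":
    target_image = render((0.5, 0.7))            # "observed" image of the arm
    estimate = fit_pose(target_image, (0.3, 0.9))
    print("estimated joint angles:", estimate)
```

The sketch mirrors the pose-reconstruction use case at toy scale: an image loss is pushed back through the renderer and the kinematics to update joint angles. The initial guess is deliberately close to the true pose, since this simple loss is non-convex and plain gradient descent is only expected to work inside the local basin.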

Cite this Paper

BibTeX

@InProceedings{pmlr-v270-liu25a,
  title = {Differentiable Robot Rendering},
  author = {Liu, Ruoshi and Canberk, Alper and Song, Shuran and Vondrick, Carl},
  booktitle = {Proceedings of The 8th Conference on Robot Learning},
  pages = {117--129},
  year = {2025},
  editor = {Agrawal, Pulkit and Kroemer, Oliver and Burgard, Wolfram},
  volume = {270},
  series = {Proceedings of Machine Learning Research},
  month = {06--09 Nov},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v270/main/assets/liu25a/liu25a.pdf},
  url = {https://proceedings.mlr.press/v270/liu25a.html},
  abstract = {Vision foundation models trained on massive amounts of visual data have shown unprecedented reasoning and planning skills in open-world settings. A key challenge in applying them to robotic tasks is the modality gap between visual data and action data. We introduce differentiable robot rendering, a method allowing the visual appearance of a robot body to be directly differentiable with respect to its control parameters. Our model integrates a kinematics-aware deformable model and Gaussian Splatting and is compatible with any robot form factor and number of degrees of freedom. We demonstrate its capability and usage in applications including reconstruction of robot poses from images and controlling robots through vision-language models. Quantitative and qualitative results show that our differentiable rendering model provides effective gradients for robotic control directly from pixels, setting the foundation for future applications of vision foundation models in robotics.}
}

Endnote

%0 Conference Paper
%T Differentiable Robot Rendering
%A Ruoshi Liu
%A Alper Canberk
%A Shuran Song
%A Carl Vondrick
%B Proceedings of The 8th Conference on Robot Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Pulkit Agrawal
%E Oliver Kroemer
%E Wolfram Burgard
%F pmlr-v270-liu25a
%I PMLR
%P 117--129
%U https://proceedings.mlr.press/v270/liu25a.html
%V 270
%X Vision foundation models trained on massive amounts of visual data have shown unprecedented reasoning and planning skills in open-world settings. A key challenge in applying them to robotic tasks is the modality gap between visual data and action data. We introduce differentiable robot rendering, a method allowing the visual appearance of a robot body to be directly differentiable with respect to its control parameters. Our model integrates a kinematics-aware deformable model and Gaussian Splatting and is compatible with any robot form factor and number of degrees of freedom. We demonstrate its capability and usage in applications including reconstruction of robot poses from images and controlling robots through vision-language models. Quantitative and qualitative results show that our differentiable rendering model provides effective gradients for robotic control directly from pixels, setting the foundation for future applications of vision foundation models in robotics.

APA

Liu, R., Canberk, A., Song, S., & Vondrick, C. (2025). Differentiable Robot Rendering. Proceedings of The 8th Conference on Robot Learning, in Proceedings of Machine Learning Research 270:117-129. Available from https://proceedings.mlr.press/v270/liu25a.html.
