View-Invariant Policy Learning via Zero-Shot Novel View Synthesis

Stephen Tian, Blake Wulfe, Kyle Sargent, Katherine Liu, Sergey Zakharov, Vitor Campagnolo Guizilini, Jiajun Wu
Proceedings of The 8th Conference on Robot Learning, PMLR 270:1173-1193, 2025.

Abstract

Large-scale visuomotor policy learning is a promising approach toward developing generalizable manipulation systems. Yet, policies that can be deployed on diverse embodiments, environments, and observational modalities remain elusive. In this work, we investigate how knowledge from large-scale visual data of the world may be used to address one axis of variation for generalizable manipulation: observational viewpoint. Specifically, we study single-image novel view synthesis models, which learn 3D-aware scene-level priors by rendering images of the same scene from alternate camera viewpoints given a single input image. For practical application to diverse robotic data, these models must operate *zero-shot*, performing view synthesis on unseen tasks and environments. We empirically analyze view synthesis models within a simple data-augmentation scheme that we call View Synthesis Augmentation (VISTA) to understand their capabilities for learning viewpoint-invariant policies from single-viewpoint demonstration data. Upon evaluating the robustness of policies trained with our method to out-of-distribution camera viewpoints, we find that they outperform baselines in both simulated and real-world manipulation tasks.
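To make the augmentation scheme concrete, below is a minimal sketch of a VISTA-style data augmentation loop as described in the abstract: single-viewpoint demonstration images are randomly replaced with renders of the same scene from alternate camera viewpoints produced by a zero-shot novel view synthesis model, while action labels are left unchanged. The `nvs_model.render` interface, the camera-pose sampler, and the `p_augment` parameter are assumptions for illustration, not the authors' implementation.

```python
import random

def augment_demonstrations(demos, nvs_model, sample_camera_pose, p_augment=0.5):
    """Augment single-viewpoint demonstrations with synthesized novel views.

    demos: list of (image, action) pairs collected from a single camera viewpoint.
    nvs_model: a zero-shot single-image novel view synthesis model, assumed to
               expose render(image, relative_pose) -> image.
    sample_camera_pose: callable returning a random relative camera pose.
    p_augment: probability of replacing an observation with a synthesized view.
    """
    augmented = []
    for image, action in demos:
        if random.random() < p_augment:
            # Render the same scene from an alternate viewpoint; the action label
            # is kept, which encourages viewpoint-invariant visuomotor features.
            image = nvs_model.render(image, sample_camera_pose())
        augmented.append((image, action))
    return augmented
```

The resulting dataset would then be used for standard policy learning (e.g., behavior cloning), so viewpoint robustness comes entirely from the augmented observations rather than from changes to the policy architecture.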

Cite this Paper


BibTeX
@InProceedings{pmlr-v270-tian25a,
  title     = {View-Invariant Policy Learning via Zero-Shot Novel View Synthesis},
  author    = {Tian, Stephen and Wulfe, Blake and Sargent, Kyle and Liu, Katherine and Zakharov, Sergey and Guizilini, Vitor Campagnolo and Wu, Jiajun},
  booktitle = {Proceedings of The 8th Conference on Robot Learning},
  pages     = {1173--1193},
  year      = {2025},
  editor    = {Agrawal, Pulkit and Kroemer, Oliver and Burgard, Wolfram},
  volume    = {270},
  series    = {Proceedings of Machine Learning Research},
  month     = {06--09 Nov},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v270/main/assets/tian25a/tian25a.pdf},
  url       = {https://proceedings.mlr.press/v270/tian25a.html},
  abstract  = {Large-scale visuomotor policy learning is a promising approach toward developing generalizable manipulation systems. Yet, policies that can be deployed on diverse embodiments, environments, and observational modalities remain elusive. In this work, we investigate how knowledge from large-scale visual data of the world may be used to address one axis of variation for generalizable manipulation: observational viewpoint. Specifically, we study single-image novel view synthesis models, which learn 3D-aware scene-level priors by rendering images of the same scene from alternate camera viewpoints given a single input image. For practical application to diverse robotic data, these models must operate *zero-shot*, performing view synthesis on unseen tasks and environments. We empirically analyze view synthesis models within a simple data-augmentation scheme that we call View Synthesis Augmentation (VISTA) to understand their capabilities for learning viewpoint-invariant policies from single-viewpoint demonstration data. Upon evaluating the robustness of policies trained with our method to out-of-distribution camera viewpoints, we find that they outperform baselines in both simulated and real-world manipulation tasks.}
}
Endnote
%0 Conference Paper
%T View-Invariant Policy Learning via Zero-Shot Novel View Synthesis
%A Stephen Tian
%A Blake Wulfe
%A Kyle Sargent
%A Katherine Liu
%A Sergey Zakharov
%A Vitor Campagnolo Guizilini
%A Jiajun Wu
%B Proceedings of The 8th Conference on Robot Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Pulkit Agrawal
%E Oliver Kroemer
%E Wolfram Burgard
%F pmlr-v270-tian25a
%I PMLR
%P 1173--1193
%U https://proceedings.mlr.press/v270/tian25a.html
%V 270
%X Large-scale visuomotor policy learning is a promising approach toward developing generalizable manipulation systems. Yet, policies that can be deployed on diverse embodiments, environments, and observational modalities remain elusive. In this work, we investigate how knowledge from large-scale visual data of the world may be used to address one axis of variation for generalizable manipulation: observational viewpoint. Specifically, we study single-image novel view synthesis models, which learn 3D-aware scene-level priors by rendering images of the same scene from alternate camera viewpoints given a single input image. For practical application to diverse robotic data, these models must operate *zero-shot*, performing view synthesis on unseen tasks and environments. We empirically analyze view synthesis models within a simple data-augmentation scheme that we call View Synthesis Augmentation (VISTA) to understand their capabilities for learning viewpoint-invariant policies from single-viewpoint demonstration data. Upon evaluating the robustness of policies trained with our method to out-of-distribution camera viewpoints, we find that they outperform baselines in both simulated and real-world manipulation tasks.
APA
Tian, S., Wulfe, B., Sargent, K., Liu, K., Zakharov, S., Guizilini, V.C. & Wu, J. (2025). View-Invariant Policy Learning via Zero-Shot Novel View Synthesis. Proceedings of The 8th Conference on Robot Learning, in Proceedings of Machine Learning Research 270:1173-1193. Available from https://proceedings.mlr.press/v270/tian25a.html.