HyperTASR: Hypernetwork-Driven Task-Aware Scene Representations for Robust Manipulation
Proceedings of The 9th Conference on Robot Learning, PMLR 305:4524-4544, 2025.
Abstract
Effective policy learning for robotic manipulation requires scene representations that selectively capture task-relevant environmental features. Current approaches typically employ task-agnostic representation extraction, failing to emulate the dynamic perceptual adaptation observed in human cognition. We present HyperTASR, a hypernetwork-driven framework that modulates scene representations based on both task objectives and the execution phase. Our architecture dynamically generates representation transformation parameters conditioned on task specifications and progression state, enabling representations to evolve contextually throughout task execution. This approach maintains architectural compatibility with existing policy learning frameworks while fundamentally reconfiguring how visual features are processed. Unlike methods that simply concatenate or fuse task embeddings with task-agnostic representations, HyperTASR establishes computational separation between task-contextual and state-dependent processing paths, enhancing learning efficiency and representational quality. Comprehensive evaluations in both simulation and real-world environments demonstrate substantial performance improvements across different representation paradigms. Most notably, HyperTASR elevates success rates by over 27% when applied to GNFactor and achieves unprecedented single-view performance exceeding 80% success with 3D Diffuser Actor. Through ablation studies and attention visualization, we confirm that our approach selectively prioritizes task-relevant scene information, closely mirroring human adaptive perception during manipulation tasks.
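To make the core idea concrete, the sketch below shows one way a hypernetwork could generate feature-transformation parameters conditioned on a task embedding and an execution-phase embedding, keeping the two context paths computationally separate before emitting the modulation parameters. This is a minimal illustrative sketch, not the authors' released implementation; all names (TaskHyperNet, TaskAwareEncoder, task_emb, phase_emb, feat_dim, and the FiLM-style scale/shift form) are assumptions for illustration.

```python
import torch
import torch.nn as nn


class TaskHyperNet(nn.Module):
    """Hypothetical hypernetwork: emits per-channel scale/shift from task and phase context."""

    def __init__(self, task_dim: int, phase_dim: int, feat_dim: int, hidden: int = 256):
        super().__init__()
        # Separate processing paths for task context and execution phase,
        # merged only when producing the transformation parameters.
        self.task_path = nn.Sequential(nn.Linear(task_dim, hidden), nn.ReLU())
        self.phase_path = nn.Sequential(nn.Linear(phase_dim, hidden), nn.ReLU())
        self.to_params = nn.Linear(2 * hidden, 2 * feat_dim)  # scale and shift

    def forward(self, task_emb: torch.Tensor, phase_emb: torch.Tensor):
        ctx = torch.cat([self.task_path(task_emb), self.phase_path(phase_emb)], dim=-1)
        scale, shift = self.to_params(ctx).chunk(2, dim=-1)
        return scale, shift


class TaskAwareEncoder(nn.Module):
    """Wraps a task-agnostic scene encoder and modulates its output features."""

    def __init__(self, backbone: nn.Module, hypernet: TaskHyperNet):
        super().__init__()
        self.backbone = backbone  # any existing visual encoder producing (B, feat_dim)
        self.hypernet = hypernet

    def forward(self, obs, task_emb, phase_emb):
        feats = self.backbone(obs)                       # task-agnostic features
        scale, shift = self.hypernet(task_emb, phase_emb)
        return (1 + scale) * feats + shift               # task/phase-conditioned features


# Example usage with random tensors standing in for real observations and embeddings.
backbone = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 128))
encoder = TaskAwareEncoder(backbone, TaskHyperNet(task_dim=32, phase_dim=8, feat_dim=128))
obs = torch.randn(4, 3, 64, 64)
feats = encoder(obs, torch.randn(4, 32), torch.randn(4, 8))  # -> shape (4, 128)
```

Because the modulation is applied to features that an existing encoder already produces, a wrapper of this kind can, in principle, sit in front of an unchanged downstream policy, which is consistent with the abstract's claim of architectural compatibility with existing policy learning frameworks.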