Discovering Robotic Interaction Modes with Discrete Representation Learning

Liquan Wang, Ankit Goyal, Haoping Xu, Animesh Garg
Proceedings of The 8th Conference on Robot Learning, PMLR 270:830-863, 2025.

Abstract

Human actions manipulating articulated objects, such as opening and closing a drawer, can be categorized into multiple modalities we define as interaction modes. Traditional robot learning approaches lack discrete representations of these modes, which are crucial for empirical sampling and grounding. In this paper, we present ActAIM2, which learns a discrete representation of robot manipulation interaction modes in a purely unsupervised fashion, without the use of expert labels or simulator-based privileged information. Utilizing novel data collection methods involving simulator rollouts, ActAIM2 consists of an interaction mode selector and a low-level action predictor. The selector generates discrete representations of potential interaction modes with self-supervision, while the predictor outputs corresponding action trajectories. Our method is validated through its success rate in manipulating articulated objects and its robustness in sampling meaningful actions from the discrete representation. Extensive experiments demonstrate ActAIM2’s effectiveness in enhancing manipulability and generalizability over baselines and ablation studies. For videos and additional results, see our website: https://actaim2.github.io/.
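To make the two-module structure described above concrete, the following is a minimal sketch, not the authors' implementation: it assumes a learned embedding table over a fixed number of discrete modes for the interaction mode selector, and a simple MLP action predictor conditioned on the sampled mode. All module names, dimensions, and the codebook-style design are illustrative assumptions.

import torch
import torch.nn as nn

class InteractionModeSelector(nn.Module):
    """Hypothetical selector: maps observation features to a discrete interaction mode."""
    def __init__(self, obs_dim=128, num_modes=8, mode_dim=32):
        super().__init__()
        self.logits = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, num_modes))
        self.codebook = nn.Embedding(num_modes, mode_dim)  # one embedding per discrete mode

    def forward(self, obs_feat):
        probs = torch.softmax(self.logits(obs_feat), dim=-1)
        mode_id = torch.multinomial(probs, num_samples=1).squeeze(-1)  # sample a mode index
        return mode_id, self.codebook(mode_id)

class ActionPredictor(nn.Module):
    """Hypothetical low-level predictor: maps (observation, mode embedding) to a trajectory."""
    def __init__(self, obs_dim=128, mode_dim=32, horizon=10, act_dim=7):
        super().__init__()
        self.horizon, self.act_dim = horizon, act_dim
        self.net = nn.Sequential(nn.Linear(obs_dim + mode_dim, 256), nn.ReLU(),
                                 nn.Linear(256, horizon * act_dim))

    def forward(self, obs_feat, mode_emb):
        traj = self.net(torch.cat([obs_feat, mode_emb], dim=-1))
        return traj.view(-1, self.horizon, self.act_dim)  # one action per timestep

# Example usage with random observation features (batch of 4).
obs_feat = torch.randn(4, 128)
selector, predictor = InteractionModeSelector(), ActionPredictor()
mode_id, mode_emb = selector(obs_feat)
trajectory = predictor(obs_feat, mode_emb)
print(mode_id.shape, trajectory.shape)  # torch.Size([4]) torch.Size([4, 10, 7])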

Cite this Paper


BibTeX
@InProceedings{pmlr-v270-wang25d,
  title     = {Discovering Robotic Interaction Modes with Discrete Representation Learning},
  author    = {Wang, Liquan and Goyal, Ankit and Xu, Haoping and Garg, Animesh},
  booktitle = {Proceedings of The 8th Conference on Robot Learning},
  pages     = {830--863},
  year      = {2025},
  editor    = {Agrawal, Pulkit and Kroemer, Oliver and Burgard, Wolfram},
  volume    = {270},
  series    = {Proceedings of Machine Learning Research},
  month     = {06--09 Nov},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v270/main/assets/wang25d/wang25d.pdf},
  url       = {https://proceedings.mlr.press/v270/wang25d.html},
  abstract  = {Human actions manipulating articulated objects, such as opening and closing a drawer, can be categorized into multiple modalities we define as interaction modes. Traditional robot learning approaches lack discrete representations of these modes, which are crucial for empirical sampling and grounding. In this paper, we present ActAIM2, which learns a discrete representation of robot manipulation interaction modes in a purely unsupervised fashion, without the use of expert labels or simulator-based privileged information. Utilizing novel data collection methods involving simulator rollouts, ActAIM2 consists of an interaction mode selector and a low-level action predictor. The selector generates discrete representations of potential interaction modes with self-supervision, while the predictor outputs corresponding action trajectories. Our method is validated through its success rate in manipulating articulated objects and its robustness in sampling meaningful actions from the discrete representation. Extensive experiments demonstrate ActAIM2’s effectiveness in enhancing manipulability and generalizability over baselines and ablation studies. For videos and additional results, see our website: https://actaim2.github.io/.}
}
Endnote
%0 Conference Paper
%T Discovering Robotic Interaction Modes with Discrete Representation Learning
%A Liquan Wang
%A Ankit Goyal
%A Haoping Xu
%A Animesh Garg
%B Proceedings of The 8th Conference on Robot Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Pulkit Agrawal
%E Oliver Kroemer
%E Wolfram Burgard
%F pmlr-v270-wang25d
%I PMLR
%P 830--863
%U https://proceedings.mlr.press/v270/wang25d.html
%V 270
%X Human actions manipulating articulated objects, such as opening and closing a drawer, can be categorized into multiple modalities we define as interaction modes. Traditional robot learning approaches lack discrete representations of these modes, which are crucial for empirical sampling and grounding. In this paper, we present ActAIM2, which learns a discrete representation of robot manipulation interaction modes in a purely unsupervised fashion, without the use of expert labels or simulator-based privileged information. Utilizing novel data collection methods involving simulator rollouts, ActAIM2 consists of an interaction mode selector and a low-level action predictor. The selector generates discrete representations of potential interaction modes with self-supervision, while the predictor outputs corresponding action trajectories. Our method is validated through its success rate in manipulating articulated objects and its robustness in sampling meaningful actions from the discrete representation. Extensive experiments demonstrate ActAIM2’s effectiveness in enhancing manipulability and generalizability over baselines and ablation studies. For videos and additional results, see our website: https://actaim2.github.io/.
APA
Wang, L., Goyal, A., Xu, H. & Garg, A. (2025). Discovering Robotic Interaction Modes with Discrete Representation Learning. Proceedings of The 8th Conference on Robot Learning, in Proceedings of Machine Learning Research 270:830-863. Available from https://proceedings.mlr.press/v270/wang25d.html.
