Merging and Disentangling Views in Visual Reinforcement Learning for Robotic Manipulation

Abdulaziz Almuzairee, Rohan Prashant Patil, Dwait Bhatt, Henrik I Christensen
Proceedings of The 9th Conference on Robot Learning, PMLR 305:981-1003, 2025.

Abstract

Vision is well-known for its use in manipulation, especially using visual servoing. To make it robust, multiple cameras are needed to expand the field of view. That is computationally challenging. Merging multiple views and using Q-learning allows the design of more effective representations and optimization of sample efficiency. Such a solution might be expensive to deploy. To mitigate this, we introduce a merge and disentanglement (MAD) algorithm that efficiently merges views to increase sample efficiency while augmenting with single-view features to allow lightweight deployment and ensure robust policies. We demonstrate the efficiency and robustness of our approach using Meta-World and ManiSkill3.
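The abstract describes merging features from multiple camera views for sample-efficient Q-learning while also training on single-view features so a single camera suffices at deployment. The following is a minimal illustrative sketch of that high-level idea only, not the authors' MAD implementation; the encoder, merge operator, and shapes are all hypothetical stand-ins.

```python
# Hedged sketch of the idea in the abstract, NOT the authors' algorithm:
# per-view features are merged for learning, and single-view features are
# mixed into the batch so downstream heads also work from one camera.
import numpy as np

rng = np.random.default_rng(0)

def encode(view, W):
    """Toy per-view encoder: a linear map standing in for a CNN."""
    return np.tanh(view @ W)

def merge(features):
    """Merge per-view features; a mean keeps the feature dimension fixed,
    so the same downstream heads accept merged or single-view input."""
    return np.mean(features, axis=0)

# Hypothetical setup: 3 camera views, 8-dim observations, 4-dim features.
W = rng.normal(size=(8, 4))
views = [rng.normal(size=8) for _ in range(3)]

per_view = [encode(v, W) for v in views]
merged = merge(per_view)

# A training batch mixing merged and single-view features, so the policy
# stays usable when only one camera is available at deployment.
batch = [merged] + per_view

assert merged.shape == per_view[0].shape  # shared heads across both inputs
```

Because the merged feature has the same dimensionality as each single-view feature, one set of actor/critic heads can consume either input, which is what makes the lightweight single-camera deployment plausible in this sketch.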

Cite this Paper


BibTeX
@InProceedings{pmlr-v305-almuzairee25a,
  title     = {Merging and Disentangling Views in Visual Reinforcement Learning for Robotic Manipulation},
  author    = {Almuzairee, Abdulaziz and Patil, Rohan Prashant and Bhatt, Dwait and Christensen, Henrik I},
  booktitle = {Proceedings of The 9th Conference on Robot Learning},
  pages     = {981--1003},
  year      = {2025},
  editor    = {Lim, Joseph and Song, Shuran and Park, Hae-Won},
  volume    = {305},
  series    = {Proceedings of Machine Learning Research},
  month     = {27--30 Sep},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v305/main/assets/almuzairee25a/almuzairee25a.pdf},
  url       = {https://proceedings.mlr.press/v305/almuzairee25a.html},
  abstract  = {Vision is well-known for its use in manipulation, especially using visual servoing. To make it robust, multiple cameras are needed to expand the field of view. That is computationally challenging. Merging multiple views and using Q-learning allows the design of more effective representations and optimization of sample efficiency. Such a solution might be expensive to deploy. To mitigate this, we introduce a merge and disentanglement (MAD) algorithm that efficiently merges views to increase sample efficiency while augmenting with single-view features to allow lightweight deployment and ensure robust policies. We demonstrate the efficiency and robustness of our approach using Meta-World and ManiSkill3.}
}
Endnote
%0 Conference Paper
%T Merging and Disentangling Views in Visual Reinforcement Learning for Robotic Manipulation
%A Abdulaziz Almuzairee
%A Rohan Prashant Patil
%A Dwait Bhatt
%A Henrik I Christensen
%B Proceedings of The 9th Conference on Robot Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Joseph Lim
%E Shuran Song
%E Hae-Won Park
%F pmlr-v305-almuzairee25a
%I PMLR
%P 981--1003
%U https://proceedings.mlr.press/v305/almuzairee25a.html
%V 305
%X Vision is well-known for its use in manipulation, especially using visual servoing. To make it robust, multiple cameras are needed to expand the field of view. That is computationally challenging. Merging multiple views and using Q-learning allows the design of more effective representations and optimization of sample efficiency. Such a solution might be expensive to deploy. To mitigate this, we introduce a merge and disentanglement (MAD) algorithm that efficiently merges views to increase sample efficiency while augmenting with single-view features to allow lightweight deployment and ensure robust policies. We demonstrate the efficiency and robustness of our approach using Meta-World and ManiSkill3.
APA
Almuzairee, A., Patil, R.P., Bhatt, D. & Christensen, H.I. (2025). Merging and Disentangling Views in Visual Reinforcement Learning for Robotic Manipulation. Proceedings of The 9th Conference on Robot Learning, in Proceedings of Machine Learning Research 305:981-1003. Available from https://proceedings.mlr.press/v305/almuzairee25a.html.
