MENTOR: Mixture-of-Experts Network with Task-Oriented Perturbation for Visual Reinforcement Learning

Suning Huang, Zheyu Aqa Zhang, Tianhai Liang, Yihan Xu, Zhehao Kou, Chenhao Lu, Guowei Xu, Zhengrong Xue, Huazhe Xu
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:26143-26161, 2025.

Abstract

Visual deep reinforcement learning (RL) enables robots to acquire skills from visual input for unstructured tasks. However, current algorithms suffer from low sample efficiency, limiting their practical applicability. In this work, we present MENTOR, a method that improves both the architecture and optimization of RL agents. Specifically, MENTOR replaces the standard multi-layer perceptron (MLP) with a mixture-of-experts (MoE) backbone and introduces a task-oriented perturbation mechanism. MENTOR outperforms state-of-the-art methods across three simulation benchmarks and achieves an average of 83% success rate on three challenging real-world robotic manipulation tasks, significantly surpassing the 32% success rate of the strongest existing model-free visual RL algorithm. These results underscore the importance of sample efficiency in advancing visual RL for real-world robotics. Experimental videos are available at https://suninghuang19.github.io/mentor_page/.

Cite this Paper
BibTeX
@InProceedings{pmlr-v267-huang25av,
  title     = {{MENTOR}: Mixture-of-Experts Network with Task-Oriented Perturbation for Visual Reinforcement Learning},
  author    = {Huang, Suning and Zhang, Zheyu Aqa and Liang, Tianhai and Xu, Yihan and Kou, Zhehao and Lu, Chenhao and Xu, Guowei and Xue, Zhengrong and Xu, Huazhe},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {26143--26161},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/huang25av/huang25av.pdf},
  url       = {https://proceedings.mlr.press/v267/huang25av.html},
  abstract  = {Visual deep reinforcement learning (RL) enables robots to acquire skills from visual input for unstructured tasks. However, current algorithms suffer from low sample efficiency, limiting their practical applicability. In this work, we present MENTOR, a method that improves both the architecture and optimization of RL agents. Specifically, MENTOR replaces the standard multi-layer perceptron (MLP) with a mixture-of-experts (MoE) backbone and introduces a task-oriented perturbation mechanism. MENTOR outperforms state-of-the-art methods across three simulation benchmarks and achieves an average of 83% success rate on three challenging real-world robotic manipulation tasks, significantly surpassing the 32% success rate of the strongest existing model-free visual RL algorithm. These results underscore the importance of sample efficiency in advancing visual RL for real-world robotics. Experimental videos are available at https://suninghuang19.github.io/mentor_page/.}
}
Endnote
%0 Conference Paper
%T MENTOR: Mixture-of-Experts Network with Task-Oriented Perturbation for Visual Reinforcement Learning
%A Suning Huang
%A Zheyu Aqa Zhang
%A Tianhai Liang
%A Yihan Xu
%A Zhehao Kou
%A Chenhao Lu
%A Guowei Xu
%A Zhengrong Xue
%A Huazhe Xu
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-huang25av
%I PMLR
%P 26143--26161
%U https://proceedings.mlr.press/v267/huang25av.html
%V 267
%X Visual deep reinforcement learning (RL) enables robots to acquire skills from visual input for unstructured tasks. However, current algorithms suffer from low sample efficiency, limiting their practical applicability. In this work, we present MENTOR, a method that improves both the architecture and optimization of RL agents. Specifically, MENTOR replaces the standard multi-layer perceptron (MLP) with a mixture-of-experts (MoE) backbone and introduces a task-oriented perturbation mechanism. MENTOR outperforms state-of-the-art methods across three simulation benchmarks and achieves an average of 83% success rate on three challenging real-world robotic manipulation tasks, significantly surpassing the 32% success rate of the strongest existing model-free visual RL algorithm. These results underscore the importance of sample efficiency in advancing visual RL for real-world robotics. Experimental videos are available at https://suninghuang19.github.io/mentor_page/.
APA
Huang, S., Zhang, Z.A., Liang, T., Xu, Y., Kou, Z., Lu, C., Xu, G., Xue, Z. & Xu, H. (2025). MENTOR: Mixture-of-Experts Network with Task-Oriented Perturbation for Visual Reinforcement Learning. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:26143-26161. Available from https://proceedings.mlr.press/v267/huang25av.html.