RobustZero: Enhancing MuZero Reinforcement Learning Robustness to State Perturbations

Yushuai Li, Hengyu Liu, Torben Bach Pedersen, Yuqiang He, Kim Guldstrand Larsen, Lu Chen, Christian S. Jensen, Jiachen Xu, Tianyi Li
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:35366-35389, 2025.

Abstract

The MuZero reinforcement learning method has achieved superhuman performance at games, and advances that enable MuZero to contend with complex actions now make MuZero-class methods usable in real-world decision-making applications. However, some real-world applications are susceptible to state perturbations caused by malicious attacks and noisy sensors. To enhance the robustness of MuZero-class methods to state perturbations, we propose RobustZero, the first MuZero-class method that is $\underline{robust}$ to worst-case and random-case state perturbations, with $\underline{zero}$ prior knowledge of the environment's dynamics. We present a training framework for RobustZero that features a self-supervised representation network targeting the generation of a consistent initial hidden state, which is key to obtaining consistent policies before and after state perturbations, and a unique loss function that facilitates robustness. We also present an adaptive adjustment mechanism for model updates, enhancing robustness to both worst-case and random-case state perturbations. Experiments on two classical control environments, three energy system environments, three transportation environments, and four MuJoCo environments demonstrate that RobustZero outperforms state-of-the-art methods at defending against state perturbations.
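The paper's exact loss is not reproduced on this page, but the core idea described above — a self-supervised representation network whose initial hidden state should agree for clean and perturbed observations — can be sketched as a consistency objective. The snippet below is a minimal illustration only, assuming a SimSiam-style negative cosine similarity with a stop-gradient on the perturbed branch; the names `RepresentationNet` and `consistency_loss` are hypothetical and not taken from the paper.

```python
import torch
import torch.nn.functional as F

class RepresentationNet(torch.nn.Module):
    """Toy representation network h: observation -> initial hidden state."""
    def __init__(self, obs_dim: int, hidden_dim: int):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(obs_dim, 64),
            torch.nn.ReLU(),
            torch.nn.Linear(64, hidden_dim),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)

def consistency_loss(h: RepresentationNet,
                     obs: torch.Tensor,
                     perturbed_obs: torch.Tensor) -> torch.Tensor:
    """Penalize divergence between hidden states of clean and perturbed
    observations, so the downstream policy sees a consistent input."""
    z_clean = h(obs)
    z_pert = h(perturbed_obs)
    # Negative cosine similarity; stop-gradient on the perturbed branch,
    # as in SimSiam-style self-supervised objectives. Loss lies in [-1, 1].
    return -F.cosine_similarity(z_clean, z_pert.detach(), dim=-1).mean()
```

In practice the perturbed observation could come from random noise or from an adversarial (worst-case) attack; minimizing this term pushes the representation network toward identical hidden states, and hence identical policies, before and after the perturbation.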

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-li25bf,
  title     = {{R}obust{Z}ero: Enhancing {M}u{Z}ero Reinforcement Learning Robustness to State Perturbations},
  author    = {Li, Yushuai and Liu, Hengyu and Pedersen, Torben Bach and He, Yuqiang and Larsen, Kim Guldstrand and Chen, Lu and Jensen, Christian S. and Xu, Jiachen and Li, Tianyi},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {35366--35389},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/li25bf/li25bf.pdf},
  url       = {https://proceedings.mlr.press/v267/li25bf.html},
  abstract  = {The MuZero reinforcement learning method has achieved superhuman performance at games, and advances that enable MuZero to contend with complex actions now enable use of MuZero-class methods in real-world decision-making applications. However, some real-world applications are susceptible to state perturbations caused by malicious attacks and noisy sensors. To enhance the robustness of MuZero-class methods to state perturbations, we propose RobustZero, the first MuZero-class method that is $\underline{robust}$ to worst-case and random-case state perturbations, with $\underline{zero}$ prior knowledge of the environment’s dynamics. We present a training framework for RobustZero that features a self-supervised representation network, targeting the generation of a consistent initial hidden state, which is key to obtain consistent policies before and after state perturbations, and it features a unique loss function that facilitates robustness. We present an adaptive adjustment mechanism to enable model update, enhancing robustness to both worst-case and random-case state perturbations. Experiments on two classical control environments, three energy system environments, three transportation environments, and four Mujoco environments demonstrate that RobustZero can outperform state-of-the-art methods at defending against state perturbations.}
}
Endnote
%0 Conference Paper
%T RobustZero: Enhancing MuZero Reinforcement Learning Robustness to State Perturbations
%A Yushuai Li
%A Hengyu Liu
%A Torben Bach Pedersen
%A Yuqiang He
%A Kim Guldstrand Larsen
%A Lu Chen
%A Christian S. Jensen
%A Jiachen Xu
%A Tianyi Li
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-li25bf
%I PMLR
%P 35366--35389
%U https://proceedings.mlr.press/v267/li25bf.html
%V 267
%X The MuZero reinforcement learning method has achieved superhuman performance at games, and advances that enable MuZero to contend with complex actions now enable use of MuZero-class methods in real-world decision-making applications. However, some real-world applications are susceptible to state perturbations caused by malicious attacks and noisy sensors. To enhance the robustness of MuZero-class methods to state perturbations, we propose RobustZero, the first MuZero-class method that is $\underline{robust}$ to worst-case and random-case state perturbations, with $\underline{zero}$ prior knowledge of the environment’s dynamics. We present a training framework for RobustZero that features a self-supervised representation network, targeting the generation of a consistent initial hidden state, which is key to obtain consistent policies before and after state perturbations, and it features a unique loss function that facilitates robustness. We present an adaptive adjustment mechanism to enable model update, enhancing robustness to both worst-case and random-case state perturbations. Experiments on two classical control environments, three energy system environments, three transportation environments, and four Mujoco environments demonstrate that RobustZero can outperform state-of-the-art methods at defending against state perturbations.
APA
Li, Y., Liu, H., Pedersen, T.B., He, Y., Larsen, K.G., Chen, L., Jensen, C.S., Xu, J. & Li, T. (2025). RobustZero: Enhancing MuZero Reinforcement Learning Robustness to State Perturbations. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:35366-35389. Available from https://proceedings.mlr.press/v267/li25bf.html.