[edit]
RobustZero: Enhancing MuZero Reinforcement Learning Robustness to State Perturbations
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:35366-35389, 2025.
Abstract
The MuZero reinforcement learning method has achieved superhuman performance at games, and advances that enable MuZero to contend with complex actions now enable use of MuZero-class methods in real-world decision-making applications. However, some real-world applications are susceptible to state perturbations caused by malicious attacks and noisy sensors. To enhance the robustness of MuZero-class methods to state perturbations, we propose RobustZero, the first MuZero-class method that is $\underline{robust}$ to worst-case and random-case state perturbations, with $\underline{zero}$ prior knowledge of the environment’s dynamics. We present a training framework for RobustZero that features a self-supervised representation network, targeting the generation of a consistent initial hidden state, which is key to obtain consistent policies before and after state perturbations, and it features a unique loss function that facilitates robustness. We present an adaptive adjustment mechanism to enable model update, enhancing robustness to both worst-case and random-case state perturbations. Experiments on two classical control environments, three energy system environments, three transportation environments, and four Mujoco environments demonstrate that RobustZero can outperform state-of-the-art methods at defending against state perturbations.