Tackling Non-Stationarity in Reinforcement Learning via Causal-Origin Representation

Wanpeng Zhang, Yilin Li, Boyu Yang, Zongqing Lu
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:59264-59288, 2024.

Abstract

In real-world scenarios, the application of reinforcement learning is significantly challenged by complex non-stationarity. Most existing methods attempt to model changes in the environment explicitly, often requiring impractical prior knowledge of environments. In this paper, we propose a new perspective, positing that non-stationarity can propagate and accumulate through complex causal relationships during state transitions, thereby compounding its complexity and affecting policy learning. We believe this challenge can be addressed more effectively by implicitly tracing the causal origin of non-stationarity. To this end, we introduce the Causal-Origin REPresentation (COREP) algorithm. COREP primarily employs a guided updating mechanism to learn a stable graph representation of the state, termed the causal-origin representation. By leveraging this representation, the learned policy exhibits impressive resilience to non-stationarity. We supplement our approach with a theoretical analysis grounded in a causal interpretation of non-stationary reinforcement learning, supporting the validity of the causal-origin representation. Experimental results further demonstrate the superior performance of COREP over existing methods in tackling non-stationarity problems. The code is available at https://github.com/PKU-RL/COREP.
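
The abstract describes COREP only at a high level: a graph representation of the state is learned and kept stable through a guided updating mechanism, and the policy conditions on that representation. Below is a minimal, hypothetical sketch of what such a scheme could look like in PyTorch; the soft-adjacency encoder, the dual fast/core networks, the EMA-style guidance rule, and all names and dimensions are illustrative assumptions rather than the authors' implementation (the actual code is available in the linked repository).

```python
# Hypothetical, simplified sketch (not the authors' implementation).
# It illustrates one way a "guided updating" scheme could maintain a
# stable (core) graph encoder alongside a faster-adapting one, and feed
# the resulting state representation to a policy. All module names,
# shapes, and the EMA-style guidance rule are assumptions for illustration.
import torch
import torch.nn as nn


class GraphStateEncoder(nn.Module):
    """Encodes a state vector via a learned soft adjacency over state dimensions."""

    def __init__(self, state_dim: int, hidden_dim: int):
        super().__init__()
        # Learnable logits for a soft adjacency among state dimensions.
        self.adj_logits = nn.Parameter(torch.zeros(state_dim, state_dim))
        self.node_proj = nn.Linear(1, hidden_dim)
        self.readout = nn.Linear(state_dim * hidden_dim, hidden_dim)

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        # state: (batch, state_dim) -> node features: (batch, state_dim, hidden_dim)
        nodes = self.node_proj(state.unsqueeze(-1))
        adj = torch.sigmoid(self.adj_logits)             # soft graph structure
        mixed = torch.einsum("ij,bjh->bih", adj, nodes)  # one message-passing step
        return self.readout(mixed.flatten(1))


class GuidedDualEncoder(nn.Module):
    """A fast encoder adapts to the current dynamics; a slow 'core' encoder is
    guided toward it only gradually, giving a more stable representation."""

    def __init__(self, state_dim: int, hidden_dim: int, tau: float = 0.005):
        super().__init__()
        self.fast = GraphStateEncoder(state_dim, hidden_dim)
        self.core = GraphStateEncoder(state_dim, hidden_dim)
        self.tau = tau

    @torch.no_grad()
    def guided_update(self):
        # EMA-style guidance: slowly pull core parameters toward the fast ones.
        for p_core, p_fast in zip(self.core.parameters(), self.fast.parameters()):
            p_core.mul_(1.0 - self.tau).add_(self.tau * p_fast)

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        # Concatenate the stable and adaptive views of the state for the policy.
        return torch.cat([self.core(state), self.fast(state)], dim=-1)


if __name__ == "__main__":
    enc = GuidedDualEncoder(state_dim=8, hidden_dim=32)
    rep = enc(torch.randn(4, 8))   # (4, 64) representation fed to the policy
    enc.guided_update()
    print(rep.shape)
```

In this sketch, the fast encoder would be trained with the usual policy and representation losses, while guided_update() is called periodically so the core encoder drifts only slowly, providing the more stable view on which a non-stationarity-robust policy could rely.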

Cite this Paper


BibTeX
@InProceedings{pmlr-v235-zhang24ah,
  title     = {Tackling Non-Stationarity in Reinforcement Learning via Causal-Origin Representation},
  author    = {Zhang, Wanpeng and Li, Yilin and Yang, Boyu and Lu, Zongqing},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages     = {59264--59288},
  year      = {2024},
  editor    = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume    = {235},
  series    = {Proceedings of Machine Learning Research},
  month     = {21--27 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/zhang24ah/zhang24ah.pdf},
  url       = {https://proceedings.mlr.press/v235/zhang24ah.html}
}
Endnote
%0 Conference Paper
%T Tackling Non-Stationarity in Reinforcement Learning via Causal-Origin Representation
%A Wanpeng Zhang
%A Yilin Li
%A Boyu Yang
%A Zongqing Lu
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-zhang24ah
%I PMLR
%P 59264--59288
%U https://proceedings.mlr.press/v235/zhang24ah.html
%V 235
APA
Zhang, W., Li, Y., Yang, B., & Lu, Z. (2024). Tackling Non-Stationarity in Reinforcement Learning via Causal-Origin Representation. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:59264-59288. Available from https://proceedings.mlr.press/v235/zhang24ah.html.