PIGDreamer: Privileged Information Guided World Models for Safe Partially Observable Reinforcement Learning

Dongchi Huang, Jiaqi Wang, Yang Li, Chunhe Xia, Tianle Zhang, Kaige Zhang
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:25859-25875, 2025.

Abstract

Partial observability presents a significant challenge for Safe Reinforcement Learning (Safe RL), as it impedes the identification of potential risks and rewards. Leveraging specific types of privileged information during training to mitigate the effects of partial observability has yielded notable empirical successes. In this paper, we propose Asymmetric Constrained Partially Observable Markov Decision Processes (ACPOMDPs) to theoretically examine the advantages of incorporating privileged information in Safe RL. Building upon ACPOMDPs, we propose the Privileged Information Guided Dreamer (PIGDreamer), a model-based RL approach that leverages privileged information to enhance the agent’s safety and performance through privileged representation alignment and an asymmetric actor-critic structure. Our empirical results demonstrate that PIGDreamer significantly outperforms existing Safe RL methods. Furthermore, compared to alternative privileged RL methods, our approach exhibits enhanced performance, robustness, and efficiency. Codes are available at: https://github.com/hggforget/PIGDreamer.

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-huang25ai, title = {{PIGD}reamer: Privileged Information Guided World Models for Safe Partially Observable Reinforcement Learning}, author = {Huang, Dongchi and Wang, Jiaqi and Li, Yang and Xia, Chunhe and Zhang, Tianle and Zhang, Kaige}, booktitle = {Proceedings of the 42nd International Conference on Machine Learning}, pages = {25859--25875}, year = {2025}, editor = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry}, volume = {267}, series = {Proceedings of Machine Learning Research}, month = {13--19 Jul}, publisher = {PMLR}, pdf = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/huang25ai/huang25ai.pdf}, url = {https://proceedings.mlr.press/v267/huang25ai.html}, abstract = {Partial observability presents a significant challenge for Safe Reinforcement Learning (Safe RL), as it impedes the identification of potential risks and rewards. Leveraging specific types of privileged information during training to mitigate the effects of partial observability has yielded notable empirical successes. In this paper, we propose Asymmetric Constrained Partially Observable Markov Decision Processes (ACPOMDPs) to theoretically examine the advantages of incorporating privileged information in Safe RL. Building upon ACPOMDPs, we propose the Privileged Information Guided Dreamer (PIGDreamer), a model-based RL approach that leverages privileged information to enhance the agent’s safety and performance through privileged representation alignment and an asymmetric actor-critic structure. Our empirical results demonstrate that PIGDreamer significantly outperforms existing Safe RL methods. Furthermore, compared to alternative privileged RL methods, our approach exhibits enhanced performance, robustness, and efficiency. Codes are available at: https://github.com/hggforget/PIGDreamer.} }
Endnote
%0 Conference Paper %T PIGDreamer: Privileged Information Guided World Models for Safe Partially Observable Reinforcement Learning %A Dongchi Huang %A Jiaqi Wang %A Yang Li %A Chunhe Xia %A Tianle Zhang %A Kaige Zhang %B Proceedings of the 42nd International Conference on Machine Learning %C Proceedings of Machine Learning Research %D 2025 %E Aarti Singh %E Maryam Fazel %E Daniel Hsu %E Simon Lacoste-Julien %E Felix Berkenkamp %E Tegan Maharaj %E Kiri Wagstaff %E Jerry Zhu %F pmlr-v267-huang25ai %I PMLR %P 25859--25875 %U https://proceedings.mlr.press/v267/huang25ai.html %V 267 %X Partial observability presents a significant challenge for Safe Reinforcement Learning (Safe RL), as it impedes the identification of potential risks and rewards. Leveraging specific types of privileged information during training to mitigate the effects of partial observability has yielded notable empirical successes. In this paper, we propose Asymmetric Constrained Partially Observable Markov Decision Processes (ACPOMDPs) to theoretically examine the advantages of incorporating privileged information in Safe RL. Building upon ACPOMDPs, we propose the Privileged Information Guided Dreamer (PIGDreamer), a model-based RL approach that leverages privileged information to enhance the agent’s safety and performance through privileged representation alignment and an asymmetric actor-critic structure. Our empirical results demonstrate that PIGDreamer significantly outperforms existing Safe RL methods. Furthermore, compared to alternative privileged RL methods, our approach exhibits enhanced performance, robustness, and efficiency. Codes are available at: https://github.com/hggforget/PIGDreamer.
APA
Huang, D., Wang, J., Li, Y., Xia, C., Zhang, T. & Zhang, K.. (2025). PIGDreamer: Privileged Information Guided World Models for Safe Partially Observable Reinforcement Learning. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:25859-25875 Available from https://proceedings.mlr.press/v267/huang25ai.html.

Related Material