Energy-based Predictive Representations for Partially Observed Reinforcement Learning

Tianjun Zhang, Tongzheng Ren, Chenjun Xiao, Wenli Xiao, Joseph E. Gonzalez, Dale Schuurmans, Bo Dai
Proceedings of the Thirty-Ninth Conference on Uncertainty in Artificial Intelligence, PMLR 216:2477-2487, 2023.

Abstract

In real-world applications, handling partial observability is a common requirement for reinforcement learning algorithms, which is not captured by a Markov decision process (MDP). Although partially observable Markov decision processes (POMDPs) have been specifically designed to address this requirement, they present significant computational and statistical challenges in learning and planning. In this work, we introduce the Energy-based Predictive Representation (EPR) to provide a unified approach for designing practical reinforcement learning algorithms in both the MDP and POMDP settings. This framework enables coherent handling of learning, exploration, and planning tasks. The proposed framework leverages a powerful neural energy-based model to extract an adequate representation, allowing for efficient approximation of Q-functions. This representation facilitates the efficient computation of confidence, enabling the implementation of optimism or pessimism in planning when faced with uncertainty. Consequently, it effectively manages the trade-off between exploration and exploitation. Experimental investigations demonstrate that the proposed algorithm achieves state-of-the-art performance in both MDP and POMDP settings.
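The abstract only gestures at the computational recipe: a learned representation phi(s, a) (extracted, in EPR, by a neural energy-based model) supports a Q-function that is approximately linear in the features, and the same features give an elliptical confidence width that can be added (optimism, for exploration) or subtracted (pessimism, for conservative planning). The sketch below illustrates that generic representation-plus-confidence pattern only; it is not the paper's implementation, and the feature map, bonus form, and all names (phi, beta, LinearQWithConfidence) are assumptions made purely for illustration.

import numpy as np

# Hypothetical sketch: a linear Q-function on top of a learned representation phi(s, a),
# with an elliptical confidence bonus used for optimistic (exploration) or
# pessimistic (conservative) value estimates. Illustrates the general
# representation-based recipe the abstract refers to, not EPR's actual algorithm.

def phi(state, action, dim=8):
    """Stand-in feature map; in EPR this role would be played by a learned
    energy-based predictive representation."""
    rng = np.random.default_rng(abs(hash((tuple(state), action))) % (2**32))
    return rng.normal(size=dim)

class LinearQWithConfidence:
    def __init__(self, dim=8, reg=1.0, beta=0.1):
        self.dim = dim
        self.Lambda = reg * np.eye(dim)   # regularized feature covariance
        self.b = np.zeros(dim)            # accumulated feature-weighted targets
        self.beta = beta                  # confidence-width coefficient

    def update(self, state, action, target):
        f = phi(state, action, self.dim)
        self.Lambda += np.outer(f, f)
        self.b += f * target

    def q_value(self, state, action, mode="optimistic"):
        f = phi(state, action, self.dim)
        Lambda_inv = np.linalg.inv(self.Lambda)
        w = Lambda_inv @ self.b                          # least-squares weights
        bonus = self.beta * np.sqrt(f @ Lambda_inv @ f)  # elliptical confidence width
        mean = w @ f
        return mean + bonus if mode == "optimistic" else mean - bonus

# Usage: fit on a few transitions, then query optimistic / pessimistic values.
q = LinearQWithConfidence()
q.update(state=[0.0, 1.0], action=0, target=1.0)
print(q.q_value([0.0, 1.0], 0, mode="optimistic"))
print(q.q_value([0.0, 1.0], 0, mode="pessimistic"))

The point of this kind of recipe is that once the representation is fixed, both the value estimate and its confidence width reduce to cheap linear algebra over the same feature covariance, which is what makes optimism or pessimism practical to implement.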

Cite this Paper


BibTeX
@InProceedings{pmlr-v216-zhang23b,
  title     = {Energy-based Predictive Representations for Partially Observed Reinforcement Learning},
  author    = {Zhang, Tianjun and Ren, Tongzheng and Xiao, Chenjun and Xiao, Wenli and Gonzalez, Joseph E. and Schuurmans, Dale and Dai, Bo},
  booktitle = {Proceedings of the Thirty-Ninth Conference on Uncertainty in Artificial Intelligence},
  pages     = {2477--2487},
  year      = {2023},
  editor    = {Evans, Robin J. and Shpitser, Ilya},
  volume    = {216},
  series    = {Proceedings of Machine Learning Research},
  month     = {31 Jul--04 Aug},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v216/zhang23b/zhang23b.pdf},
  url       = {https://proceedings.mlr.press/v216/zhang23b.html},
  abstract  = {In real-world applications, handling partial observability is a common requirement for reinforcement learning algorithms, which is not captured by a Markov decision process (MDP). Although partially observable Markov decision processes (POMDPs) have been specifically designed to address this requirement, they present significant computational and statistical challenges in learning and planning. In this work, we introduce the Energy-based Predictive Representation (EPR) to provide a unified approach for designing practical reinforcement learning algorithms in both the MDP and POMDP settings. This framework enables coherent handling of learning, exploration, and planning tasks. The proposed framework leverages a powerful neural energy-based model to extract an adequate representation, allowing for efficient approximation of Q-functions. This representation facilitates the efficient computation of confidence, enabling the implementation of optimism or pessimism in planning when faced with uncertainty. Consequently, it effectively manages the trade-off between exploration and exploitation. Experimental investigations demonstrate that the proposed algorithm achieves state-of-the-art performance in both MDP and POMDP settings.}
}
APA
Zhang, T., Ren, T., Xiao, C., Xiao, W., Gonzalez, J.E., Schuurmans, D. & Dai, B. (2023). Energy-based Predictive Representations for Partially Observed Reinforcement Learning. Proceedings of the Thirty-Ninth Conference on Uncertainty in Artificial Intelligence, in Proceedings of Machine Learning Research 216:2477-2487. Available from https://proceedings.mlr.press/v216/zhang23b.html.
