Learning Belief Representations for Partially Observable Deep RL

Andrew Wang, Andrew C. Li, Toryn Q. Klassen, Rodrigo Toro Icarte, Sheila A. McIlraith
Proceedings of the 40th International Conference on Machine Learning, PMLR 202:35970-35988, 2023.

Abstract

Many important real-world Reinforcement Learning (RL) problems involve partial observability and require policies with memory. Unfortunately, standard deep RL algorithms for partially observable settings typically condition on the full history of interactions and are notoriously difficult to train. We propose a novel deep, partially observable RL algorithm based on modelling belief states — a technique typically used when solving tabular POMDPs, but that has traditionally been difficult to apply to more complex environments. Our approach simplifies policy learning by leveraging state information at training time, which may not be available at deployment time. We do so in two ways: first, we decouple belief state modelling (via unsupervised learning) from policy optimization (via RL); and second, we propose a representation learning approach to capture a compact set of reward-relevant features of the state. Experiments demonstrate the efficacy of our approach on partially observable domains requiring information seeking and long-term memory.
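The abstract outlines a two-phase recipe: first fit a belief model from interaction histories using privileged state information that is only available during training, then run standard RL on top of the resulting belief representation. The PyTorch sketch below is a minimal illustration of that decoupling under assumptions of our own; the module names, dimensions, cross-entropy state-prediction loss, and REINFORCE-style update are illustrative placeholders, not details taken from the paper.

import torch
import torch.nn as nn

OBS_DIM, ACT_DIM, STATE_CLASSES, BELIEF_DIM = 16, 4, 10, 32  # illustrative sizes

class BeliefEncoder(nn.Module):
    """Maps a history of (observation, previous action) pairs to a belief vector."""
    def __init__(self):
        super().__init__()
        self.rnn = nn.GRU(OBS_DIM + ACT_DIM, BELIEF_DIM, batch_first=True)
        # Auxiliary head: predict the hidden state, which is only observable at training time.
        self.state_head = nn.Linear(BELIEF_DIM, STATE_CLASSES)

    def forward(self, obs_seq, act_seq):
        beliefs, _ = self.rnn(torch.cat([obs_seq, act_seq], dim=-1))
        return beliefs, self.state_head(beliefs)

class Policy(nn.Module):
    """Acts on the learned belief representation rather than the raw history."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(BELIEF_DIM, 64), nn.ReLU(),
                                 nn.Linear(64, ACT_DIM))

    def forward(self, belief):
        return torch.distributions.Categorical(logits=self.net(belief))

encoder, policy = BeliefEncoder(), Policy()

# Phase 1: fit the belief model on logged trajectories that include privileged state labels.
obs = torch.randn(8, 20, OBS_DIM)                                  # dummy batch: 8 episodes x 20 steps
prev_acts = nn.functional.one_hot(torch.randint(0, ACT_DIM, (8, 20)), ACT_DIM).float()
states = torch.randint(0, STATE_CLASSES, (8, 20))                  # ground-truth states, training time only
enc_opt = torch.optim.Adam(encoder.parameters(), lr=1e-3)
beliefs, state_logits = encoder(obs, prev_acts)
belief_loss = nn.functional.cross_entropy(state_logits.flatten(0, 1), states.flatten())
enc_opt.zero_grad()
belief_loss.backward()
enc_opt.step()

# Phase 2: train the policy with standard RL on top of the belief features
# (a bare REINFORCE-style step stands in for whatever RL algorithm is used).
pol_opt = torch.optim.Adam(policy.parameters(), lr=1e-3)
with torch.no_grad():
    beliefs, _ = encoder(obs, prev_acts)                           # no state labels needed here
dist = policy(beliefs)
actions = dist.sample()
returns = torch.randn(8, 20)                                       # placeholder returns from the environment
pg_loss = -(dist.log_prob(actions) * returns).mean()
pol_opt.zero_grad()
pg_loss.backward()
pol_opt.step()

At deployment time only the encoder's recurrent pass and the policy are needed, so the privileged state labels never leave the training loop.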

Cite this Paper


BibTeX
@InProceedings{pmlr-v202-wang23p,
  title     = {Learning Belief Representations for Partially Observable Deep {RL}},
  author    = {Wang, Andrew and Li, Andrew C and Klassen, Toryn Q. and Icarte, Rodrigo Toro and Mcilraith, Sheila A.},
  booktitle = {Proceedings of the 40th International Conference on Machine Learning},
  pages     = {35970--35988},
  year      = {2023},
  editor    = {Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan},
  volume    = {202},
  series    = {Proceedings of Machine Learning Research},
  month     = {23--29 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v202/wang23p/wang23p.pdf},
  url       = {https://proceedings.mlr.press/v202/wang23p.html},
  abstract  = {Many important real-world Reinforcement Learning (RL) problems involve partial observability and require policies with memory. Unfortunately, standard deep RL algorithms for partially observable settings typically condition on the full history of interactions and are notoriously difficult to train. We propose a novel deep, partially observable RL algorithm based on modelling belief states — a technique typically used when solving tabular POMDPs, but that has traditionally been difficult to apply to more complex environments. Our approach simplifies policy learning by leveraging state information at training time, that may not be available at deployment time. We do so in two ways: first, we decouple belief state modelling (via unsupervised learning) from policy optimization (via RL); and second, we propose a representation learning approach to capture a compact set of reward-relevant features of the state. Experiments demonstrate the efficacy of our approach on partially observable domains requiring information seeking and long-term memory.}
}
Endnote
%0 Conference Paper
%T Learning Belief Representations for Partially Observable Deep RL
%A Andrew Wang
%A Andrew C Li
%A Toryn Q. Klassen
%A Rodrigo Toro Icarte
%A Sheila A. Mcilraith
%B Proceedings of the 40th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Andreas Krause
%E Emma Brunskill
%E Kyunghyun Cho
%E Barbara Engelhardt
%E Sivan Sabato
%E Jonathan Scarlett
%F pmlr-v202-wang23p
%I PMLR
%P 35970--35988
%U https://proceedings.mlr.press/v202/wang23p.html
%V 202
%X Many important real-world Reinforcement Learning (RL) problems involve partial observability and require policies with memory. Unfortunately, standard deep RL algorithms for partially observable settings typically condition on the full history of interactions and are notoriously difficult to train. We propose a novel deep, partially observable RL algorithm based on modelling belief states — a technique typically used when solving tabular POMDPs, but that has traditionally been difficult to apply to more complex environments. Our approach simplifies policy learning by leveraging state information at training time, that may not be available at deployment time. We do so in two ways: first, we decouple belief state modelling (via unsupervised learning) from policy optimization (via RL); and second, we propose a representation learning approach to capture a compact set of reward-relevant features of the state. Experiments demonstrate the efficacy of our approach on partially observable domains requiring information seeking and long-term memory.
APA
Wang, A., Li, A.C., Klassen, T.Q., Icarte, R.T. & McIlraith, S.A. (2023). Learning Belief Representations for Partially Observable Deep RL. Proceedings of the 40th International Conference on Machine Learning, in Proceedings of Machine Learning Research 202:35970-35988. Available from https://proceedings.mlr.press/v202/wang23p.html.