Modeling Partially Observable Systems using Graph-Based Memory and Topological Priors

Steven Morad, Stephan Liwicki, Ryan Kortvelesy, Roberto Mecca, Amanda Prorok
Proceedings of The 4th Annual Learning for Dynamics and Control Conference, PMLR 168:59-73, 2022.

Abstract

Solving partially observable Markov decision processes (POMDPs) is critical when applying reinforcement learning to real-world problems, where agents have an incomplete view of the world. Recurrent neural networks (RNNs) are the de facto approach for solving POMDPs in reinforcement learning (RL). Although they perform well in supervised learning, noisy gradients reduce their capabilities in RL. Furthermore, they cannot utilize prior human knowledge to bootstrap or stabilize learning. This leads researchers to hand-design task-specific memory models based on their prior knowledge of the task at hand. In this paper, we present graph convolutional memory (GCM), the first RL memory framework with swappable task-specific priors, enabling users to inject expertise into their models. GCM uses human-defined topological priors to form graph neighborhoods, combining them into a larger network topology. We query the graph using graph convolution, coalescing relevant memories into a context-dependent summary of the past. Results demonstrate that GCM outperforms state-of-the-art methods on control, memorization, and navigation tasks while using fewer parameters.
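To make the mechanism in the abstract concrete, here is a minimal toy sketch of the graph-memory idea: a human-defined topological prior connects stored observations into a graph, and a single graph-convolution (neighborhood-averaging) step produces a context-dependent summary. This is an illustrative simplification under assumed names (`temporal_prior`, `build_edges`, `graph_conv_summary`), not the authors' GCM implementation, which uses learned graph convolutions.

```python
import numpy as np

def temporal_prior(i: int, j: int) -> bool:
    """Toy topological prior (assumption for illustration):
    connect observations that are adjacent in time."""
    return abs(i - j) == 1

def build_edges(num_obs, prior):
    """Form graph neighborhoods by applying a human-defined prior
    to every pair of stored observations."""
    return [(i, j) for i in range(num_obs) for j in range(num_obs)
            if i != j and prior(i, j)]

def graph_conv_summary(obs, edges, query):
    """One parameter-free graph-convolution step: average the query
    node's observation with those of its neighbors, yielding a
    context-dependent summary of the past."""
    neighbors = [j for (i, j) in edges if i == query]
    stacked = np.vstack([obs[query]] + [obs[j] for j in neighbors])
    return stacked.mean(axis=0)

# Memory holding four 2-D observations, one per timestep.
memory = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])
edges = build_edges(len(memory), temporal_prior)
summary = graph_conv_summary(memory, edges, query=3)
print(summary)  # node 3 averaged with its temporal neighbor, node 2
```

Swapping `temporal_prior` for a different predicate (e.g. spatial proximity of the agent's poses) changes the graph topology without touching the rest of the pipeline, which is the "swappable prior" idea the abstract describes.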

Cite this Paper


BibTeX
@InProceedings{pmlr-v168-morad22a,
  title     = {Modeling Partially Observable Systems using Graph-Based Memory and Topological Priors},
  author    = {Morad, Steven and Liwicki, Stephan and Kortvelesy, Ryan and Mecca, Roberto and Prorok, Amanda},
  booktitle = {Proceedings of The 4th Annual Learning for Dynamics and Control Conference},
  pages     = {59--73},
  year      = {2022},
  editor    = {Firoozi, Roya and Mehr, Negar and Yel, Esen and Antonova, Rika and Bohg, Jeannette and Schwager, Mac and Kochenderfer, Mykel},
  volume    = {168},
  series    = {Proceedings of Machine Learning Research},
  month     = {23--24 Jun},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v168/morad22a/morad22a.pdf},
  url       = {https://proceedings.mlr.press/v168/morad22a.html},
  abstract  = {Solving partially observable Markov decision processes (POMDPs) is critical when applying reinforcement learning to real-world problems, where agents have an incomplete view of the world. Recurrent neural networks (RNNs) are the de facto approach for solving POMDPs in reinforcement learning (RL). Although they perform well in supervised learning, noisy gradients reduce their capabilities in RL. Furthermore, they cannot utilize prior human knowledge to bootstrap or stabilize learning. This leads researchers to hand-design task-specific memory models based on their prior knowledge of the task at hand. In this paper, we present graph convolutional memory (GCM), the first RL memory framework with swappable task-specific priors, enabling users to inject expertise into their models. GCM uses human-defined topological priors to form graph neighborhoods, combining them into a larger network topology. We query the graph using graph convolution, coalescing relevant memories into a context-dependent summary of the past. Results demonstrate that GCM outperforms state-of-the-art methods on control, memorization, and navigation tasks while using fewer parameters.}
}
Endnote
%0 Conference Paper
%T Modeling Partially Observable Systems using Graph-Based Memory and Topological Priors
%A Steven Morad
%A Stephan Liwicki
%A Ryan Kortvelesy
%A Roberto Mecca
%A Amanda Prorok
%B Proceedings of The 4th Annual Learning for Dynamics and Control Conference
%C Proceedings of Machine Learning Research
%D 2022
%E Roya Firoozi
%E Negar Mehr
%E Esen Yel
%E Rika Antonova
%E Jeannette Bohg
%E Mac Schwager
%E Mykel Kochenderfer
%F pmlr-v168-morad22a
%I PMLR
%P 59--73
%U https://proceedings.mlr.press/v168/morad22a.html
%V 168
%X Solving partially observable Markov decision processes (POMDPs) is critical when applying reinforcement learning to real-world problems, where agents have an incomplete view of the world. Recurrent neural networks (RNNs) are the de facto approach for solving POMDPs in reinforcement learning (RL). Although they perform well in supervised learning, noisy gradients reduce their capabilities in RL. Furthermore, they cannot utilize prior human knowledge to bootstrap or stabilize learning. This leads researchers to hand-design task-specific memory models based on their prior knowledge of the task at hand. In this paper, we present graph convolutional memory (GCM), the first RL memory framework with swappable task-specific priors, enabling users to inject expertise into their models. GCM uses human-defined topological priors to form graph neighborhoods, combining them into a larger network topology. We query the graph using graph convolution, coalescing relevant memories into a context-dependent summary of the past. Results demonstrate that GCM outperforms state-of-the-art methods on control, memorization, and navigation tasks while using fewer parameters.
APA
Morad, S., Liwicki, S., Kortvelesy, R., Mecca, R. & Prorok, A. (2022). Modeling Partially Observable Systems using Graph-Based Memory and Topological Priors. Proceedings of The 4th Annual Learning for Dynamics and Control Conference, in Proceedings of Machine Learning Research 168:59-73. Available from https://proceedings.mlr.press/v168/morad22a.html.