Expediting Reinforcement Learning by Incorporating Knowledge About Temporal Causality in the Environment

Jan Corazza, Hadi Partovi Aria, Daniel Neider, Zhe Xu
Proceedings of the Third Conference on Causal Learning and Reasoning, PMLR 236:643-664, 2024.

Abstract

Reinforcement learning (RL) algorithms struggle to learn optimal policies for tasks in which reward feedback is sparse and depends on a complex sequence of events in the environment. Probabilistic reward machines (PRMs) are finite-state formalisms that can capture temporal dependencies in the reward signal, along with nondeterministic task outcomes. While specialized RL algorithms can exploit this finite-state structure to expedite learning, PRMs remain difficult to modify and design by hand. This hinders the already difficult tasks of utilizing high-level causal knowledge about the environment and of transferring the reward formalism to a new domain with a different causal structure. This paper proposes a novel method for incorporating causal information, in the form of Temporal Logic-based Causal Diagrams, into the reward formalism, thereby expediting policy learning and aiding the transfer of task specifications to new environments. Furthermore, we provide a theoretical result on convergence to an optimal policy for our method and demonstrate its strengths empirically.
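For intuition, the following is a minimal, hypothetical Python sketch of the kind of object a PRM is: a finite set of states tracking task progress, with transitions triggered by high-level events (labels) whose outcomes and rewards may be stochastic. The class, event names, and example task are illustrative assumptions, not the paper's formalism or code.

import random

class ProbabilisticRewardMachine:
    """Toy probabilistic reward machine (PRM).

    States track progress through a temporally extended task; transitions
    fire on high-level events (labels) and may have several stochastic
    outcomes, each with its own successor state and reward. Illustrative
    sketch only, not the authors' implementation.
    """

    def __init__(self, initial_state, transitions):
        # transitions: {(state, label): [(probability, next_state, reward), ...]}
        self.initial_state = initial_state
        self.transitions = transitions
        self.state = initial_state

    def reset(self):
        self.state = self.initial_state

    def step(self, label):
        """Advance the machine on an observed event and return the reward."""
        outcomes = self.transitions.get((self.state, label))
        if outcomes is None:
            return 0.0  # event is irrelevant in this state: implicit self-loop
        r, cumulative = random.random(), 0.0
        for probability, next_state, reward in outcomes:
            cumulative += probability
            if r < cumulative:
                self.state = next_state
                return reward
        return 0.0  # unreachable when outcome probabilities sum to 1

# Hypothetical task: fetch coffee, then deliver it to the office.
# Delivery succeeds with probability 0.9; otherwise the coffee is
# dropped and the task resets. Causal knowledge such as "coffee must
# precede office" could rule out transitions that can never fire,
# which is the kind of structure exploited to speed up learning.
prm = ProbabilisticRewardMachine(
    initial_state="start",
    transitions={
        ("start", "coffee"): [(1.0, "has_coffee", 0.0)],
        ("has_coffee", "office"): [(0.9, "done", 1.0), (0.1, "start", 0.0)],
    },
)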

Cite this Paper


BibTeX
@InProceedings{pmlr-v236-corazza24a,
  title     = {Expediting Reinforcement Learning by Incorporating Knowledge About Temporal Causality in the Environment},
  author    = {Corazza, Jan and Aria, Hadi Partovi and Neider, Daniel and Xu, Zhe},
  booktitle = {Proceedings of the Third Conference on Causal Learning and Reasoning},
  pages     = {643--664},
  year      = {2024},
  editor    = {Locatello, Francesco and Didelez, Vanessa},
  volume    = {236},
  series    = {Proceedings of Machine Learning Research},
  month     = {01--03 Apr},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v236/corazza24a/corazza24a.pdf},
  url       = {https://proceedings.mlr.press/v236/corazza24a.html},
  abstract  = {Reinforcement learning (RL) algorithms struggle with learning optimal policies for tasks where reward feedback is sparse and depends on a complex sequence of events in the environment. Probabilistic reward machines (PRMs) are finite-state formalisms that can capture temporal dependencies in the reward signal, along with nondeterministic task outcomes. While special RL algorithms can exploit this finite-state structure to expedite learning, PRMs remain difficult to modify and design by hand. This hinders the already difficult tasks of utilizing high-level causal knowledge about the environment, and transferring the reward formalism into a new domain with a different causal structure. This paper proposes a novel method to incorporate causal information in the form of Temporal Logic-based Causal Diagrams into the reward formalism, thereby expediting policy learning and aiding the transfer of task specifications to new environments. Furthermore, we provide a theoretical result about convergence to optimal policy for our method, and demonstrate its strengths empirically.}
}
Endnote
%0 Conference Paper
%T Expediting Reinforcement Learning by Incorporating Knowledge About Temporal Causality in the Environment
%A Jan Corazza
%A Hadi Partovi Aria
%A Daniel Neider
%A Zhe Xu
%B Proceedings of the Third Conference on Causal Learning and Reasoning
%C Proceedings of Machine Learning Research
%D 2024
%E Francesco Locatello
%E Vanessa Didelez
%F pmlr-v236-corazza24a
%I PMLR
%P 643--664
%U https://proceedings.mlr.press/v236/corazza24a.html
%V 236
%X Reinforcement learning (RL) algorithms struggle with learning optimal policies for tasks where reward feedback is sparse and depends on a complex sequence of events in the environment. Probabilistic reward machines (PRMs) are finite-state formalisms that can capture temporal dependencies in the reward signal, along with nondeterministic task outcomes. While special RL algorithms can exploit this finite-state structure to expedite learning, PRMs remain difficult to modify and design by hand. This hinders the already difficult tasks of utilizing high-level causal knowledge about the environment, and transferring the reward formalism into a new domain with a different causal structure. This paper proposes a novel method to incorporate causal information in the form of Temporal Logic-based Causal Diagrams into the reward formalism, thereby expediting policy learning and aiding the transfer of task specifications to new environments. Furthermore, we provide a theoretical result about convergence to optimal policy for our method, and demonstrate its strengths empirically.
APA
Corazza, J., Aria, H.P., Neider, D. & Xu, Z. (2024). Expediting Reinforcement Learning by Incorporating Knowledge About Temporal Causality in the Environment. Proceedings of the Third Conference on Causal Learning and Reasoning, in Proceedings of Machine Learning Research 236:643-664. Available from https://proceedings.mlr.press/v236/corazza24a.html.
