An investigation of time reversal symmetry in reinforcement learning

Brett Barkley; Amy Zhang; David Fridovich-Keil

Proceedings of the 6th Annual Learning for Dynamics & Control Conference, PMLR 242:68-79, 2024.

Abstract

One of the fundamental challenges associated with reinforcement learning (RL) is that collecting sufficient data can be both time-consuming and expensive. In this paper, we formalize a concept of time reversal symmetry in a Markov decision process (MDP), which builds upon the established structure of dynamically reversible Markov chains (DRMCs) and time-reversibility in classical physics. Specifically, we investigate the utility of this concept in reducing the sample complexity of reinforcement learning. We observe that utilizing the structure of time reversal in an MDP allows every environment transition experienced by an agent to be transformed into a feasible reverse-time transition, effectively doubling the number of experiences in the environment. To test the usefulness of this newly synthesized data, we develop a novel approach called time symmetric data augmentation (TSDA) and investigate its application in both proprioceptive and pixel-based state within the realm of off-policy, model-free RL. Empirical evaluations showcase how these synthetic transitions can enhance the sample efficiency of RL agents in time-reversible scenarios without friction or contact. We also test this method in more realistic environments where these assumptions are not globally satisfied. We find that TSDA can significantly degrade sample efficiency and policy performance, but can also improve sample efficiency under the right conditions. Ultimately, we conclude that time symmetry shows promise in enhancing the sample efficiency of reinforcement learning and provide guidance on when the environment and reward structures are of an appropriate form for TSDA to be employed effectively.
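
As a rough illustration of the idea described in the abstract (not the authors' implementation), the sketch below turns each forward transition into a reverse-time one and appends it to a batch, doubling its size. It rests on several assumptions that do not come from the paper: the state is laid out as positions followed by velocities, time reversal is modeled by negating the velocities, the same action is reused in the reversed transition, and the reward is recomputed by a user-supplied `reward_fn`. The names `reverse_transition`, `augment`, `n_pos`, and `reward_fn` are hypothetical.

```python
import numpy as np


def reverse_transition(state, action, next_state, n_pos, reward_fn):
    """Build a reverse-time transition from a forward one.

    Assumptions (illustrative only): state = [positions, velocities] with
    n_pos position entries, time reversal negates velocities, and the
    forward action is also valid for the reversed transition.
    """
    def flip(s):
        s = np.asarray(s, dtype=float).copy()
        s[n_pos:] *= -1.0  # negate the velocity part, keep positions
        return s

    rev_state = flip(next_state)  # reversed transition starts where the forward one ended
    rev_next = flip(state)        # and ends where the forward one started
    rev_reward = reward_fn(rev_state, action, rev_next)
    return rev_state, action, rev_reward, rev_next


def augment(batch, n_pos, reward_fn):
    """Return the batch with one synthetic reverse-time transition per real one."""
    out = []
    for state, action, reward, next_state in batch:
        out.append((state, action, reward, next_state))
        out.append(reverse_transition(state, action, next_state, n_pos, reward_fn))
    return out
```

Under these assumptions, a replay buffer for an off-policy learner could be filled from `augment(batch, n_pos, reward_fn)` instead of `batch`. As the abstract notes, this is only expected to help when the environment is time reversible (for example, without friction or contact) and the reward structure is compatible with the reversed transitions.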

Cite this Paper

BibTeX

@InProceedings{pmlr-v242-barkley24a,
  title = 	 {An investigation of time reversal symmetry in reinforcement learning},
  author =       {Barkley, Brett and Zhang, Amy and Fridovich-Keil, David},
  booktitle = 	 {Proceedings of the 6th Annual Learning for Dynamics \& Control Conference},
  pages = 	 {68--79},
  year = 	 {2024},
  editor = 	 {Abate, Alessandro and Cannon, Mark and Margellos, Kostas and Papachristodoulou, Antonis},
  volume = 	 {242},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {15--17 Jul},
  publisher =    {PMLR},
  pdf = 	 {https://proceedings.mlr.press/v242/barkley24a/barkley24a.pdf},
  url = 	 {https://proceedings.mlr.press/v242/barkley24a.html},
  abstract = 	 {One of the fundamental challenges associated with reinforcement learning (RL) is that collecting sufficient data can be both time-consuming and expensive. In this paper, we formalize a concept of time reversal symmetry in a Markov decision process (MDP), which builds upon the established structure of dynamically reversible Markov chains (DRMCs) and time-reversibility in classical physics. Specifically, we investigate the utility of this concept in reducing the sample complexity of reinforcement learning. We observe that utilizing the structure of time reversal in an MDP allows every environment transition experienced by an agent to be transformed into a feasible reverse-time transition, effectively doubling the number of experiences in the environment. To test the usefulness of this newly synthesized data, we develop a novel approach called time symmetric data augmentation (TSDA) and investigate its application in both proprioceptive and pixel-based state within the realm of off-policy, model-free RL. Empirical evaluations showcase how these synthetic transitions can enhance the sample efficiency of RL agents in time reversible scenarios without friction or contact. We also test this method in more realistic environments where these assumptions are not globally satisfied. We find that TSDA can significantly degrade sample efficiency and policy performance, but can also improve sample efficiency under the right conditions. Ultimately we conclude that time symmetry shows promise in enhancing the sample efficiency of reinforcement learning and provide guidance when the environment and reward structures are of an appropriate form for TSDA to be employed effectively.}
}

Endnote

%0 Conference Paper
%T An investigation of time reversal symmetry in reinforcement learning
%A Brett Barkley
%A Amy Zhang
%A David Fridovich-Keil
%B Proceedings of the 6th Annual Learning for Dynamics & Control Conference
%C Proceedings of Machine Learning Research
%D 2024
%E Alessandro Abate
%E Mark Cannon
%E Kostas Margellos
%E Antonis Papachristodoulou
%F pmlr-v242-barkley24a
%I PMLR
%P 68--79
%U https://proceedings.mlr.press/v242/barkley24a.html
%V 242
%X One of the fundamental challenges associated with reinforcement learning (RL) is that collecting sufficient data can be both time-consuming and expensive. In this paper, we formalize a concept of time reversal symmetry in a Markov decision process (MDP), which builds upon the established structure of dynamically reversible Markov chains (DRMCs) and time-reversibility in classical physics. Specifically, we investigate the utility of this concept in reducing the sample complexity of reinforcement learning. We observe that utilizing the structure of time reversal in an MDP allows every environment transition experienced by an agent to be transformed into a feasible reverse-time transition, effectively doubling the number of experiences in the environment. To test the usefulness of this newly synthesized data, we develop a novel approach called time symmetric data augmentation (TSDA) and investigate its application in both proprioceptive and pixel-based state within the realm of off-policy, model-free RL. Empirical evaluations showcase how these synthetic transitions can enhance the sample efficiency of RL agents in time reversible scenarios without friction or contact. We also test this method in more realistic environments where these assumptions are not globally satisfied. We find that TSDA can significantly degrade sample efficiency and policy performance, but can also improve sample efficiency under the right conditions. Ultimately we conclude that time symmetry shows promise in enhancing the sample efficiency of reinforcement learning and provide guidance when the environment and reward structures are of an appropriate form for TSDA to be employed effectively.

APA

Barkley, B., Zhang, A. & Fridovich-Keil, D. (2024). An investigation of time reversal symmetry in reinforcement learning. Proceedings of the 6th Annual Learning for Dynamics & Control Conference, in Proceedings of Machine Learning Research 242:68-79. Available from https://proceedings.mlr.press/v242/barkley24a.html.
