Efficient Online Reinforcement Learning with Offline Data

Philip J. Ball; Laura Smith; Ilya Kostrikov; Sergey Levine

Proceedings of the 40th International Conference on Machine Learning, PMLR 202:1577-1594, 2023.

Abstract

Sample efficiency and exploration remain major challenges in online reinforcement learning (RL). A powerful approach that can be applied to address these issues is the inclusion of offline data, such as prior trajectories from a human expert or a sub-optimal exploration policy. Previous methods have relied on extensive modifications and additional complexity to ensure the effective use of this data. Instead, we ask: can we simply apply existing off-policy methods to leverage offline data when learning online? In this work, we demonstrate that the answer is yes; however, a set of minimal but important changes to existing off-policy RL algorithms are required to achieve reliable performance. We extensively ablate these design choices, demonstrating the key factors that most affect performance, and arrive at a set of recommendations that practitioners can readily apply, whether their data comprise a small number of expert demonstrations or large volumes of sub-optimal trajectories. We see that correct application of these simple recommendations can provide a $\mathbf{2.5\times}$ improvement over existing approaches across a diverse set of competitive benchmarks, with no additional computational overhead.

Cite this Paper

BibTeX


@InProceedings{pmlr-v202-ball23a,
  title = 	 {Efficient Online Reinforcement Learning with Offline Data},
  author =       {Ball, Philip J. and Smith, Laura and Kostrikov, Ilya and Levine, Sergey},
  booktitle = 	 {Proceedings of the 40th International Conference on Machine Learning},
  pages = 	 {1577--1594},
  year = 	 {2023},
  editor = 	 {Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan},
  volume = 	 {202},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {23--29 Jul},
  publisher =    {PMLR},
  pdf = 	 {https://proceedings.mlr.press/v202/ball23a/ball23a.pdf},
  url = 	 {https://proceedings.mlr.press/v202/ball23a.html},
  abstract = 	 {Sample efficiency and exploration remain major challenges in online reinforcement learning (RL). A powerful approach that can be applied to address these issues is the inclusion of offline data, such as prior trajectories from a human expert or a sub-optimal exploration policy. Previous methods have relied on extensive modifications and additional complexity to ensure the effective use of this data. Instead, we ask: can we simply apply existing off-policy methods to leverage offline data when learning online? In this work, we demonstrate that the answer is yes; however, a set of minimal but important changes to existing off-policy RL algorithms are required to achieve reliable performance. We extensively ablate these design choices, demonstrating the key factors that most affect performance, and arrive at a set of recommendations that practitioners can readily apply, whether their data comprise a small number of expert demonstrations or large volumes of sub-optimal trajectories. We see that correct application of these simple recommendations can provide a $\mathbf{2.5\times}$ improvement over existing approaches across a diverse set of competitive benchmarks, with no additional computational overhead.}
}

Endnote

%0 Conference Paper
%T Efficient Online Reinforcement Learning with Offline Data
%A Philip J. Ball
%A Laura Smith
%A Ilya Kostrikov
%A Sergey Levine
%B Proceedings of the 40th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Andreas Krause
%E Emma Brunskill
%E Kyunghyun Cho
%E Barbara Engelhardt
%E Sivan Sabato
%E Jonathan Scarlett
%F pmlr-v202-ball23a
%I PMLR
%P 1577--1594
%U https://proceedings.mlr.press/v202/ball23a.html
%V 202
%X Sample efficiency and exploration remain major challenges in online reinforcement learning (RL). A powerful approach that can be applied to address these issues is the inclusion of offline data, such as prior trajectories from a human expert or a sub-optimal exploration policy. Previous methods have relied on extensive modifications and additional complexity to ensure the effective use of this data. Instead, we ask: can we simply apply existing off-policy methods to leverage offline data when learning online? In this work, we demonstrate that the answer is yes; however, a set of minimal but important changes to existing off-policy RL algorithms are required to achieve reliable performance. We extensively ablate these design choices, demonstrating the key factors that most affect performance, and arrive at a set of recommendations that practitioners can readily apply, whether their data comprise a small number of expert demonstrations or large volumes of sub-optimal trajectories. We see that correct application of these simple recommendations can provide a $\mathbf{2.5\times}$ improvement over existing approaches across a diverse set of competitive benchmarks, with no additional computational overhead.

APA


Ball, P.J., Smith, L., Kostrikov, I. & Levine, S. (2023). Efficient Online Reinforcement Learning with Offline Data. Proceedings of the 40th International Conference on Machine Learning, in Proceedings of Machine Learning Research 202:1577-1594. Available from https://proceedings.mlr.press/v202/ball23a.html.
