Efficient Online Reinforcement Learning with Offline Data

Philip J. Ball, Laura Smith, Ilya Kostrikov, Sergey Levine
Proceedings of the 40th International Conference on Machine Learning, PMLR 202:1577-1594, 2023.


Sample efficiency and exploration remain major challenges in online reinforcement learning (RL). A powerful approach that can be applied to address these issues is the inclusion of offline data, such as prior trajectories from a human expert or a sub-optimal exploration policy. Previous methods have relied on extensive modifications and additional complexity to ensure the effective use of this data. Instead, we ask: can we simply apply existing off-policy methods to leverage offline data when learning online? In this work, we demonstrate that the answer is yes; however, a set of minimal but important changes to existing off-policy RL algorithms are required to achieve reliable performance. We extensively ablate these design choices, demonstrating the key factors that most affect performance, and arrive at a set of recommendations that practitioners can readily apply, whether their data comprise a small number of expert demonstrations or large volumes of sub-optimal trajectories. We see that correct application of these simple recommendations can provide a 2.5× improvement over existing approaches across a diverse set of competitive benchmarks, with no additional computational overhead.

Cite this Paper

@InProceedings{pmlr-v202-ball23a,
  title = {Efficient Online Reinforcement Learning with Offline Data},
  author = {Ball, Philip J. and Smith, Laura and Kostrikov, Ilya and Levine, Sergey},
  booktitle = {Proceedings of the 40th International Conference on Machine Learning},
  pages = {1577--1594},
  year = {2023},
  editor = {Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan},
  volume = {202},
  series = {Proceedings of Machine Learning Research},
  month = {23--29 Jul},
  publisher = {PMLR},
  pdf = {https://proceedings.mlr.press/v202/ball23a/ball23a.pdf},
  url = {https://proceedings.mlr.press/v202/ball23a.html},
  abstract = {Sample efficiency and exploration remain major challenges in online reinforcement learning (RL). A powerful approach that can be applied to address these issues is the inclusion of offline data, such as prior trajectories from a human expert or a sub-optimal exploration policy. Previous methods have relied on extensive modifications and additional complexity to ensure the effective use of this data. Instead, we ask: can we simply apply existing off-policy methods to leverage offline data when learning online? In this work, we demonstrate that the answer is yes; however, a set of minimal but important changes to existing off-policy RL algorithms are required to achieve reliable performance. We extensively ablate these design choices, demonstrating the key factors that most affect performance, and arrive at a set of recommendations that practitioners can readily apply, whether their data comprise a small number of expert demonstrations or large volumes of sub-optimal trajectories. We see that correct application of these simple recommendations can provide a $\mathbf{2.5\times}$ improvement over existing approaches across a diverse set of competitive benchmarks, with no additional computational overhead.}
}
%0 Conference Paper
%T Efficient Online Reinforcement Learning with Offline Data
%A Philip J. Ball
%A Laura Smith
%A Ilya Kostrikov
%A Sergey Levine
%B Proceedings of the 40th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Andreas Krause
%E Emma Brunskill
%E Kyunghyun Cho
%E Barbara Engelhardt
%E Sivan Sabato
%E Jonathan Scarlett
%F pmlr-v202-ball23a
%I PMLR
%P 1577--1594
%U https://proceedings.mlr.press/v202/ball23a.html
%V 202
%X Sample efficiency and exploration remain major challenges in online reinforcement learning (RL). A powerful approach that can be applied to address these issues is the inclusion of offline data, such as prior trajectories from a human expert or a sub-optimal exploration policy. Previous methods have relied on extensive modifications and additional complexity to ensure the effective use of this data. Instead, we ask: can we simply apply existing off-policy methods to leverage offline data when learning online? In this work, we demonstrate that the answer is yes; however, a set of minimal but important changes to existing off-policy RL algorithms are required to achieve reliable performance. We extensively ablate these design choices, demonstrating the key factors that most affect performance, and arrive at a set of recommendations that practitioners can readily apply, whether their data comprise a small number of expert demonstrations or large volumes of sub-optimal trajectories. We see that correct application of these simple recommendations can provide a $\mathbf{2.5\times}$ improvement over existing approaches across a diverse set of competitive benchmarks, with no additional computational overhead.
Ball, P.J., Smith, L., Kostrikov, I. & Levine, S. (2023). Efficient Online Reinforcement Learning with Offline Data. Proceedings of the 40th International Conference on Machine Learning, in Proceedings of Machine Learning Research 202:1577-1594. Available from https://proceedings.mlr.press/v202/ball23a.html.
