Hybrid Transfer Reinforcement Learning: Provable Sample Efficiency from Shifted-Dynamics Data

Chengrui Qu, Laixi Shi, Kishan Panaganti, Pengcheng You, Adam Wierman
Proceedings of The 28th International Conference on Artificial Intelligence and Statistics, PMLR 258:1054-1062, 2025.

Abstract

Online reinforcement learning (RL) typically requires online interaction data to learn a policy for a target task, but collecting such data can be high-stakes. This prompts interest in leveraging historical data to improve sample efficiency. The historical data may come from outdated or related source environments with different dynamics. It remains unclear how to effectively use such data in the target task to provably enhance learning and sample efficiency. To address this, we propose a hybrid transfer RL (HTRL) setting, where an agent learns in a target environment while accessing offline data from a source environment with shifted dynamics. We show that – without information on the dynamics shift – general shifted-dynamics data, even with subtle shifts, does not reduce sample complexity in the target environment. However, focusing on HTRL with prior information on the degree of the dynamics shift, we design HySRL, a transfer algorithm that outperforms pure online RL with problem-dependent sample complexity guarantees. Finally, our experimental results demonstrate that HySRL surpasses the state-of-the-art pure online RL baseline.
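The abstract describes the HTRL setting only at a high level. As a purely illustrative sketch (not the authors' HySRL algorithm), the tabular snippet below shows one way an agent might pool offline source-environment transitions with online target samples only where the empirical per-state-action dynamics shift stays within a known tolerance; all names here (estimate_transitions, beta, the count arrays) are hypothetical and chosen for illustration.

```python
import numpy as np

# Illustrative sketch only (NOT the paper's HySRL): pool offline source counts
# with online target counts per (s, a) when the empirical dynamics shift looks
# small, otherwise fall back to target-only estimates. All names are hypothetical.

def estimate_transitions(source_counts, target_counts, beta):
    """Estimate target transition probabilities P_hat[s, a, s'].

    Per (s, a): pool source and target counts if the empirical total-variation
    distance between the two estimates is <= beta, else use target counts alone.
    """
    S, A, _ = target_counts.shape
    P_hat = np.zeros((S, A, S))
    for s in range(S):
        for a in range(A):
            n_src = source_counts[s, a].sum()
            n_tgt = target_counts[s, a].sum()
            if n_src == 0 and n_tgt == 0:
                P_hat[s, a] = np.full(S, 1.0 / S)  # no data at all: uniform prior
                continue
            if n_tgt == 0:
                # No target samples yet: rely on the (possibly shifted) source estimate.
                P_hat[s, a] = source_counts[s, a] / n_src
                continue
            p_tgt = target_counts[s, a] / n_tgt
            if n_src == 0:
                P_hat[s, a] = p_tgt
                continue
            p_src = source_counts[s, a] / n_src
            tv = 0.5 * np.abs(p_src - p_tgt).sum()
            if tv <= beta:
                # Shift appears small here: pooling lowers the variance of the estimate.
                P_hat[s, a] = (source_counts[s, a] + target_counts[s, a]) / (n_src + n_tgt)
            else:
                # Shift too large: trust only the online target samples.
                P_hat[s, a] = p_tgt
    return P_hat

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    S, A = 4, 2
    source_counts = rng.integers(0, 50, size=(S, A, S)).astype(float)  # plentiful offline data
    target_counts = rng.integers(0, 5, size=(S, A, S)).astype(float)   # scarce online data
    P_hat = estimate_transitions(source_counts, target_counts, beta=0.2)
    print(P_hat[0, 0])  # estimated next-state distribution for (s=0, a=0)
```

The threshold beta plays the role of the prior information on the degree of the dynamics shift mentioned in the abstract; without such a bound, the paper's lower bound shows that shifted-dynamics data alone cannot provably reduce sample complexity.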

Cite this Paper


BibTeX
@InProceedings{pmlr-v258-qu25a,
  title     = {Hybrid Transfer Reinforcement Learning: Provable Sample Efficiency from Shifted-Dynamics Data},
  author    = {Qu, Chengrui and Shi, Laixi and Panaganti, Kishan and You, Pengcheng and Wierman, Adam},
  booktitle = {Proceedings of The 28th International Conference on Artificial Intelligence and Statistics},
  pages     = {1054--1062},
  year      = {2025},
  editor    = {Li, Yingzhen and Mandt, Stephan and Agrawal, Shipra and Khan, Emtiyaz},
  volume    = {258},
  series    = {Proceedings of Machine Learning Research},
  month     = {03--05 May},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v258/main/assets/qu25a/qu25a.pdf},
  url       = {https://proceedings.mlr.press/v258/qu25a.html},
  abstract  = {Online reinforcement learning (RL) typically requires online interaction data to learn a policy for a target task, but collecting such data can be high-stakes. This prompts interest in leveraging historical data to improve sample efficiency. The historical data may come from outdated or related source environments with different dynamics. It remains unclear how to effectively use such data in the target task to provably enhance learning and sample efficiency. To address this, we propose a hybrid transfer RL (HTRL) setting, where an agent learns in a target environment while accessing offline data from a source environment with shifted dynamics. We show that – without information on the dynamics shift – general shifted-dynamics data, even with subtle shifts, does not reduce sample complexity in the target environment. However, focusing on HTRL with prior information on the degree of the dynamics shift, we design HySRL, a transfer algorithm that outperforms pure online RL with problem-dependent sample complexity guarantees. Finally, our experimental results demonstrate that HySRL surpasses the state-of-the-art pure online RL baseline.}
}
Endnote
%0 Conference Paper
%T Hybrid Transfer Reinforcement Learning: Provable Sample Efficiency from Shifted-Dynamics Data
%A Chengrui Qu
%A Laixi Shi
%A Kishan Panaganti
%A Pengcheng You
%A Adam Wierman
%B Proceedings of The 28th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2025
%E Yingzhen Li
%E Stephan Mandt
%E Shipra Agrawal
%E Emtiyaz Khan
%F pmlr-v258-qu25a
%I PMLR
%P 1054--1062
%U https://proceedings.mlr.press/v258/qu25a.html
%V 258
%X Online reinforcement learning (RL) typically requires online interaction data to learn a policy for a target task, but collecting such data can be high-stakes. This prompts interest in leveraging historical data to improve sample efficiency. The historical data may come from outdated or related source environments with different dynamics. It remains unclear how to effectively use such data in the target task to provably enhance learning and sample efficiency. To address this, we propose a hybrid transfer RL (HTRL) setting, where an agent learns in a target environment while accessing offline data from a source environment with shifted dynamics. We show that – without information on the dynamics shift – general shifted-dynamics data, even with subtle shifts, does not reduce sample complexity in the target environment. However, focusing on HTRL with prior information on the degree of the dynamics shift, we design HySRL, a transfer algorithm that outperforms pure online RL with problem-dependent sample complexity guarantees. Finally, our experimental results demonstrate that HySRL surpasses the state-of-the-art pure online RL baseline.
APA
Qu, C., Shi, L., Panaganti, K., You, P. & Wierman, A. (2025). Hybrid Transfer Reinforcement Learning: Provable Sample Efficiency from Shifted-Dynamics Data. Proceedings of The 28th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 258:1054-1062. Available from https://proceedings.mlr.press/v258/qu25a.html.
