Online Pre-Training for Offline-to-Online Reinforcement Learning

Yongjae Shin, Jeonghye Kim, Whiyoung Jung, Sunghoon Hong, Deunsol Yoon, Youngsoo Jang, Geon-Hyeong Kim, Jongseong Chae, Youngchul Sung, Kanghoon Lee, Woohyung Lim
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:55122-55144, 2025.

Abstract

Offline-to-online reinforcement learning (RL) aims to integrate the complementary strengths of offline and online RL by pre-training an agent offline and subsequently fine-tuning it through online interactions. However, recent studies reveal that offline pre-trained agents often underperform during online fine-tuning due to inaccurate value estimation caused by distribution shift, with random initialization proving more effective in certain cases. In this work, we propose a novel method, Online Pre-Training for Offline-to-Online RL (OPT), explicitly designed to address the issue of inaccurate value estimation in offline pre-trained agents. OPT introduces a new learning phase, Online Pre-Training, which allows the training of a new value function tailored specifically for effective online fine-tuning. Implementation of OPT on TD3 and SPOT demonstrates an average 30% improvement in performance across a wide range of D4RL environments, including MuJoCo, Antmaze, and Adroit.
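To make the training pipeline concrete, below is a minimal, self-contained sketch of the three-phase structure the abstract describes: offline pre-training, an intermediate online pre-training phase that trains a new value function from scratch on freshly collected transitions, and online fine-tuning. It uses a toy tabular chain MDP and plain Q-learning updates purely for illustration; the environment, update rules, and names (step, td_update, q_offline, q_online) are assumptions and do not reproduce OPT's actual objectives, architectures, or hyperparameters.

```python
import numpy as np

N_STATES, N_ACTIONS, GAMMA, LR, EPS = 5, 2, 0.99, 0.1, 0.1
rng = np.random.default_rng(0)

def step(s, a):
    # Toy chain MDP: action 1 moves right, action 0 moves left;
    # reaching the rightmost state yields reward 1.
    s_next = min(s + 1, N_STATES - 1) if a == 1 else max(s - 1, 0)
    return s_next, float(s_next == N_STATES - 1)

def td_update(q, s, a, r, s_next):
    # One tabular Q-learning (TD) update.
    q[s, a] += LR * (r + GAMMA * q[s_next].max() - q[s, a])

# Phase 1: offline pre-training on a fixed dataset of logged transitions.
dataset, s = [], 0
for _ in range(500):
    a = int(rng.integers(N_ACTIONS))          # behavior policy: uniform random
    s_next, r = step(s, a)
    dataset.append((s, a, r, s_next))
    s = s_next
q_offline = np.zeros((N_STATES, N_ACTIONS))
for _ in range(20):
    for (s, a, r, s_next) in dataset:
        td_update(q_offline, s, a, r, s_next)

# Phase 2: "online pre-training" -- act with the offline pre-trained policy,
# but train a *new* value function from scratch on the freshly collected
# online transitions (a stand-in for OPT's dedicated online pre-training
# phase; the paper's actual objective is not reproduced here).
q_online, s = np.zeros((N_STATES, N_ACTIONS)), 0
for _ in range(500):
    a = int(q_offline[s].argmax())
    s_next, r = step(s, a)
    td_update(q_online, s, a, r, s_next)
    s = s_next

# Phase 3: online fine-tuning -- keep interacting, now selecting actions with
# the new value function (epsilon-greedy) and updating it as usual.
s = 0
for _ in range(2000):
    if rng.random() > EPS:
        a = int(q_online[s].argmax())
    else:
        a = int(rng.integers(N_ACTIONS))
    s_next, r = step(s, a)
    td_update(q_online, s, a, r, s_next)
    s = s_next

print("offline Q:\n", q_offline.round(2))
print("online  Q:\n", q_online.round(2))
```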

Cite this Paper

BibTeX
@InProceedings{pmlr-v267-shin25c,
  title     = {Online Pre-Training for Offline-to-Online Reinforcement Learning},
  author    = {Shin, Yongjae and Kim, Jeonghye and Jung, Whiyoung and Hong, Sunghoon and Yoon, Deunsol and Jang, Youngsoo and Kim, Geon-Hyeong and Chae, Jongseong and Sung, Youngchul and Lee, Kanghoon and Lim, Woohyung},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {55122--55144},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/shin25c/shin25c.pdf},
  url       = {https://proceedings.mlr.press/v267/shin25c.html},
  abstract  = {Offline-to-online reinforcement learning (RL) aims to integrate the complementary strengths of offline and online RL by pre-training an agent offline and subsequently fine-tuning it through online interactions. However, recent studies reveal that offline pre-trained agents often underperform during online fine-tuning due to inaccurate value estimation caused by distribution shift, with random initialization proving more effective in certain cases. In this work, we propose a novel method, Online Pre-Training for Offline-to-Online RL (OPT), explicitly designed to address the issue of inaccurate value estimation in offline pre-trained agents. OPT introduces a new learning phase, Online Pre-Training, which allows the training of a new value function tailored specifically for effective online fine-tuning. Implementation of OPT on TD3 and SPOT demonstrates an average 30% improvement in performance across a wide range of D4RL environments, including MuJoCo, Antmaze, and Adroit.}
}
Endnote
%0 Conference Paper
%T Online Pre-Training for Offline-to-Online Reinforcement Learning
%A Yongjae Shin
%A Jeonghye Kim
%A Whiyoung Jung
%A Sunghoon Hong
%A Deunsol Yoon
%A Youngsoo Jang
%A Geon-Hyeong Kim
%A Jongseong Chae
%A Youngchul Sung
%A Kanghoon Lee
%A Woohyung Lim
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-shin25c
%I PMLR
%P 55122--55144
%U https://proceedings.mlr.press/v267/shin25c.html
%V 267
%X Offline-to-online reinforcement learning (RL) aims to integrate the complementary strengths of offline and online RL by pre-training an agent offline and subsequently fine-tuning it through online interactions. However, recent studies reveal that offline pre-trained agents often underperform during online fine-tuning due to inaccurate value estimation caused by distribution shift, with random initialization proving more effective in certain cases. In this work, we propose a novel method, Online Pre-Training for Offline-to-Online RL (OPT), explicitly designed to address the issue of inaccurate value estimation in offline pre-trained agents. OPT introduces a new learning phase, Online Pre-Training, which allows the training of a new value function tailored specifically for effective online fine-tuning. Implementation of OPT on TD3 and SPOT demonstrates an average 30% improvement in performance across a wide range of D4RL environments, including MuJoCo, Antmaze, and Adroit.
APA
Shin, Y., Kim, J., Jung, W., Hong, S., Yoon, D., Jang, Y., Kim, G., Chae, J., Sung, Y., Lee, K. & Lim, W. (2025). Online Pre-Training for Offline-to-Online Reinforcement Learning. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:55122-55144. Available from https://proceedings.mlr.press/v267/shin25c.html.
