On the Power of Pre-training for Generalization in RL: Provable Benefits and Hardness

Haotian Ye, Xiaoyu Chen, Liwei Wang, Simon Shaolei Du
Proceedings of the 40th International Conference on Machine Learning, PMLR 202:39770-39800, 2023.

Abstract

Generalization in Reinforcement Learning (RL) aims to train, on a set of training environments, an agent that generalizes to the target environment. In this work, we first point out that RL generalization is fundamentally different from generalization in supervised learning, and that fine-tuning on the target environment is necessary for good test performance. We therefore seek to answer the following question: how much can we expect pre-training over training environments to help with efficient and effective fine-tuning? On the one hand, we give a surprising result showing that, asymptotically, the improvement from pre-training is at most a constant factor. On the other hand, we show that pre-training can indeed be helpful in the non-asymptotic regime, by designing a policy collection-elimination (PCE) algorithm and proving a distribution-dependent regret bound that is independent of the size of the state-action space. We hope our theoretical results provide insight into pre-training and generalization in RL.
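For readers skimming the abstract, the sketch below illustrates only the high-level collect-then-eliminate pattern that a "policy collection-elimination" style of fine-tuning suggests: pre-training supplies a finite collection of candidate policies, and fine-tuning eliminates under-performing candidates on the target environment. This is a generic illustration under assumed interfaces (the `evaluate` callback, the halving confidence-width schedule, and all parameter names are hypothetical), not the PCE algorithm analyzed in the paper.

```python
# Minimal sketch of a generic collect-then-eliminate fine-tuning loop.
# NOT the paper's PCE algorithm: `evaluate`, the halving confidence-width
# schedule, and all parameters below are assumptions made for illustration.
import numpy as np


def fine_tune_by_elimination(candidates, target_env, evaluate,
                             episodes_per_round=10,
                             confidence_width=1.0,
                             max_rounds=20):
    """Successive elimination over a finite collection of pre-trained policies.

    Each round, every surviving candidate is rolled out on the target
    environment; candidates whose empirical return falls clearly below the
    current best are dropped, and the confidence width is tightened.
    """
    active = list(candidates)
    for _ in range(max_rounds):
        if len(active) == 1:
            break
        # Empirical average return of each surviving policy on the target env.
        returns = np.array([evaluate(pi, target_env, episodes_per_round)
                            for pi in active])
        # Keep only candidates that are plausibly near-optimal.
        threshold = returns.max() - 2 * confidence_width
        active = [pi for pi, r in zip(active, returns) if r >= threshold]
        confidence_width /= 2          # assumed tightening schedule
        episodes_per_round *= 2        # more rollouts as intervals shrink
    return active[0]                   # best surviving candidate
```

The point of the sketch is only that the cost of such a loop is governed by the number of candidate policies rather than by the state-action space, which is the flavor of guarantee the abstract describes.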

Cite this Paper


BibTeX
@InProceedings{pmlr-v202-ye23a,
  title     = {On the Power of Pre-training for Generalization in {RL}: Provable Benefits and Hardness},
  author    = {Ye, Haotian and Chen, Xiaoyu and Wang, Liwei and Du, Simon Shaolei},
  booktitle = {Proceedings of the 40th International Conference on Machine Learning},
  pages     = {39770--39800},
  year      = {2023},
  editor    = {Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan},
  volume    = {202},
  series    = {Proceedings of Machine Learning Research},
  month     = {23--29 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v202/ye23a/ye23a.pdf},
  url       = {https://proceedings.mlr.press/v202/ye23a.html},
  abstract  = {Generalization in Reinforcement Learning (RL) aims to train an agent during training that generalizes to the target environment. In this work, we first point out that RL generalization is fundamentally different from the generalization in supervised learning, and fine-tuning on the target environment is necessary for good test performance. Therefore, we seek to answer the following question: how much can we expect pre-training over training environments to be helpful for efficient and effective fine-tuning? On one hand, we give a surprising result showing that asymptotically, the improvement from pre-training is at most a constant factor. On the other hand, we show that pre-training can be indeed helpful in the non-asymptotic regime by designing a policy collection-elimination (PCE) algorithm and proving a distribution-dependent regret bound that is independent of the state-action space. We hope our theoretical results can provide insight towards understanding pre-training and generalization in RL.}
}
Endnote
%0 Conference Paper
%T On the Power of Pre-training for Generalization in RL: Provable Benefits and Hardness
%A Haotian Ye
%A Xiaoyu Chen
%A Liwei Wang
%A Simon Shaolei Du
%B Proceedings of the 40th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Andreas Krause
%E Emma Brunskill
%E Kyunghyun Cho
%E Barbara Engelhardt
%E Sivan Sabato
%E Jonathan Scarlett
%F pmlr-v202-ye23a
%I PMLR
%P 39770--39800
%U https://proceedings.mlr.press/v202/ye23a.html
%V 202
%X Generalization in Reinforcement Learning (RL) aims to train an agent during training that generalizes to the target environment. In this work, we first point out that RL generalization is fundamentally different from the generalization in supervised learning, and fine-tuning on the target environment is necessary for good test performance. Therefore, we seek to answer the following question: how much can we expect pre-training over training environments to be helpful for efficient and effective fine-tuning? On one hand, we give a surprising result showing that asymptotically, the improvement from pre-training is at most a constant factor. On the other hand, we show that pre-training can be indeed helpful in the non-asymptotic regime by designing a policy collection-elimination (PCE) algorithm and proving a distribution-dependent regret bound that is independent of the state-action space. We hope our theoretical results can provide insight towards understanding pre-training and generalization in RL.
APA
Ye, H., Chen, X., Wang, L., & Du, S. S. (2023). On the Power of Pre-training for Generalization in RL: Provable Benefits and Hardness. Proceedings of the 40th International Conference on Machine Learning, in Proceedings of Machine Learning Research 202:39770-39800. Available from https://proceedings.mlr.press/v202/ye23a.html.
