Semi-Offline Reinforcement Learning for Optimized Text Generation

Changyu Chen, Xiting Wang, Yiqiao Jin, Victor Ye Dong, Li Dong, Jie Cao, Yi Liu, Rui Yan
Proceedings of the 40th International Conference on Machine Learning, PMLR 202:5087-5103, 2023.

Abstract

Existing reinforcement learning (RL) methods mainly operate in either online or offline settings. Online methods explore the environment at a high time cost, while offline methods obtain reward signals efficiently at the expense of exploration capability. We propose semi-offline RL, a novel paradigm that smoothly transitions from the offline to the online setting, balances exploration capability against training cost, and provides a theoretical foundation for comparing different RL settings. Based on the semi-offline MDP formulation, we identify the RL setting that is optimal in terms of optimization cost, asymptotic error, and overfitting error bound. Extensive experiments show that our semi-offline RL approach is effective across various text generation tasks and datasets, yielding performance comparable to, and usually better than, state-of-the-art methods.
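
The abstract's core idea, a tunable middle ground between offline and online RL, can be pictured with a small sketch. The snippet below is a hypothetical illustration, not the paper's actual semi-offline MDP formulation: it assumes a per-token mixing probability p, where each position of the training sequence is either taken from the static dataset (offline signal) or sampled from the current policy (online signal). Under this assumption, p = 0 recovers the fully offline setting and p = 1 the fully online setting; intermediate values trade exploration against the cost of generation.

    import random

    def mix_tokens(offline_tokens, sample_fn, p):
        """Illustrative interpolation between offline and online RL for text generation.

        offline_tokens: token list from the static dataset (offline signal).
        sample_fn(prefix): draws the next token from the current policy (online signal).
        p: probability of using a model-generated token at each position;
           p = 0 is fully offline, p = 1 is fully online.
        """
        mixed = []
        for offline_tok in offline_tokens:
            if random.random() < p:
                mixed.append(sample_fn(mixed))   # explore: token sampled from the policy
            else:
                mixed.append(offline_tok)        # exploit: token copied from the dataset
        return mixed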

Cite this Paper


BibTeX
@InProceedings{pmlr-v202-chen23ad,
  title = {Semi-Offline Reinforcement Learning for Optimized Text Generation},
  author = {Chen, Changyu and Wang, Xiting and Jin, Yiqiao and Dong, Victor Ye and Dong, Li and Cao, Jie and Liu, Yi and Yan, Rui},
  booktitle = {Proceedings of the 40th International Conference on Machine Learning},
  pages = {5087--5103},
  year = {2023},
  editor = {Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan},
  volume = {202},
  series = {Proceedings of Machine Learning Research},
  month = {23--29 Jul},
  publisher = {PMLR},
  pdf = {https://proceedings.mlr.press/v202/chen23ad/chen23ad.pdf},
  url = {https://proceedings.mlr.press/v202/chen23ad.html},
  abstract = {Existing reinforcement learning (RL) mainly utilize online or offline settings. The online methods explore the environment with expensive time cost, and the offline methods efficiently obtain reward signals by sacrificing the exploration capability. We propose semi-offline RL, a novel paradigm that can smoothly transit from the offline setting to the online setting, balances the exploration capability and training cost, and provides a theoretical foundation for comparing different RL settings. Based on the semi-offline MDP formulation, we present the RL setting that is optimal in terms of optimization cost, asymptotic error, and overfitting error bound. Extensive experiments show that our semi-offline RL approach is effective in various text generation tasks and datasets, and yields comparable or usually better performance compared with the state-of-the-art methods.}
}
Endnote
%0 Conference Paper
%T Semi-Offline Reinforcement Learning for Optimized Text Generation
%A Changyu Chen
%A Xiting Wang
%A Yiqiao Jin
%A Victor Ye Dong
%A Li Dong
%A Jie Cao
%A Yi Liu
%A Rui Yan
%B Proceedings of the 40th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Andreas Krause
%E Emma Brunskill
%E Kyunghyun Cho
%E Barbara Engelhardt
%E Sivan Sabato
%E Jonathan Scarlett
%F pmlr-v202-chen23ad
%I PMLR
%P 5087--5103
%U https://proceedings.mlr.press/v202/chen23ad.html
%V 202
%X Existing reinforcement learning (RL) mainly utilize online or offline settings. The online methods explore the environment with expensive time cost, and the offline methods efficiently obtain reward signals by sacrificing the exploration capability. We propose semi-offline RL, a novel paradigm that can smoothly transit from the offline setting to the online setting, balances the exploration capability and training cost, and provides a theoretical foundation for comparing different RL settings. Based on the semi-offline MDP formulation, we present the RL setting that is optimal in terms of optimization cost, asymptotic error, and overfitting error bound. Extensive experiments show that our semi-offline RL approach is effective in various text generation tasks and datasets, and yields comparable or usually better performance compared with the state-of-the-art methods.
APA
Chen, C., Wang, X., Jin, Y., Dong, V.Y., Dong, L., Cao, J., Liu, Y. & Yan, R. (2023). Semi-Offline Reinforcement Learning for Optimized Text Generation. Proceedings of the 40th International Conference on Machine Learning, in Proceedings of Machine Learning Research 202:5087-5103. Available from https://proceedings.mlr.press/v202/chen23ad.html.
