Hybrid Reinforcement Learning from Offline Observation Alone

Yuda Song, Drew Bagnell, Aarti Singh
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:46019-46049, 2024.

Abstract

We consider the hybrid reinforcement learning setting, where the agent has access to both offline data and online interactive access. While RL research typically assumes that offline data contains complete action, reward, and transition information, datasets with only state information (also known as observation-only datasets) are more general, abundant, and practical. This motivates our study of hybrid RL with observation-only offline datasets. While the task of competing with the best policy “covered” by the offline data can be solved if a reset model of the environment is provided (i.e., one that can be reset to any state), we show evidence that this task is hard when only the weaker trace model is available (i.e., one that can only be reset to initial states and must produce full traces through the environment), without further assumptions on the admissibility of the offline data. Under the admissibility assumption (that the offline data could actually be produced by the policy class we consider), we propose the first algorithm in the trace model setting that provably matches the performance of algorithms that leverage a reset model. We also perform proof-of-concept experiments that suggest the effectiveness of our algorithm in practice.
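To make the two access models contrasted above concrete, here is a minimal, hypothetical Python sketch. The class names, interfaces, and toy dynamics are illustrative assumptions only, not the paper's algorithm or environments: a trace model supports resets only to initial states followed by full rollouts, while a reset model can additionally be reset to arbitrary states, such as states appearing in the observation-only offline data.

import random


class TraceModel:
    """Weaker access: episodes always start from an initial state, and the
    agent must roll out full traces through the environment."""

    def __init__(self, num_states=5, num_actions=2, horizon=10, seed=0):
        self.num_states = num_states
        self.num_actions = num_actions
        self.horizon = horizon
        self.rng = random.Random(seed)

    def reset(self):
        # Can only reset to an initial state.
        self.t = 0
        self.state = 0
        return self.state

    def step(self, action):
        # Toy random transition/reward as a stand-in for an unknown MDP.
        self.state = self.rng.randrange(self.num_states)
        reward = float(self.state == self.num_states - 1)
        self.t += 1
        done = self.t >= self.horizon
        return self.state, reward, done


class ResetModel(TraceModel):
    """Stronger access: the environment can additionally be reset to any
    chosen state, e.g. a state observed in the offline data."""

    def reset_to(self, state, t=0):
        self.state = state
        self.t = t
        return self.state


# Usage sketch: with a reset model one can start rollouts directly from
# offline states; with only a trace model one must reach them from scratch.
offline_states = [2, 4, 1]  # observation-only data: states, no actions or rewards
env = ResetModel()
s = env.reset_to(random.choice(offline_states))

The point of the contrast is that the reset model lets an algorithm probe the environment exactly at states the offline data covers, whereas under the trace model those states must be reached by rolling forward from initial states, which is what makes the admissibility assumption load-bearing.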

Cite this Paper


BibTeX
@InProceedings{pmlr-v235-song24a,
  title     = {Hybrid Reinforcement Learning from Offline Observation Alone},
  author    = {Song, Yuda and Bagnell, Drew and Singh, Aarti},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages     = {46019--46049},
  year      = {2024},
  editor    = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume    = {235},
  series    = {Proceedings of Machine Learning Research},
  month     = {21--27 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/song24a/song24a.pdf},
  url       = {https://proceedings.mlr.press/v235/song24a.html},
  abstract  = {We consider the hybrid reinforcement learning setting where the agent has access to both offline data and online interactive access. While RL research typically assumes offline data contains complete action, reward and transition information, datasets with only state information (also known as observation-only datasets) are more general, abundant and practical. This motivates our study of the hybrid RL with observation-only offline dataset framework. While the task of competing with the best policy “covered” by the offline data can be solved if a reset model of the environment is provided (i.e., one that can be reset to any state), we show evidence of hardness of competing when only given the weaker trace model (i.e., one can only reset to the initial states and must produce full traces through the environment), without further assumption of admissibility of the offline data. Under the admissibility assumptions– that the offline data could actually be produced by the policy class we consider– we propose the first algorithm in the trace model setting that provably matches the performance of algorithms that leverage a reset model. We also perform proof-of-concept experiments that suggest the effectiveness of our algorithm in practice.}
}
Endnote
%0 Conference Paper
%T Hybrid Reinforcement Learning from Offline Observation Alone
%A Yuda Song
%A Drew Bagnell
%A Aarti Singh
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-song24a
%I PMLR
%P 46019--46049
%U https://proceedings.mlr.press/v235/song24a.html
%V 235
%X We consider the hybrid reinforcement learning setting where the agent has access to both offline data and online interactive access. While RL research typically assumes offline data contains complete action, reward and transition information, datasets with only state information (also known as observation-only datasets) are more general, abundant and practical. This motivates our study of the hybrid RL with observation-only offline dataset framework. While the task of competing with the best policy “covered” by the offline data can be solved if a reset model of the environment is provided (i.e., one that can be reset to any state), we show evidence of hardness of competing when only given the weaker trace model (i.e., one can only reset to the initial states and must produce full traces through the environment), without further assumption of admissibility of the offline data. Under the admissibility assumptions– that the offline data could actually be produced by the policy class we consider– we propose the first algorithm in the trace model setting that provably matches the performance of algorithms that leverage a reset model. We also perform proof-of-concept experiments that suggest the effectiveness of our algorithm in practice.
APA
Song, Y., Bagnell, D., & Singh, A. (2024). Hybrid Reinforcement Learning from Offline Observation Alone. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:46019-46049. Available from https://proceedings.mlr.press/v235/song24a.html.