Robust Offline Reinforcement Learning with Heavy-Tailed Rewards

Jin Zhu, Runzhe Wan, Zhengling Qi, Shikai Luo, Chengchun Shi
Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, PMLR 238:541-549, 2024.

Abstract

This paper endeavors to augment the robustness of offline reinforcement learning (RL) in scenarios laden with heavy-tailed rewards, a prevalent circumstance in real-world applications. We propose two algorithmic frameworks, ROAM and ROOM, for robust off-policy evaluation and offline policy optimization (OPO), respectively. Central to our frameworks is the strategic incorporation of the median-of-means method with offline RL, enabling straightforward uncertainty estimation for the value function estimator. This not only adheres to the principle of pessimism in OPO but also adeptly manages heavy-tailed rewards. Theoretical results and extensive experiments demonstrate that our two frameworks outperform existing methods when the logged dataset exhibits heavy-tailed reward distributions. The implementation of the proposal is available at https://github.com/Mamba413/ROOM.
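
The median-of-means idea named in the abstract can be illustrated with a minimal sketch. This is a generic estimator for the mean of a scalar sample, not the ROAM or ROOM algorithm from the paper; the function name `median_of_means`, its parameters, and the Student-t demo data are assumptions made for illustration only.

```python
import numpy as np


def median_of_means(samples, num_blocks, rng=None):
    """Generic median-of-means estimate of a mean (illustrative sketch).

    The sample is shuffled, split into `num_blocks` roughly equal blocks,
    and the median of the per-block means is returned together with the
    block means themselves.
    """
    samples = np.asarray(samples, dtype=float).ravel()
    if not 1 <= num_blocks <= samples.size:
        raise ValueError("num_blocks must be between 1 and the sample size")
    rng = np.random.default_rng(rng)
    # Shuffle so that block membership does not depend on the logging order.
    shuffled = rng.permutation(samples)
    blocks = np.array_split(shuffled, num_blocks)
    block_means = np.array([block.mean() for block in blocks])
    return float(np.median(block_means)), block_means


if __name__ == "__main__":
    # Hypothetical heavy-tailed "rewards": Student-t with 1.5 degrees of
    # freedom has a finite mean (zero) but infinite variance.
    demo_rng = np.random.default_rng(0)
    rewards = demo_rng.standard_t(df=1.5, size=10_000)
    estimate, block_means = median_of_means(rewards, num_blocks=20, rng=1)
    print("plain sample mean:", rewards.mean())
    print("median-of-means  :", estimate)
```

Because a single extreme reward can corrupt at most one block mean, the median across blocks is far less sensitive to outliers than the plain average. The returned block means also give several estimates of the same quantity, so their spread (or a lower quantile of them) is one plausible way to obtain the kind of uncertainty estimate that a pessimistic procedure needs; how the paper actually does this is not specified in the abstract.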

Cite this Paper


BibTeX
@InProceedings{pmlr-v238-zhu24a,
  title = {Robust Offline Reinforcement Learning with Heavy-Tailed Rewards},
  author = {Zhu, Jin and Wan, Runzhe and Qi, Zhengling and Luo, Shikai and Shi, Chengchun},
  booktitle = {Proceedings of The 27th International Conference on Artificial Intelligence and Statistics},
  pages = {541--549},
  year = {2024},
  editor = {Dasgupta, Sanjoy and Mandt, Stephan and Li, Yingzhen},
  volume = {238},
  series = {Proceedings of Machine Learning Research},
  month = {02--04 May},
  publisher = {PMLR},
  pdf = {https://proceedings.mlr.press/v238/zhu24a/zhu24a.pdf},
  url = {https://proceedings.mlr.press/v238/zhu24a.html},
  abstract = {This paper endeavors to augment the robustness of offline reinforcement learning (RL) in scenarios laden with heavy-tailed rewards, a prevalent circumstance in real-world applications. We propose two algorithmic frameworks, ROAM and ROOM, for robust off-policy evaluation and offline policy optimization (OPO), respectively. Central to our frameworks is the strategic incorporation of the median-of-means method with offline RL, enabling straightforward uncertainty estimation for the value function estimator. This not only adheres to the principle of pessimism in OPO but also adeptly manages heavy-tailed rewards. Theoretical results and extensive experiments demonstrate that our two frameworks outperform existing methods when the logged dataset exhibits heavy-tailed reward distributions. The implementation of the proposal is available at \url{https://github.com/Mamba413/ROOM}.}
}
Endnote
%0 Conference Paper
%T Robust Offline Reinforcement Learning with Heavy-Tailed Rewards
%A Jin Zhu
%A Runzhe Wan
%A Zhengling Qi
%A Shikai Luo
%A Chengchun Shi
%B Proceedings of The 27th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2024
%E Sanjoy Dasgupta
%E Stephan Mandt
%E Yingzhen Li
%F pmlr-v238-zhu24a
%I PMLR
%P 541--549
%U https://proceedings.mlr.press/v238/zhu24a.html
%V 238
%X This paper endeavors to augment the robustness of offline reinforcement learning (RL) in scenarios laden with heavy-tailed rewards, a prevalent circumstance in real-world applications. We propose two algorithmic frameworks, ROAM and ROOM, for robust off-policy evaluation and offline policy optimization (OPO), respectively. Central to our frameworks is the strategic incorporation of the median-of-means method with offline RL, enabling straightforward uncertainty estimation for the value function estimator. This not only adheres to the principle of pessimism in OPO but also adeptly manages heavy-tailed rewards. Theoretical results and extensive experiments demonstrate that our two frameworks outperform existing methods when the logged dataset exhibits heavy-tailed reward distributions. The implementation of the proposal is available at https://github.com/Mamba413/ROOM.
APA
Zhu, J., Wan, R., Qi, Z., Luo, S., & Shi, C. (2024). Robust Offline Reinforcement Learning with Heavy-Tailed Rewards. Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 238:541-549. Available from https://proceedings.mlr.press/v238/zhu24a.html.
