Robust Offline Reinforcement Learning with Heavy-Tailed Rewards

Jin Zhu; Runzhe Wan; Zhengling Qi; Shikai Luo; Chengchun Shi

Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, PMLR 238:541-549, 2024.

Abstract

This paper endeavors to augment the robustness of offline reinforcement learning (RL) in scenarios laden with heavy-tailed rewards, a prevalent circumstance in real-world applications. We propose two algorithmic frameworks, ROAM and ROOM, for robust off-policy evaluation and offline policy optimization (OPO), respectively. Central to our frameworks is the strategic incorporation of the median-of-means method with offline RL, enabling straightforward uncertainty estimation for the value function estimator. This not only adheres to the principle of pessimism in OPO but also adeptly manages heavy-tailed rewards. Theoretical results and extensive experiments demonstrate that our two frameworks outperform existing methods when the logged dataset exhibits heavy-tailed reward distributions. The implementation of the proposal is available at https://github.com/Mamba413/ROOM.
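For readers unfamiliar with the median-of-means idea mentioned in the abstract, the following is a minimal sketch of the generic estimator applied to a list of scalar rewards. It is not the ROAM or ROOM implementation (see the repository linked above for that); the function name `median_of_means`, its parameters, and the toy data are illustrative assumptions.

```python
import random
import statistics


def median_of_means(samples, num_blocks, seed=0):
    """Generic median-of-means estimate of a mean (illustrative sketch).

    The samples are shuffled, split into `num_blocks` disjoint blocks,
    each block is averaged, and the median of the block means is returned
    together with the block means themselves.
    """
    if not 1 <= num_blocks <= len(samples):
        raise ValueError("num_blocks must be between 1 and len(samples)")
    rng = random.Random(seed)       # fixed seed so the split is reproducible
    shuffled = list(samples)
    rng.shuffle(shuffled)
    # Strided slicing gives disjoint blocks of (nearly) equal size.
    blocks = [shuffled[i::num_blocks] for i in range(num_blocks)]
    block_means = [statistics.fmean(block) for block in blocks]
    return statistics.median(block_means), block_means


# Toy heavy-tailed sample (assumed data): nine ordinary rewards, one extreme one.
rewards = [1.0] * 9 + [1000.0]
estimate, block_means = median_of_means(rewards, num_blocks=5)
print("plain mean:     ", statistics.fmean(rewards))
print("median of means:", estimate)
print("block means:    ", sorted(block_means))
```

The single extreme reward can contaminate only one block, so the median of the block means stays close to the typical reward while the plain mean is pulled far away. The spread of the block means also gives a rough picture of estimator variability, which is the general kind of uncertainty signal the abstract refers to; how the paper turns this into a pessimistic value estimate is not shown here.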

Cite this Paper

BibTeX

@InProceedings{pmlr-v238-zhu24a,
  title = 	 {Robust Offline Reinforcement Learning with Heavy-Tailed Rewards},
  author =       {Zhu, Jin and Wan, Runzhe and Qi, Zhengling and Luo, Shikai and Shi, Chengchun},
  booktitle = 	 {Proceedings of The 27th International Conference on Artificial Intelligence and Statistics},
  pages = 	 {541--549},
  year = 	 {2024},
  editor = 	 {Dasgupta, Sanjoy and Mandt, Stephan and Li, Yingzhen},
  volume = 	 {238},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {02--04 May},
  publisher =    {PMLR},
  pdf = 	 {https://proceedings.mlr.press/v238/zhu24a/zhu24a.pdf},
  url = 	 {https://proceedings.mlr.press/v238/zhu24a.html},
  abstract = 	 {This paper endeavors to augment the robustness of offline reinforcement learning (RL) in scenarios laden with heavy-tailed rewards, a prevalent circumstance in real-world applications. We propose two algorithmic frameworks, ROAM and ROOM, for robust off-policy evaluation and offline policy optimization (OPO), respectively. Central to our frameworks is the strategic incorporation of the median-of-means method with offline RL, enabling straightforward uncertainty estimation for the value function estimator. This not only adheres to the principle of pessimism in OPO but also adeptly manages heavy-tailed rewards. Theoretical results and extensive experiments demonstrate that our two frameworks outperform existing methods on the logged dataset exhibits heavy-tailed reward distributions. The implementation of the proposal is available at \url{https://github.com/Mamba413/ROOM}.}
}

Endnote

%0 Conference Paper
%T Robust Offline Reinforcement Learning with Heavy-Tailed Rewards
%A Jin Zhu
%A Runzhe Wan
%A Zhengling Qi
%A Shikai Luo
%A Chengchun Shi
%B Proceedings of The 27th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2024
%E Sanjoy Dasgupta
%E Stephan Mandt
%E Yingzhen Li
%F pmlr-v238-zhu24a
%I PMLR
%P 541--549
%U https://proceedings.mlr.press/v238/zhu24a.html
%V 238
%X This paper endeavors to augment the robustness of offline reinforcement learning (RL) in scenarios laden with heavy-tailed rewards, a prevalent circumstance in real-world applications. We propose two algorithmic frameworks, ROAM and ROOM, for robust off-policy evaluation and offline policy optimization (OPO), respectively. Central to our frameworks is the strategic incorporation of the median-of-means method with offline RL, enabling straightforward uncertainty estimation for the value function estimator. This not only adheres to the principle of pessimism in OPO but also adeptly manages heavy-tailed rewards. Theoretical results and extensive experiments demonstrate that our two frameworks outperform existing methods on the logged dataset exhibits heavy-tailed reward distributions. The implementation of the proposal is available at \url{https://github.com/Mamba413/ROOM}.

APA

Zhu, J., Wan, R., Qi, Z., Luo, S. & Shi, C. (2024). Robust Offline Reinforcement Learning with Heavy-Tailed Rewards. Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 238:541-549. Available from https://proceedings.mlr.press/v238/zhu24a.html.
