Relaxed Transition Kernels can Cure Underestimation in Adversarial Offline Reinforcement Learning

Ziyu Wang, Ping-Chun Hsieh, Yu-Shuen Wang, Yun-Hsuan Lien
Proceedings of the 17th Asian Conference on Machine Learning, PMLR 304:145-160, 2025.

Abstract

Offline reinforcement learning (RL) trains policies from pre-collected data without further environment interaction. However, discrepancies between the dataset and the true environment, particularly in the state transition kernel, can degrade policy performance. To simulate environment shifts without being overly conservative, we introduce a relaxed state-adversarial method that adversarially perturbs the states encountered by the policy while applying a controlled relaxation mechanism. This method improves robustness by interpolating between nominal and adversarial dynamics. Theoretically, we provide a performance lower bound; empirically, we show improved results across challenging offline RL benchmarks. Our approach integrates easily with existing model-free algorithms and consistently outperforms baselines, especially in high-difficulty domains such as Adroit and AntMaze.
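The core construction the abstract alludes to, interpolating between nominal and adversarial dynamics, can be illustrated with a short sketch. The toy example below is an assumption-laden illustration rather than the paper's implementation: the names alpha (relaxation coefficient) and adversarial_kernel, the tabular MDP setup, and the choice of adversary (redirecting probability mass to the lowest-value reachable successor) are all hypothetical.

import numpy as np

rng = np.random.default_rng(0)

# Toy tabular MDP: nominal kernel P[s, s'], per-state rewards r, value estimates V.
n_states = 4
P = rng.dirichlet(np.ones(n_states), size=n_states)  # nominal transition kernel
r = rng.normal(size=n_states)                        # per-state rewards
V = rng.normal(size=n_states)                        # current value estimates
gamma = 0.99
alpha = 0.2  # hypothetical relaxation coefficient: 0 = nominal, 1 = fully adversarial

def adversarial_kernel(P, V):
    """Worst-case kernel: each state moves deterministically to the
    lowest-value successor reachable under the nominal kernel."""
    P_adv = np.zeros_like(P)
    for s in range(P.shape[0]):
        reachable = np.flatnonzero(P[s] > 0)
        worst = reachable[np.argmin(V[reachable])]
        P_adv[s, worst] = 1.0
    return P_adv

# Relaxed kernel: interpolate between nominal and adversarial dynamics.
P_relaxed = (1 - alpha) * P + alpha * adversarial_kernel(P, V)

# Bellman backup under the relaxed kernel.
V_target = r + gamma * P_relaxed @ V
print(V_target)

Setting alpha = 0 recovers the nominal backup, while alpha = 1 backs up exclusively through worst-case transitions; intermediate values trade robustness against the value underestimation that a fully adversarial backup induces, which is the tension the title refers to.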

Cite this Paper

BibTeX
@InProceedings{pmlr-v304-wang25a,
  title     = {Relaxed Transition Kernels can Cure Underestimation in Adversarial Offline Reinforcement Learning},
  author    = {Wang, Ziyu and Hsieh, Ping-Chun and Wang, Yu-Shuen and Lien, Yun-Hsuan},
  booktitle = {Proceedings of the 17th Asian Conference on Machine Learning},
  pages     = {145--160},
  year      = {2025},
  editor    = {Lee, Hung-yi and Liu, Tongliang},
  volume    = {304},
  series    = {Proceedings of Machine Learning Research},
  month     = {09--12 Dec},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v304/main/assets/wang25a/wang25a.pdf},
  url       = {https://proceedings.mlr.press/v304/wang25a.html},
  abstract  = {Offline reinforcement learning (RL) trains policies from pre-collected data without further environment interaction. However, discrepancies between the dataset and the true environment, particularly in the state transition kernel, can degrade policy performance. To simulate environment shifts without being overly conservative, we introduce a relaxed state-adversarial method that adversarially perturbs the states encountered by the policy while applying a controlled relaxation mechanism. This method improves robustness by interpolating between nominal and adversarial dynamics. Theoretically, we provide a performance lower bound; empirically, we show improved results across challenging offline RL benchmarks. Our approach integrates easily with existing model-free algorithms and consistently outperforms baselines, especially in high-difficulty domains such as Adroit and AntMaze.}
}
Endnote
%0 Conference Paper
%T Relaxed Transition Kernels can Cure Underestimation in Adversarial Offline Reinforcement Learning
%A Ziyu Wang
%A Ping-Chun Hsieh
%A Yu-Shuen Wang
%A Yun-Hsuan Lien
%B Proceedings of the 17th Asian Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Hung-yi Lee
%E Tongliang Liu
%F pmlr-v304-wang25a
%I PMLR
%P 145--160
%U https://proceedings.mlr.press/v304/wang25a.html
%V 304
%X Offline reinforcement learning (RL) trains policies from pre-collected data without further environment interaction. However, discrepancies between the dataset and the true environment, particularly in the state transition kernel, can degrade policy performance. To simulate environment shifts without being overly conservative, we introduce a relaxed state-adversarial method that adversarially perturbs the states encountered by the policy while applying a controlled relaxation mechanism. This method improves robustness by interpolating between nominal and adversarial dynamics. Theoretically, we provide a performance lower bound; empirically, we show improved results across challenging offline RL benchmarks. Our approach integrates easily with existing model-free algorithms and consistently outperforms baselines, especially in high-difficulty domains such as Adroit and AntMaze.
APA
Wang, Z., Hsieh, P.-C., Wang, Y.-S., & Lien, Y.-H. (2025). Relaxed Transition Kernels can Cure Underestimation in Adversarial Offline Reinforcement Learning. Proceedings of the 17th Asian Conference on Machine Learning, in Proceedings of Machine Learning Research 304:145-160. Available from https://proceedings.mlr.press/v304/wang25a.html.
