Enhancing Value Function Estimation through First-Order State-Action Dynamics in Offline Reinforcement Learning

Yun-Hsuan Lien, Ping-Chun Hsieh, Tzu-Mao Li, Yu-Shuen Wang
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:29782-29794, 2024.

Abstract

In offline reinforcement learning (RL), updating the value function with the discrete-time Bellman equation often runs into difficulty because of the limited scope of the available data: the Bellman equation cannot accurately predict the values of unvisited states. To address this issue, we introduce a method that bridges continuous- and discrete-time RL, capitalizing on the advantages of each. Our method uses a discrete-time RL algorithm to derive the value function from a dataset while ensuring that the function’s first derivative aligns with the local characteristics of states and actions, as defined by the Hamilton-Jacobi-Bellman (HJB) equation in continuous-time RL. We provide practical algorithms for both deterministic and stochastic policy gradient methods. Experiments on the D4RL benchmark show that incorporating this first-order information significantly improves policy performance in offline RL.
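
To give a concrete sense of how such a first-order term can be attached to an ordinary discrete-time update, the sketch below augments a standard TD loss on a state-value network with a penalty on a continuous-time HJB residual, so that the gradient of the value function with respect to the state is encouraged to agree with the local dynamics observed in the dataset. This is only an illustration of the general idea, not the authors' algorithm: the network architecture, the finite-difference estimate of the dynamics, the reward-rate and discount-to-decay-rate conversions, and the coefficient hjb_weight are all assumptions made for the example.

    # Illustrative sketch only (not the paper's implementation): a TD loss on a
    # state-value network V(s) plus a first-order (HJB-style) residual penalty
    # that aligns grad_s V with local dynamics estimated by finite differences
    # from dataset transitions. Names and constants here are assumptions.
    import math
    import torch
    import torch.nn as nn

    class ValueNet(nn.Module):
        def __init__(self, state_dim, hidden=256):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(state_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, 1),
            )

        def forward(self, s):
            return self.net(s).squeeze(-1)

    def first_order_value_loss(v_net, v_target, batch, gamma=0.99, dt=0.05,
                               hjb_weight=1.0):
        s, r, s_next, done = batch["s"], batch["r"], batch["s_next"], batch["done"]

        # Standard discrete-time Bellman (TD) loss on dataset transitions.
        with torch.no_grad():
            td_target = r + gamma * (1.0 - done) * v_target(s_next)
        td_loss = ((v_net(s) - td_target) ** 2).mean()

        # First-order term: penalize the continuous-time HJB residual
        #   r_rate + grad_s V(s) . f(s, a) - rho * V(s) ~ 0,
        # approximating the dynamics f(s, a) by the finite difference
        # (s' - s) / dt, the reward rate by r / dt (assuming the step reward
        # accumulates over dt), and the decay rate by rho = -log(gamma) / dt.
        s_req = s.clone().requires_grad_(True)
        v_s = v_net(s_req)
        grad_v = torch.autograd.grad(v_s.sum(), s_req, create_graph=True)[0]
        f_hat = (s_next - s) / dt
        rho = -math.log(gamma) / dt
        hjb_residual = r / dt + (grad_v * f_hat).sum(-1) - rho * v_s
        hjb_loss = (hjb_residual ** 2).mean()

        return td_loss + hjb_weight * hjb_loss

In practice a loss of this shape would replace the plain critic loss inside an existing offline actor-critic method; the paper additionally gives variants for action-value functions and for stochastic policy gradients, which this sketch does not cover.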

Cite this Paper


BibTeX
@InProceedings{pmlr-v235-lien24a,
  title     = {Enhancing Value Function Estimation through First-Order State-Action Dynamics in Offline Reinforcement Learning},
  author    = {Lien, Yun-Hsuan and Hsieh, Ping-Chun and Li, Tzu-Mao and Wang, Yu-Shuen},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages     = {29782--29794},
  year      = {2024},
  editor    = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume    = {235},
  series    = {Proceedings of Machine Learning Research},
  month     = {21--27 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/lien24a/lien24a.pdf},
  url       = {https://proceedings.mlr.press/v235/lien24a.html},
  abstract  = {In offline reinforcement learning (RL), updating the value function with the discrete-time Bellman Equation often encounters challenges due to the limited scope of available data. This limitation stems from the Bellman Equation, which cannot accurately predict the value of unvisited states. To address this issue, we have introduced an innovative solution that bridges the continuous- and discrete-time RL methods, capitalizing on their advantages. Our method uses a discrete-time RL algorithm to derive the value function from a dataset while ensuring that the function’s first derivative aligns with the local characteristics of states and actions, as defined by the Hamilton-Jacobi-Bellman equation in continuous RL. We provide practical algorithms for both deterministic policy gradient methods and stochastic policy gradient methods. Experiments on the D4RL dataset show that incorporating the first-order information significantly improves policy performance for offline RL problems.}
}
Endnote
%0 Conference Paper
%T Enhancing Value Function Estimation through First-Order State-Action Dynamics in Offline Reinforcement Learning
%A Yun-Hsuan Lien
%A Ping-Chun Hsieh
%A Tzu-Mao Li
%A Yu-Shuen Wang
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-lien24a
%I PMLR
%P 29782--29794
%U https://proceedings.mlr.press/v235/lien24a.html
%V 235
%X In offline reinforcement learning (RL), updating the value function with the discrete-time Bellman Equation often encounters challenges due to the limited scope of available data. This limitation stems from the Bellman Equation, which cannot accurately predict the value of unvisited states. To address this issue, we have introduced an innovative solution that bridges the continuous- and discrete-time RL methods, capitalizing on their advantages. Our method uses a discrete-time RL algorithm to derive the value function from a dataset while ensuring that the function’s first derivative aligns with the local characteristics of states and actions, as defined by the Hamilton-Jacobi-Bellman equation in continuous RL. We provide practical algorithms for both deterministic policy gradient methods and stochastic policy gradient methods. Experiments on the D4RL dataset show that incorporating the first-order information significantly improves policy performance for offline RL problems.
APA
Lien, Y., Hsieh, P., Li, T., & Wang, Y. (2024). Enhancing Value Function Estimation through First-Order State-Action Dynamics in Offline Reinforcement Learning. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:29782-29794. Available from https://proceedings.mlr.press/v235/lien24a.html.