Dealing with the Unknown: Pessimistic Offline Reinforcement Learning

Jinning Li, Chen Tang, Masayoshi Tomizuka, Wei Zhan
Proceedings of the 5th Conference on Robot Learning, PMLR 164:1455-1464, 2022.

Abstract

Reinforcement Learning (RL) has been shown to be effective in domains where the agent can learn policies by actively interacting with its operating environment. However, when we move to the offline setting, where the agent can only update its policy from static datasets, a major issue of offline reinforcement learning emerges: distributional shift. We propose a Pessimistic Offline Reinforcement Learning (PessORL) algorithm that actively leads the agent back to regions it is familiar with by manipulating the value function. We focus on problems caused by out-of-distribution (OOD) states, and deliberately penalize high values at states that are absent from the training dataset, so that the learned pessimistic value function lower-bounds the true value anywhere within the state space. We evaluate the PessORL algorithm on various benchmark tasks and show that our method achieves better performance by explicitly handling OOD states, compared to methods that only consider OOD actions.
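The key mechanism described above, shaping the value function so it stays low at states missing from the data, can be illustrated with a short sketch. The Python below is our own minimal, hypothetical rendering, not the authors' exact formulation: it assumes some estimate `state_density` of how well each state is covered by the dataset (e.g., from a learned density model) and subtracts a penalty from the bootstrapped value target wherever that coverage is low.

    import numpy as np

    def pessimistic_value_target(bootstrap_target, state_density,
                                 density_threshold=0.1, penalty_scale=1.0):
        """Push value targets down at out-of-distribution states.

        Hypothetical sketch: `state_density` estimates how well each
        state is covered by the training data (e.g., from a learned
        density model); the penalty shape is our assumption, not the
        paper's exact formulation.
        """
        # Weight is 0 for well-covered states and grows toward 1 as the
        # estimated density drops below the threshold.
        ood_weight = np.clip(density_threshold - state_density, 0.0, None)
        ood_weight = ood_weight / density_threshold
        # Subtracting the penalty biases the learned value function to
        # lower-bound the true value at states absent from the dataset.
        return bootstrap_target - penalty_scale * ood_weight

In a fitted value iteration or actor-critic loop, the value network would then be regressed toward this penalized target rather than the raw bootstrapped one, discouraging the policy from drifting into unfamiliar states.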

Cite this Paper

BibTeX
@InProceedings{pmlr-v164-li22d,
  title     = {Dealing with the Unknown: Pessimistic Offline Reinforcement Learning},
  author    = {Li, Jinning and Tang, Chen and Tomizuka, Masayoshi and Zhan, Wei},
  booktitle = {Proceedings of the 5th Conference on Robot Learning},
  pages     = {1455--1464},
  year      = {2022},
  editor    = {Faust, Aleksandra and Hsu, David and Neumann, Gerhard},
  volume    = {164},
  series    = {Proceedings of Machine Learning Research},
  month     = {08--11 Nov},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v164/li22d/li22d.pdf},
  url       = {https://proceedings.mlr.press/v164/li22d.html},
  abstract  = {Reinforcement Learning (RL) has been shown effective in domains where the agent can learn policies by actively interacting with its operating environment. However, if we change the RL scheme to offline setting where the agent can only update its policy via static datasets, one of the major issues in offline reinforcement learning emerges, i.e. distributional shift. We propose a Pessimistic Offline Reinforcement Learning (PessORL) algorithm to actively lead the agent back to the area where it is familiar by manipulating the value function. We focus on problems caused by out-of-distribution (OOD) states, and deliberately penalize high values at states that are absent in the training dataset, so that the learned pessimistic value function lower bounds the true value anywhere within the state space. We evaluate the PessORL algorithm on various benchmark tasks, where we show that our method gains better performance by explicitly handling OOD states, when compared to those methods merely considering OOD actions.}
}
Endnote
%0 Conference Paper
%T Dealing with the Unknown: Pessimistic Offline Reinforcement Learning
%A Jinning Li
%A Chen Tang
%A Masayoshi Tomizuka
%A Wei Zhan
%B Proceedings of the 5th Conference on Robot Learning
%C Proceedings of Machine Learning Research
%D 2022
%E Aleksandra Faust
%E David Hsu
%E Gerhard Neumann
%F pmlr-v164-li22d
%I PMLR
%P 1455--1464
%U https://proceedings.mlr.press/v164/li22d.html
%V 164
%X Reinforcement Learning (RL) has been shown effective in domains where the agent can learn policies by actively interacting with its operating environment. However, if we change the RL scheme to offline setting where the agent can only update its policy via static datasets, one of the major issues in offline reinforcement learning emerges, i.e. distributional shift. We propose a Pessimistic Offline Reinforcement Learning (PessORL) algorithm to actively lead the agent back to the area where it is familiar by manipulating the value function. We focus on problems caused by out-of-distribution (OOD) states, and deliberately penalize high values at states that are absent in the training dataset, so that the learned pessimistic value function lower bounds the true value anywhere within the state space. We evaluate the PessORL algorithm on various benchmark tasks, where we show that our method gains better performance by explicitly handling OOD states, when compared to those methods merely considering OOD actions.
APA
Li, J., Tang, C., Tomizuka, M. & Zhan, W. (2022). Dealing with the Unknown: Pessimistic Offline Reinforcement Learning. Proceedings of the 5th Conference on Robot Learning, in Proceedings of Machine Learning Research 164:1455-1464. Available from https://proceedings.mlr.press/v164/li22d.html.
