Uncertainty Weighted Actor-Critic for Offline Reinforcement Learning

Yue Wu, Shuangfei Zhai, Nitish Srivastava, Joshua M Susskind, Jian Zhang, Ruslan Salakhutdinov, Hanlin Goh
Proceedings of the 38th International Conference on Machine Learning, PMLR 139:11319-11328, 2021.

Abstract

Offline Reinforcement Learning promises to learn effective policies from previously collected, static datasets without the need for exploration. However, existing Q-learning and actor-critic based off-policy RL algorithms fail when bootstrapping from out-of-distribution (OOD) actions or states. We hypothesize that a key missing ingredient from the existing methods is a proper treatment of uncertainty in the offline setting. We propose Uncertainty Weighted Actor-Critic (UWAC), an algorithm that detects OOD state-action pairs and down-weights their contribution in the training objectives accordingly. Implementation-wise, we adopt a practical and effective dropout-based uncertainty estimation method that introduces very little overhead over existing RL algorithms. Empirically, we observe that UWAC substantially improves model stability during training. In addition, UWAC outperforms existing offline RL methods on a variety of competitive tasks, and achieves significant performance gains over the state-of-the-art baseline on datasets with sparse demonstrations collected from human experts.
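
The mechanism described in the abstract (estimate the uncertainty of the bootstrap target with dropout, then down-weight uncertain transitions) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the QNetwork architecture, the mc_dropout_stats and uwac_critic_loss helpers, the number of dropout samples, and the clamp(beta / variance, 1) weighting form are assumptions made for illustration only.

```python
# Minimal sketch of uncertainty-weighted critic training with MC dropout
# (illustrative assumptions; not the authors' released code).
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    def __init__(self, state_dim, action_dim, hidden=256, p_drop=0.1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(), nn.Dropout(p_drop),
            nn.Linear(hidden, hidden), nn.ReLU(), nn.Dropout(p_drop),
            nn.Linear(hidden, 1),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))

def mc_dropout_stats(q_net, state, action, n_samples=20):
    """Mean and variance of Q(s, a) over stochastic forward passes with dropout kept on."""
    q_net.train()  # keep dropout active even when used for target estimation
    samples = torch.stack([q_net(state, action) for _ in range(n_samples)], dim=0)
    return samples.mean(dim=0), samples.var(dim=0)

def uwac_critic_loss(q_net, target_q_net, policy, batch, gamma=0.99, beta=1.0):
    s, a, r, s_next, done = batch  # transitions sampled from the static offline dataset
    with torch.no_grad():
        a_next = policy(s_next)  # action proposed by the current actor
        q_next_mean, q_next_var = mc_dropout_stats(target_q_net, s_next, a_next)
        target = r + gamma * (1.0 - done) * q_next_mean
        # Down-weight transitions whose bootstrap target is uncertain (likely OOD);
        # the inverse-variance weight below is an illustrative choice.
        weight = torch.clamp(beta / (q_next_var + 1e-6), max=1.0)
    td_error = q_net(s, a) - target
    return (weight * td_error.pow(2)).mean()
```

The same per-sample weight can also be applied to the actor objective, so that the policy is not pushed toward actions whose value estimates the critic is uncertain about.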

Cite this Paper


BibTeX
@InProceedings{pmlr-v139-wu21i,
  title     = {Uncertainty Weighted Actor-Critic for Offline Reinforcement Learning},
  author    = {Wu, Yue and Zhai, Shuangfei and Srivastava, Nitish and Susskind, Joshua M and Zhang, Jian and Salakhutdinov, Ruslan and Goh, Hanlin},
  booktitle = {Proceedings of the 38th International Conference on Machine Learning},
  pages     = {11319--11328},
  year      = {2021},
  editor    = {Meila, Marina and Zhang, Tong},
  volume    = {139},
  series    = {Proceedings of Machine Learning Research},
  month     = {18--24 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v139/wu21i/wu21i.pdf},
  url       = {https://proceedings.mlr.press/v139/wu21i.html}
}
Endnote
%0 Conference Paper
%T Uncertainty Weighted Actor-Critic for Offline Reinforcement Learning
%A Yue Wu
%A Shuangfei Zhai
%A Nitish Srivastava
%A Joshua M Susskind
%A Jian Zhang
%A Ruslan Salakhutdinov
%A Hanlin Goh
%B Proceedings of the 38th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2021
%E Marina Meila
%E Tong Zhang
%F pmlr-v139-wu21i
%I PMLR
%P 11319--11328
%U https://proceedings.mlr.press/v139/wu21i.html
%V 139
APA
Wu, Y., Zhai, S., Srivastava, N., Susskind, J. M., Zhang, J., Salakhutdinov, R., & Goh, H. (2021). Uncertainty Weighted Actor-Critic for Offline Reinforcement Learning. Proceedings of the 38th International Conference on Machine Learning, in Proceedings of Machine Learning Research 139:11319-11328. Available from https://proceedings.mlr.press/v139/wu21i.html.