Uncertainty Weighted Actor-Critic for Offline Reinforcement Learning

Yue Wu; Shuangfei Zhai; Nitish Srivastava; Joshua M Susskind; Jian Zhang; Ruslan Salakhutdinov; Hanlin Goh

Proceedings of the 38th International Conference on Machine Learning, PMLR 139:11319-11328, 2021.

Abstract

Offline Reinforcement Learning promises to learn effective policies from previously collected, static datasets without the need for exploration. However, existing Q-learning and actor-critic based off-policy RL algorithms fail when bootstrapping from out-of-distribution (OOD) actions or states. We hypothesize that a key missing ingredient from the existing methods is a proper treatment of uncertainty in the offline setting. We propose Uncertainty Weighted Actor-Critic (UWAC), an algorithm that detects OOD state-action pairs and down-weights their contribution in the training objectives accordingly. Implementation-wise, we adopt a practical and effective dropout-based uncertainty estimation method that introduces very little overhead over existing RL algorithms. Empirically, we observe that UWAC substantially improves model stability during training. In addition, UWAC outperforms existing offline RL methods on a variety of competitive tasks, and achieves significant performance gains over the state-of-the-art baseline on datasets with sparse demonstrations collected from human experts.
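The abstract names two mechanisms: a dropout-based uncertainty estimate, and down-weighting uncertain (likely OOD) state-action pairs in the training objective. The Python sketch below illustrates both under stated assumptions. It uses a toy hand-written Q-network, Monte Carlo dropout variance (several stochastic forward passes with dropout left on) as the uncertainty estimate, and inverse-variance weights beta / Var clipped at a maximum. All function names and hyperparameters (`q_forward_with_dropout`, `mc_dropout_q`, `uncertainty_weights`, `weighted_td_loss`, `beta`, `max_weight`) are hypothetical. The abstract does not state UWAC's exact weighting formula, architecture, or settings, so this is not the authors' implementation.

```python
import random
import statistics


def q_forward_with_dropout(state_action, w_hidden, w_out, p_drop, rng):
    """Hypothetical one-hidden-layer Q-network. Dropout stays active at
    evaluation time, so repeated passes are stochastic (MC dropout)."""
    hidden = []
    for row in w_hidden:
        h = max(0.0, sum(w * x for w, x in zip(row, state_action)))  # ReLU unit
        if rng.random() < p_drop:
            h = 0.0  # unit dropped for this forward pass
        else:
            h /= 1.0 - p_drop  # inverted-dropout rescaling keeps the expectation unchanged
        hidden.append(h)
    return sum(w * h for w, h in zip(w_out, hidden))


def mc_dropout_q(state_action, w_hidden, w_out, p_drop=0.1, n_samples=100, seed=0):
    """Return (mean, variance) of Q over n_samples stochastic forward passes."""
    rng = random.Random(seed)
    samples = [q_forward_with_dropout(state_action, w_hidden, w_out, p_drop, rng)
               for _ in range(n_samples)]
    return statistics.fmean(samples), statistics.pvariance(samples)


def uncertainty_weights(variances, beta=1.0, max_weight=1.5, eps=1e-8):
    """Assumed weighting: beta / Var, clipped so near-zero variance cannot explode."""
    return [min(beta / max(v, eps), max_weight) for v in variances]


def weighted_td_loss(q_pred, td_targets, weights):
    """Mean of per-sample squared TD errors, each scaled by its uncertainty weight."""
    n = len(q_pred)
    return sum(w * (q - y) ** 2 for q, y, w in zip(q_pred, td_targets, weights)) / n


if __name__ == "__main__":
    # Toy weights and two hypothetical (state, action) feature vectors.
    w_hidden = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
    w_out = [1.0, 1.0, -0.5]
    pairs = [[0.2, 0.1], [3.0, 4.0]]
    stats = [mc_dropout_q(sa, w_hidden, w_out, p_drop=0.3) for sa in pairs]
    means = [m for m, _ in stats]
    weights = uncertainty_weights([v for _, v in stats])
    print("Q means:", means)
    print("weights:", weights)
    print("weighted TD loss:", weighted_td_loss(means, [0.0, 5.0], weights))
```

Clipping the weight keeps pairs whose dropout variance is near zero from dominating the loss, while pairs with high variance (the OOD candidates) contribute less to each update. How the weights enter the critic versus the actor objective in UWAC is not specified in the abstract; applying them to a squared TD loss here is an assumption.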

Cite this Paper

BibTeX


@InProceedings{pmlr-v139-wu21i,
  title     = {Uncertainty Weighted Actor-Critic for Offline Reinforcement Learning},
  author    = {Wu, Yue and Zhai, Shuangfei and Srivastava, Nitish and Susskind, Joshua M and Zhang, Jian and Salakhutdinov, Ruslan and Goh, Hanlin},
  booktitle = {Proceedings of the 38th International Conference on Machine Learning},
  pages     = {11319--11328},
  year      = {2021},
  editor    = {Meila, Marina and Zhang, Tong},
  volume    = {139},
  series    = {Proceedings of Machine Learning Research},
  month     = {18--24 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v139/wu21i/wu21i.pdf},
  url       = {https://proceedings.mlr.press/v139/wu21i.html},
  abstract  = {Offline Reinforcement Learning promises to learn effective policies from previously-collected, static datasets without the need for exploration. However, existing Q-learning and actor-critic based off-policy RL algorithms fail when bootstrapping from out-of-distribution (OOD) actions or states. We hypothesize that a key missing ingredient from the existing methods is a proper treatment of uncertainty in the offline setting. We propose Uncertainty Weighted Actor-Critic (UWAC), an algorithm that detects OOD state-action pairs and down-weights their contribution in the training objectives accordingly. Implementation-wise, we adopt a practical and effective dropout-based uncertainty estimation method that introduces very little overhead over existing RL algorithms. Empirically, we observe that UWAC substantially improves model stability during training. In addition, UWAC out-performs existing offline RL methods on a variety of competitive tasks, and achieves significant performance gains over the state-of-the-art baseline on datasets with sparse demonstrations collected from human experts.}
}

EndNote

%0 Conference Paper
%T Uncertainty Weighted Actor-Critic for Offline Reinforcement Learning
%A Yue Wu
%A Shuangfei Zhai
%A Nitish Srivastava
%A Joshua M Susskind
%A Jian Zhang
%A Ruslan Salakhutdinov
%A Hanlin Goh
%B Proceedings of the 38th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2021
%E Marina Meila
%E Tong Zhang
%F pmlr-v139-wu21i
%I PMLR
%P 11319--11328
%U https://proceedings.mlr.press/v139/wu21i.html
%V 139
%X Offline Reinforcement Learning promises to learn effective policies from previously-collected, static datasets without the need for exploration. However, existing Q-learning and actor-critic based off-policy RL algorithms fail when bootstrapping from out-of-distribution (OOD) actions or states. We hypothesize that a key missing ingredient from the existing methods is a proper treatment of uncertainty in the offline setting. We propose Uncertainty Weighted Actor-Critic (UWAC), an algorithm that detects OOD state-action pairs and down-weights their contribution in the training objectives accordingly. Implementation-wise, we adopt a practical and effective dropout-based uncertainty estimation method that introduces very little overhead over existing RL algorithms. Empirically, we observe that UWAC substantially improves model stability during training. In addition, UWAC out-performs existing offline RL methods on a variety of competitive tasks, and achieves significant performance gains over the state-of-the-art baseline on datasets with sparse demonstrations collected from human experts.

APA


Wu, Y., Zhai, S., Srivastava, N., Susskind, J.M., Zhang, J., Salakhutdinov, R. & Goh, H. (2021). Uncertainty Weighted Actor-Critic for Offline Reinforcement Learning. Proceedings of the 38th International Conference on Machine Learning, in Proceedings of Machine Learning Research 139:11319-11328. Available from https://proceedings.mlr.press/v139/wu21i.html.
