USHER: Unbiased Sampling for Hindsight Experience Replay

Liam Schramm, Yunfu Deng, Edgar Granados, Abdeslam Boularias
Proceedings of The 6th Conference on Robot Learning, PMLR 205:2073-2082, 2023.

Abstract

Dealing with sparse rewards is a long-standing challenge in reinforcement learning (RL). Hindsight Experience Replay (HER) addresses this problem by reusing failed trajectories for one goal as successful trajectories for another. This allows both for a minimum density of reward and for generalization across multiple goals. However, this strategy is known to result in a biased value function, as the update rule underestimates the likelihood of bad outcomes in a stochastic environment. We propose an asymptotically unbiased importance-sampling-based algorithm to address this problem without sacrificing performance on deterministic environments. We show its effectiveness on a range of robotic systems, including challenging high-dimensional stochastic environments.

Cite this Paper


BibTeX
@InProceedings{pmlr-v205-schramm23a,
  title     = {USHER: Unbiased Sampling for Hindsight Experience Replay},
  author    = {Schramm, Liam and Deng, Yunfu and Granados, Edgar and Boularias, Abdeslam},
  booktitle = {Proceedings of The 6th Conference on Robot Learning},
  pages     = {2073--2082},
  year      = {2023},
  editor    = {Liu, Karen and Kulic, Dana and Ichnowski, Jeff},
  volume    = {205},
  series    = {Proceedings of Machine Learning Research},
  month     = {14--18 Dec},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v205/schramm23a/schramm23a.pdf},
  url       = {https://proceedings.mlr.press/v205/schramm23a.html}
}
APA
Schramm, L., Deng, Y., Granados, E., & Boularias, A. (2023). USHER: Unbiased Sampling for Hindsight Experience Replay. Proceedings of The 6th Conference on Robot Learning, in Proceedings of Machine Learning Research 205:2073-2082. Available from https://proceedings.mlr.press/v205/schramm23a.html.
