USHER: Unbiased Sampling for Hindsight Experience Replay

Liam Schramm; Yunfu Deng; Edgar Granados; Abdeslam Boularias

Proceedings of The 6th Conference on Robot Learning, PMLR 205:2073-2082, 2023.

Abstract

Dealing with sparse rewards is a long-standing challenge in reinforcement learning (RL). Hindsight Experience Replay (HER) addresses this problem by reusing failed trajectories for one goal as successful trajectories for another. This allows for both a minimum density of reward and for generalization across multiple goals. However, this strategy is known to result in a biased value function, as the update rule underestimates the likelihood of bad outcomes in a stochastic environment. We propose an asymptotically unbiased importance-sampling-based algorithm to address this problem without sacrificing performance on deterministic environments. We show its effectiveness on a range of robotic systems, including challenging high dimensional stochastic environments.
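
The relabeling step that the abstract attributes to HER can be illustrated with a minimal Python sketch. It assumes a toy grid world with tuple states, a 0/1 sparse reward, and relabeling with the final achieved state; the names `Transition`, `sparse_reward`, and `relabel_with_final_goal` are hypothetical and do not come from the paper. The sketch shows plain hindsight relabeling only and does not include the importance-sampling correction that USHER proposes.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple

State = Tuple[int, int]  # assumed toy state: a grid cell


@dataclass(frozen=True)
class Transition:
    state: State
    action: str
    next_state: State
    goal: State
    reward: float


def sparse_reward(achieved: State, goal: State) -> float:
    """Assumed sparse reward: 1 only when the goal is reached exactly."""
    return 1.0 if achieved == goal else 0.0


def relabel_with_final_goal(trajectory: List[Transition]) -> List[Transition]:
    """Treat the state the trajectory actually ended in as if it had been the goal."""
    if not trajectory:
        return []
    new_goal = trajectory[-1].next_state
    return [
        replace(t, goal=new_goal, reward=sparse_reward(t.next_state, new_goal))
        for t in trajectory
    ]


# A trajectory that fails to reach its original goal (5, 5).
failed = [
    Transition((0, 0), "up", (0, 1), goal=(5, 5), reward=0.0),
    Transition((0, 1), "right", (1, 1), goal=(5, 5), reward=0.0),
]
relabeled = relabel_with_final_goal(failed)
```

After relabeling, the same transitions carry the goal `(1, 1)` and the last one earns a reward, so the failed trajectory becomes a successful one for a different goal. Because the substituted goal is chosen after the outcome is known, it is correlated with that outcome, which is where the bias described in the abstract enters in stochastic environments.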

Cite this Paper

BibTeX


@InProceedings{pmlr-v205-schramm23a,
  title     = {USHER: Unbiased Sampling for Hindsight Experience Replay},
  author    = {Schramm, Liam and Deng, Yunfu and Granados, Edgar and Boularias, Abdeslam},
  booktitle = {Proceedings of The 6th Conference on Robot Learning},
  pages     = {2073--2082},
  year      = {2023},
  editor    = {Liu, Karen and Kulic, Dana and Ichnowski, Jeff},
  volume    = {205},
  series    = {Proceedings of Machine Learning Research},
  month     = {14--18 Dec},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v205/schramm23a/schramm23a.pdf},
  url       = {https://proceedings.mlr.press/v205/schramm23a.html},
  abstract  = {Dealing with sparse rewards is a long-standing challenge in reinforcement learning (RL). Hindsight Experience Replay (HER) addresses this problem by reusing failed trajectories for one goal as successful trajectories for another. This allows for both a minimum density of reward and for generalization across multiple goals. However, this strategy is known to result in a biased value function, as the update rule underestimates the likelihood of bad outcomes in a stochastic environment. We propose an asymptotically unbiased importance-sampling-based algorithm to address this problem without sacrificing performance on deterministic environments. We show its effectiveness on a range of robotic systems, including challenging high dimensional stochastic environments.}
}

Endnote

%0 Conference Paper
%T USHER: Unbiased Sampling for Hindsight Experience Replay
%A Liam Schramm
%A Yunfu Deng
%A Edgar Granados
%A Abdeslam Boularias
%B Proceedings of The 6th Conference on Robot Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Karen Liu
%E Dana Kulic
%E Jeff Ichnowski
%F pmlr-v205-schramm23a
%I PMLR
%P 2073--2082
%U https://proceedings.mlr.press/v205/schramm23a.html
%V 205
%X  Dealing with sparse rewards is a long-standing challenge in reinforcement learning (RL). Hindsight Experience Replay (HER) addresses this problem by reusing failed trajectories for one goal as successful trajectories for another. This allows for both a minimum density of reward and for generalization across multiple goals. However, this strategy is known to result in a biased value function, as the update rule underestimates the likelihood of bad outcomes in a stochastic environment. We propose an asymptotically unbiased importance-sampling-based algorithm to address this problem without sacrificing performance on deterministic environments. We show its effectiveness on a range of robotic systems, including challenging high dimensional stochastic environments.

APA


Schramm, L., Deng, Y., Granados, E., & Boularias, A. (2023). USHER: Unbiased Sampling for Hindsight Experience Replay. Proceedings of The 6th Conference on Robot Learning, in Proceedings of Machine Learning Research 205:2073-2082. Available from https://proceedings.mlr.press/v205/schramm23a.html.
