Dynamic Experience Replay

Jieliang Luo, Hui Li
; Proceedings of the Conference on Robot Learning, PMLR 100:1191-1200, 2020.

Abstract

We present a novel technique called Dynamic Experience Replay (DER) that allows Reinforcement Learning (RL) algorithms to use experience replay samples not only from human demonstrations but also successful transitions generated by RL agents during training and therefore improve training efficiency. It can be combined with an arbitrary off-policy RL algorithm, such as DDPG or DQN, and their distributed versions.We build upon Ape-X DDPG and demonstrate our approach on robotic tight-fitting joint assembly tasks, based on force/torque and Cartesian pose observations. In particular, we run experiments on two different tasks: peg-in-hole and lap-joint. In each case, we compare different replay buffer structures and how DER affects them. Our ablation studies show that Dynamic Experience Replay is a crucial ingredient that either largely shortens the training time in these challenging environments or solves the tasks that the vanilla Ape-X DDPG cannot solve. We also show that our policies learned purely in simulation can be deployed successfully on the real robot. The video presenting our experiments is available at https://sites.google.com/site/dynamicexperiencereplay

Cite this Paper


BibTeX
@InProceedings{pmlr-v100-luo20a, title = {Dynamic Experience Replay}, author = {Luo, Jieliang and Li, Hui}, booktitle = {Proceedings of the Conference on Robot Learning}, pages = {1191--1200}, year = {2020}, editor = {Leslie Pack Kaelbling and Danica Kragic and Komei Sugiura}, volume = {100}, series = {Proceedings of Machine Learning Research}, address = {}, month = {30 Oct--01 Nov}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/v100/luo20a/luo20a.pdf}, url = {http://proceedings.mlr.press/v100/luo20a.html}, abstract = {We present a novel technique called Dynamic Experience Replay (DER) that allows Reinforcement Learning (RL) algorithms to use experience replay samples not only from human demonstrations but also successful transitions generated by RL agents during training and therefore improve training efficiency. It can be combined with an arbitrary off-policy RL algorithm, such as DDPG or DQN, and their distributed versions.We build upon Ape-X DDPG and demonstrate our approach on robotic tight-fitting joint assembly tasks, based on force/torque and Cartesian pose observations. In particular, we run experiments on two different tasks: peg-in-hole and lap-joint. In each case, we compare different replay buffer structures and how DER affects them. Our ablation studies show that Dynamic Experience Replay is a crucial ingredient that either largely shortens the training time in these challenging environments or solves the tasks that the vanilla Ape-X DDPG cannot solve. We also show that our policies learned purely in simulation can be deployed successfully on the real robot. The video presenting our experiments is available at https://sites.google.com/site/dynamicexperiencereplay} }
Endnote
%0 Conference Paper %T Dynamic Experience Replay %A Jieliang Luo %A Hui Li %B Proceedings of the Conference on Robot Learning %C Proceedings of Machine Learning Research %D 2020 %E Leslie Pack Kaelbling %E Danica Kragic %E Komei Sugiura %F pmlr-v100-luo20a %I PMLR %J Proceedings of Machine Learning Research %P 1191--1200 %U http://proceedings.mlr.press %V 100 %W PMLR %X We present a novel technique called Dynamic Experience Replay (DER) that allows Reinforcement Learning (RL) algorithms to use experience replay samples not only from human demonstrations but also successful transitions generated by RL agents during training and therefore improve training efficiency. It can be combined with an arbitrary off-policy RL algorithm, such as DDPG or DQN, and their distributed versions.We build upon Ape-X DDPG and demonstrate our approach on robotic tight-fitting joint assembly tasks, based on force/torque and Cartesian pose observations. In particular, we run experiments on two different tasks: peg-in-hole and lap-joint. In each case, we compare different replay buffer structures and how DER affects them. Our ablation studies show that Dynamic Experience Replay is a crucial ingredient that either largely shortens the training time in these challenging environments or solves the tasks that the vanilla Ape-X DDPG cannot solve. We also show that our policies learned purely in simulation can be deployed successfully on the real robot. The video presenting our experiments is available at https://sites.google.com/site/dynamicexperiencereplay
APA
Luo, J. & Li, H.. (2020). Dynamic Experience Replay. Proceedings of the Conference on Robot Learning, in PMLR 100:1191-1200

Related Material