Energy-Based Hindsight Experience Prioritization

Rui Zhao, Volker Tresp
Proceedings of The 2nd Conference on Robot Learning, PMLR 87:113-122, 2018.

Abstract

In Hindsight Experience Replay (HER), a reinforcement learning agent is trained by treating whatever it has achieved as virtual goals. However, in pre- vious work, the experience was replayed at random, without considering which episode might be the most valuable for learning. In this paper, we develop an energy-based framework for prioritizing hindsight experience in robotic manipulation tasks. Our approach is inspired by the work-energy principle in physics. We define a trajectory energy function as the sum of the transition energy of the target object over the trajectory. We hypothesize that replaying episodes that have high trajectory energy is more effective for reinforcement learning in robotics. To verify our hypothesis, we designed a framework for hindsight experience prioritization based on the trajectory energy of goal states. The trajectory energy function takes the potential, kinetic, and rotational energy into consideration. We evaluate our Energy-Based Prioritization (EBP) approach on four challenging robotic manipulation tasks in simulation. Our empirical results show that our proposed method surpasses state-of-the-art approaches in terms of both performance and sample-efficiency on all four tasks, without increas- ing computational time. A video showing experimental results is available at https://youtu.be/jtsF2tTeUGQ.

Cite this Paper


BibTeX
@InProceedings{pmlr-v87-zhao18a, title = {Energy-Based Hindsight Experience Prioritization}, author = {Zhao, Rui and Tresp, Volker}, booktitle = {Proceedings of The 2nd Conference on Robot Learning}, pages = {113--122}, year = {2018}, editor = {Billard, Aude and Dragan, Anca and Peters, Jan and Morimoto, Jun}, volume = {87}, series = {Proceedings of Machine Learning Research}, month = {29--31 Oct}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/v87/zhao18a/zhao18a.pdf}, url = {https://proceedings.mlr.press/v87/zhao18a.html}, abstract = {In Hindsight Experience Replay (HER), a reinforcement learning agent is trained by treating whatever it has achieved as virtual goals. However, in pre- vious work, the experience was replayed at random, without considering which episode might be the most valuable for learning. In this paper, we develop an energy-based framework for prioritizing hindsight experience in robotic manipulation tasks. Our approach is inspired by the work-energy principle in physics. We define a trajectory energy function as the sum of the transition energy of the target object over the trajectory. We hypothesize that replaying episodes that have high trajectory energy is more effective for reinforcement learning in robotics. To verify our hypothesis, we designed a framework for hindsight experience prioritization based on the trajectory energy of goal states. The trajectory energy function takes the potential, kinetic, and rotational energy into consideration. We evaluate our Energy-Based Prioritization (EBP) approach on four challenging robotic manipulation tasks in simulation. Our empirical results show that our proposed method surpasses state-of-the-art approaches in terms of both performance and sample-efficiency on all four tasks, without increas- ing computational time. A video showing experimental results is available at https://youtu.be/jtsF2tTeUGQ. } }
Endnote
%0 Conference Paper %T Energy-Based Hindsight Experience Prioritization %A Rui Zhao %A Volker Tresp %B Proceedings of The 2nd Conference on Robot Learning %C Proceedings of Machine Learning Research %D 2018 %E Aude Billard %E Anca Dragan %E Jan Peters %E Jun Morimoto %F pmlr-v87-zhao18a %I PMLR %P 113--122 %U https://proceedings.mlr.press/v87/zhao18a.html %V 87 %X In Hindsight Experience Replay (HER), a reinforcement learning agent is trained by treating whatever it has achieved as virtual goals. However, in pre- vious work, the experience was replayed at random, without considering which episode might be the most valuable for learning. In this paper, we develop an energy-based framework for prioritizing hindsight experience in robotic manipulation tasks. Our approach is inspired by the work-energy principle in physics. We define a trajectory energy function as the sum of the transition energy of the target object over the trajectory. We hypothesize that replaying episodes that have high trajectory energy is more effective for reinforcement learning in robotics. To verify our hypothesis, we designed a framework for hindsight experience prioritization based on the trajectory energy of goal states. The trajectory energy function takes the potential, kinetic, and rotational energy into consideration. We evaluate our Energy-Based Prioritization (EBP) approach on four challenging robotic manipulation tasks in simulation. Our empirical results show that our proposed method surpasses state-of-the-art approaches in terms of both performance and sample-efficiency on all four tasks, without increas- ing computational time. A video showing experimental results is available at https://youtu.be/jtsF2tTeUGQ.
APA
Zhao, R. & Tresp, V.. (2018). Energy-Based Hindsight Experience Prioritization. Proceedings of The 2nd Conference on Robot Learning, in Proceedings of Machine Learning Research 87:113-122 Available from https://proceedings.mlr.press/v87/zhao18a.html.

Related Material