Maximum Entropy-Regularized Multi-Goal Reinforcement Learning

Rui Zhao; Xudong Sun; Volker Tresp

Proceedings of the 36th International Conference on Machine Learning, PMLR 97:7553-7562, 2019.

Abstract

In Multi-Goal Reinforcement Learning, an agent learns to achieve multiple goals with a goal-conditioned policy. During learning, the agent first collects the trajectories into a replay buffer, and later these trajectories are selected randomly for replay. However, the achieved goals in the replay buffer are often biased towards the behavior policies. From a Bayesian perspective, when there is no prior knowledge about the target goal distribution, the agent should learn uniformly from diverse achieved goals. Therefore, we first propose a novel multi-goal RL objective based on weighted entropy. This objective encourages the agent to maximize the expected return, as well as to achieve more diverse goals. Second, we develop a maximum entropy-based prioritization framework to optimize the proposed objective. To evaluate this framework, we combine it with Deep Deterministic Policy Gradient, both with and without Hindsight Experience Replay. On a set of multi-goal robotic tasks of OpenAI Gym, we compare our method with other baselines and show promising improvements in both performance and sample-efficiency.
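
The prioritization idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the kernel density estimate, the rank-based weighting, the bandwidth value, and the function names gaussian_kernel_density and replay_probabilities are all assumptions chosen for illustration. The sketch only shows one way that rarely achieved goals could be replayed more often than common ones.

```python
import numpy as np


def gaussian_kernel_density(goals, bandwidth=0.5):
    """Unnormalized Gaussian kernel density at each achieved goal.

    Illustrative stand-in for a density model over achieved goals;
    the bandwidth is an arbitrary assumption.
    """
    diffs = goals[:, None, :] - goals[None, :, :]
    sq_dist = np.sum(diffs ** 2, axis=-1)
    kernel = np.exp(-sq_dist / (2.0 * bandwidth ** 2))
    return kernel.mean(axis=1)


def replay_probabilities(goals, bandwidth=0.5):
    """Rank-based replay probabilities: rarer goals get higher probability."""
    density = gaussian_kernel_density(goals, bandwidth)
    # Sort indices from densest to rarest, so the rarest goal gets rank N.
    order = np.argsort(-density)
    ranks = np.empty(len(goals))
    ranks[order] = np.arange(1, len(goals) + 1)
    return ranks / ranks.sum()


# Toy replay buffer: 95 goals near the origin (common), 5 goals far away (rare).
rng = np.random.default_rng(0)
common = rng.normal(0.0, 0.1, size=(95, 2))
rare = rng.normal(3.0, 0.1, size=(5, 2))
goals = np.vstack([common, rare])

probs = replay_probabilities(goals)
# Sample a replay batch that favors the under-represented goals.
batch = rng.choice(len(goals), size=32, p=probs)
```

In this toy buffer the five distant goals receive the highest ranks, so they are drawn for replay more often than uniform sampling would draw them.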

Cite this Paper

BibTeX

@InProceedings{pmlr-v97-zhao19d,
  title = 	 {Maximum Entropy-Regularized Multi-Goal Reinforcement Learning},
  author =       {Zhao, Rui and Sun, Xudong and Tresp, Volker},
  booktitle = 	 {Proceedings of the 36th International Conference on Machine Learning},
  pages = 	 {7553--7562},
  year = 	 {2019},
  editor = 	 {Chaudhuri, Kamalika and Salakhutdinov, Ruslan},
  volume = 	 {97},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {09--15 Jun},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v97/zhao19d/zhao19d.pdf},
  url = 	 {https://proceedings.mlr.press/v97/zhao19d.html},
  abstract = 	 {In Multi-Goal Reinforcement Learning, an agent learns to achieve multiple goals with a goal-conditioned policy. During learning, the agent first collects the trajectories into a replay buffer, and later these trajectories are selected randomly for replay. However, the achieved goals in the replay buffer are often biased towards the behavior policies. From a Bayesian perspective, when there is no prior knowledge about the target goal distribution, the agent should learn uniformly from diverse achieved goals. Therefore, we first propose a novel multi-goal RL objective based on weighted entropy. This objective encourages the agent to maximize the expected return, as well as to achieve more diverse goals. Secondly, we developed a maximum entropy-based prioritization framework to optimize the proposed objective. For evaluation of this framework, we combine it with Deep Deterministic Policy Gradient, both with or without Hindsight Experience Replay. On a set of multi-goal robotic tasks of OpenAI Gym, we compare our method with other baselines and show promising improvements in both performance and sample-efficiency.}
}

Endnote

%0 Conference Paper
%T Maximum Entropy-Regularized Multi-Goal Reinforcement Learning
%A Rui Zhao
%A Xudong Sun
%A Volker Tresp
%B Proceedings of the 36th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2019
%E Kamalika Chaudhuri
%E Ruslan Salakhutdinov
%F pmlr-v97-zhao19d
%I PMLR
%P 7553--7562
%U https://proceedings.mlr.press/v97/zhao19d.html
%V 97
%X In Multi-Goal Reinforcement Learning, an agent learns to achieve multiple goals with a goal-conditioned policy. During learning, the agent first collects the trajectories into a replay buffer, and later these trajectories are selected randomly for replay. However, the achieved goals in the replay buffer are often biased towards the behavior policies. From a Bayesian perspective, when there is no prior knowledge about the target goal distribution, the agent should learn uniformly from diverse achieved goals. Therefore, we first propose a novel multi-goal RL objective based on weighted entropy. This objective encourages the agent to maximize the expected return, as well as to achieve more diverse goals. Secondly, we developed a maximum entropy-based prioritization framework to optimize the proposed objective. For evaluation of this framework, we combine it with Deep Deterministic Policy Gradient, both with or without Hindsight Experience Replay. On a set of multi-goal robotic tasks of OpenAI Gym, we compare our method with other baselines and show promising improvements in both performance and sample-efficiency.

APA

Zhao, R., Sun, X. & Tresp, V. (2019). Maximum Entropy-Regularized Multi-Goal Reinforcement Learning. Proceedings of the 36th International Conference on Machine Learning, in Proceedings of Machine Learning Research 97:7553-7562. Available from https://proceedings.mlr.press/v97/zhao19d.html.
