Stochastic Policy Optimization with Heuristic Information for Robot Learning

Seonghyun Kim; Ingook Jang; Samyeul Noh; Hyunseok Kim

Proceedings of the 5th Conference on Robot Learning, PMLR 164:1465-1474, 2022.

Abstract

Stochastic policy-based deep reinforcement learning (RL) approaches have been remarkably successful at continuous control tasks. However, applying these methods to manipulation tasks remains a challenge, since the actuators of a robot manipulator require high-dimensional continuous action spaces. In this paper, we propose exploration-bounded exploration actor-critic (EBE-AC), a novel deep RL approach that combines stochastic policy optimization with interpretable human knowledge. The human knowledge is defined as heuristic information based on both the physical relationships between a robot and objects and binary signals indicating whether the robot has achieved certain states. EBE-AC combines an off-policy actor-critic algorithm with entropy maximization based on the heuristic information. On a robotic manipulation task, we demonstrate that EBE-AC outperforms prior state-of-the-art off-policy actor-critic deep RL algorithms in terms of sample efficiency. In addition, we found that EBE-AC can be easily combined with latent information, which further improved sample efficiency and robustness.

Cite this Paper

BibTeX


@InProceedings{pmlr-v164-kim22a,
  title     = {Stochastic Policy Optimization with Heuristic Information for Robot Learning},
  author    = {Kim, Seonghyun and Jang, Ingook and Noh, Samyeul and Kim, Hyunseok},
  booktitle = {Proceedings of the 5th Conference on Robot Learning},
  pages     = {1465--1474},
  year      = {2022},
  editor    = {Faust, Aleksandra and Hsu, David and Neumann, Gerhard},
  volume    = {164},
  series    = {Proceedings of Machine Learning Research},
  month     = {08--11 Nov},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v164/kim22a/kim22a.pdf},
  url       = {https://proceedings.mlr.press/v164/kim22a.html},
  abstract = 	 {Stochastic policy-based deep reinforcement learning (RL) approaches have remarkably succeeded to deal with continuous control tasks. However, applying these methods to manipulation tasks remains a challenge since actuators of a robot manipulator require high dimensional continuous action spaces. In this paper, we propose exploration-bounded exploration actor-critic (EBE-AC), a novel deep RL approach to combine stochastic policy optimization with interpretable human knowledge. The human knowledge is defined as heuristic information based on both physical relationships between a robot and objects and binary signals of whether the robot has achieved certain states. The proposed approach, EBE-AC, combines an off-policy actor-critic algorithm with an entropy maximization based on the heuristic information. On a robotic manipulation task, we demonstrate that EBE-AC outperforms prior state-of-the-art off-policy actor-critic deep RL algorithms in terms of sample efficiency. In addition, we found that EBE-AC can be easily combined with latent information, where EBE-AC with latent information further improved sample efficiency and robustness.}
}

Endnote

%0 Conference Paper
%T Stochastic Policy Optimization with Heuristic Information for Robot Learning
%A Seonghyun Kim
%A Ingook Jang
%A Samyeul Noh
%A Hyunseok Kim
%B Proceedings of the 5th Conference on Robot Learning
%C Proceedings of Machine Learning Research
%D 2022
%E Aleksandra Faust
%E David Hsu
%E Gerhard Neumann
%F pmlr-v164-kim22a
%I PMLR
%P 1465--1474
%U https://proceedings.mlr.press/v164/kim22a.html
%V 164
%X Stochastic policy-based deep reinforcement learning (RL) approaches have remarkably succeeded to deal with continuous control tasks. However, applying these methods to manipulation tasks remains a challenge since actuators of a robot manipulator require high dimensional continuous action spaces. In this paper, we propose exploration-bounded exploration actor-critic (EBE-AC), a novel deep RL approach to combine stochastic policy optimization with interpretable human knowledge. The human knowledge is defined as heuristic information based on both physical relationships between a robot and objects and binary signals of whether the robot has achieved certain states. The proposed approach, EBE-AC, combines an off-policy actor-critic algorithm with an entropy maximization based on the heuristic information. On a robotic manipulation task, we demonstrate that EBE-AC outperforms prior state-of-the-art off-policy actor-critic deep RL algorithms in terms of sample efficiency. In addition, we found that EBE-AC can be easily combined with latent information, where EBE-AC with latent information further improved sample efficiency and robustness.

APA


Kim, S., Jang, I., Noh, S., & Kim, H. (2022). Stochastic Policy Optimization with Heuristic Information for Robot Learning. Proceedings of the 5th Conference on Robot Learning, in Proceedings of Machine Learning Research 164:1465-1474. Available from https://proceedings.mlr.press/v164/kim22a.html.
