Stochastic Policy Optimization with Heuristic Information for Robot Learning

Seonghyun Kim, Ingook Jang, Samyeul Noh, Hyunseok Kim
Proceedings of the 5th Conference on Robot Learning, PMLR 164:1465-1474, 2022.

Abstract

Stochastic policy-based deep reinforcement learning (RL) approaches have achieved remarkable success on continuous control tasks. However, applying these methods to manipulation tasks remains a challenge, since the actuators of a robot manipulator require high-dimensional continuous action spaces. In this paper, we propose exploration-bounded exploration actor-critic (EBE-AC), a novel deep RL approach that combines stochastic policy optimization with interpretable human knowledge. The human knowledge is defined as heuristic information based on both physical relationships between the robot and objects and binary signals indicating whether the robot has achieved certain states. EBE-AC combines an off-policy actor-critic algorithm with entropy maximization based on this heuristic information. On a robotic manipulation task, we demonstrate that EBE-AC outperforms prior state-of-the-art off-policy actor-critic deep RL algorithms in terms of sample efficiency. In addition, we find that EBE-AC can easily be combined with latent information, which further improves sample efficiency and robustness.
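To make the abstract's central mechanism concrete, the sketch below shows one way an off-policy actor-critic update could modulate its entropy term with heuristic information built from a robot-object physical relationship and binary achievement signals. This is a minimal, hypothetical illustration, not the authors' implementation: the SAC-style objective, the form of heuristic_info (gripper-object distance plus achievement flags), and all names (GaussianActor, actor_loss, alpha) are assumptions made for illustration.

```python
# Hypothetical sketch of heuristic-modulated entropy maximization.
# Not the EBE-AC code; names and the exact weighting are illustrative.
import torch
import torch.nn as nn

def heuristic_info(gripper_pos, object_pos, achieved_flags):
    """Scalar in (0, 1): larger when the gripper is far from the object
    and few sub-goals are achieved, i.e., when exploration is still useful."""
    dist = torch.linalg.norm(gripper_pos - object_pos, dim=-1)  # physical relation
    progress = achieved_flags.float().mean(dim=-1)              # binary signals
    return torch.sigmoid(dist) * (1.0 - progress)

class GaussianActor(nn.Module):
    """Standard stochastic policy: a diagonal Gaussian over actions."""
    def __init__(self, obs_dim, act_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.mu = nn.Linear(hidden, act_dim)
        self.log_std = nn.Linear(hidden, act_dim)

    def forward(self, obs):
        h = self.net(obs)
        std = self.log_std(h).clamp(-5, 2).exp()
        return torch.distributions.Normal(self.mu(h), std)

def actor_loss(actor, critic, obs, gripper_pos, object_pos, achieved, alpha=0.2):
    dist = actor(obs)
    action = dist.rsample()                     # reparameterized sample
    log_prob = dist.log_prob(action).sum(-1)
    q = critic(obs, action)
    # Entropy weight scaled by the heuristic signal: more exploration while
    # sub-goals are unachieved, less once the heuristic marks them as reached.
    w = heuristic_info(gripper_pos, object_pos, achieved)
    return (alpha * w * log_prob - q).mean()

# Smoke test with a placeholder critic.
actor = GaussianActor(obs_dim=10, act_dim=4)
critic = lambda obs, act: torch.zeros(obs.shape[0])
loss = actor_loss(actor, critic, torch.randn(8, 10),
                  gripper_pos=torch.randn(8, 3), object_pos=torch.randn(8, 3),
                  achieved=torch.randint(0, 2, (8, 5)))
loss.backward()
```

The design choice to read from the abstract is that the entropy bonus is not a fixed constant but is shaped by interpretable knowledge, so exploration is encouraged only where the heuristic says it is still needed; how EBE-AC actually bounds exploration is specified in the paper itself.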

Cite this Paper


BibTeX
@InProceedings{pmlr-v164-kim22a,
  title     = {Stochastic Policy Optimization with Heuristic Information for Robot Learning},
  author    = {Kim, Seonghyun and Jang, Ingook and Noh, Samyeul and Kim, Hyunseok},
  booktitle = {Proceedings of the 5th Conference on Robot Learning},
  pages     = {1465--1474},
  year      = {2022},
  editor    = {Faust, Aleksandra and Hsu, David and Neumann, Gerhard},
  volume    = {164},
  series    = {Proceedings of Machine Learning Research},
  month     = {08--11 Nov},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v164/kim22a/kim22a.pdf},
  url       = {https://proceedings.mlr.press/v164/kim22a.html}
}
APA
Kim, S., Jang, I., Noh, S., & Kim, H. (2022). Stochastic Policy Optimization with Heuristic Information for Robot Learning. Proceedings of the 5th Conference on Robot Learning, in Proceedings of Machine Learning Research 164:1465-1474. Available from https://proceedings.mlr.press/v164/kim22a.html.