Reinforcement learning with Demonstrations from Mismatched Task under Sparse Reward

Yanjiang Guo, Jingyue Gao, Zheng Wu, Chengming Shi, Jianyu Chen
Proceedings of The 6th Conference on Robot Learning, PMLR 205:1146-1156, 2023.

Abstract

Reinforcement learning often suffers from the sparse reward issue in real-world robotics problems. Learning from demonstration (LfD) is an effective way to mitigate this problem: it leverages collected expert data to aid online learning. Prior works often assume that the learning agent and the expert aim to accomplish the same task, which requires collecting new data for every new task. In this paper, we consider the case where the target task is mismatched from, but similar to, that of the expert. Such a setting is challenging, and we find that existing LfD methods may encounter a phenomenon we call reward signal backward propagation blockages, so that the agent cannot be effectively guided by demonstrations from the mismatched task. We propose conservative reward shaping from demonstration (CRSfD), which shapes the sparse reward using an estimated expert value function. To accelerate the learning process, CRSfD guides the agent to explore conservatively around the demonstrations. Experimental results on robot manipulation tasks show that our approach outperforms baseline LfD methods when transferring demonstrations collected in a single task to other different but similar tasks.
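The abstract describes shaping a sparse reward with an estimated expert value function. Below is a minimal illustrative sketch of generic potential-based reward shaping in that spirit; the function names (fit_expert_value, shaped_reward) and the nearest-neighbour value estimate are assumptions made for illustration and are not the CRSfD algorithm described in the paper.

# Illustrative sketch only: generic potential-based reward shaping
# (Ng et al., 1999) using a crude expert value estimate; NOT the
# authors' CRSfD implementation.
import numpy as np

def fit_expert_value(demo_states, demo_returns):
    """Nearest-neighbour estimate of the expert value function from
    demonstration states and their Monte-Carlo returns (illustrative)."""
    demo_states = np.asarray(demo_states, dtype=np.float64)
    demo_returns = np.asarray(demo_returns, dtype=np.float64)

    def value(state):
        # Value of the closest demonstrated state.
        dists = np.linalg.norm(demo_states - np.asarray(state), axis=1)
        return demo_returns[np.argmin(dists)]

    return value

def shaped_reward(sparse_reward, state, next_state, expert_value, gamma=0.99):
    """Potential-based shaping r' = r + gamma * V_E(s') - V_E(s),
    which preserves the optimal policy of the original MDP."""
    return sparse_reward + gamma * expert_value(next_state) - expert_value(state)

Under these assumptions, the shaping term adds a dense learning signal derived from the demonstrations while leaving the optimal policy of the sparse-reward task unchanged.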

Cite this Paper


BibTeX
@InProceedings{pmlr-v205-guo23a,
  title     = {Reinforcement learning with Demonstrations from Mismatched Task under Sparse Reward},
  author    = {Guo, Yanjiang and Gao, Jingyue and Wu, Zheng and Shi, Chengming and Chen, Jianyu},
  booktitle = {Proceedings of The 6th Conference on Robot Learning},
  pages     = {1146--1156},
  year      = {2023},
  editor    = {Liu, Karen and Kulic, Dana and Ichnowski, Jeff},
  volume    = {205},
  series    = {Proceedings of Machine Learning Research},
  month     = {14--18 Dec},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v205/guo23a/guo23a.pdf},
  url       = {https://proceedings.mlr.press/v205/guo23a.html}
}
Endnote
%0 Conference Paper
%T Reinforcement learning with Demonstrations from Mismatched Task under Sparse Reward
%A Yanjiang Guo
%A Jingyue Gao
%A Zheng Wu
%A Chengming Shi
%A Jianyu Chen
%B Proceedings of The 6th Conference on Robot Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Karen Liu
%E Dana Kulic
%E Jeff Ichnowski
%F pmlr-v205-guo23a
%I PMLR
%P 1146--1156
%U https://proceedings.mlr.press/v205/guo23a.html
%V 205
APA
Guo, Y., Gao, J., Wu, Z., Shi, C. & Chen, J. (2023). Reinforcement learning with Demonstrations from Mismatched Task under Sparse Reward. Proceedings of The 6th Conference on Robot Learning, in Proceedings of Machine Learning Research 205:1146-1156. Available from https://proceedings.mlr.press/v205/guo23a.html.
