Automatic Intrinsic Reward Shaping for Exploration in Deep Reinforcement Learning

Mingqi Yuan, Bo Li, Xin Jin, Wenjun Zeng
Proceedings of the 40th International Conference on Machine Learning, PMLR 202:40531-40554, 2023.

Abstract

We present AIRS: Automatic Intrinsic Reward Shaping, which intelligently and adaptively provides high-quality intrinsic rewards to enhance exploration in reinforcement learning (RL). More specifically, AIRS selects a shaping function from a predefined set based on the estimated task return in real time, providing reliable exploration incentives and alleviating the biased objective problem. Moreover, we develop an intrinsic reward toolkit that provides efficient and reliable implementations of diverse intrinsic reward approaches. We test AIRS on various tasks from MiniGrid, Procgen, and the DeepMind Control Suite. Extensive simulations demonstrate that AIRS outperforms the benchmarking schemes and achieves superior performance with a simple architecture.
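
As a rough illustration of the idea described in the abstract (not code from the paper or its toolkit), the sketch below frames AIRS-style selection as a UCB bandit over candidate intrinsic reward functions, scored by the estimated task return; all class and method names here are hypothetical.

import math

class IntrinsicRewardSelector:
    """Hypothetical sketch of AIRS-style selection: treat each candidate
    intrinsic reward ("shaping") function as a bandit arm and pick the
    arm with the highest UCB score of estimated task return."""

    def __init__(self, shaping_fns, c=1.0):
        self.shaping_fns = shaping_fns        # candidate intrinsic reward functions
        self.c = c                            # exploration coefficient
        self.counts = [0] * len(shaping_fns)  # times each function was selected
        self.means = [0.0] * len(shaping_fns) # running mean of estimated task returns

    def select(self):
        # Try every function once before applying the UCB rule.
        for i, n in enumerate(self.counts):
            if n == 0:
                return i
        total = sum(self.counts)
        scores = [
            m + self.c * math.sqrt(math.log(total) / n)
            for m, n in zip(self.means, self.counts)
        ]
        return max(range(len(scores)), key=scores.__getitem__)

    def update(self, idx, estimated_return):
        # Incremental mean update with the latest estimated task return.
        self.counts[idx] += 1
        self.means[idx] += (estimated_return - self.means[idx]) / self.counts[idx]

In a training loop, select() would choose which shaping function computes the intrinsic bonus for the next batch of updates, and update() would feed back the agent's estimate of the task return; the paper specifies the exact selection rule, candidate set, and return estimator.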

Cite this Paper


BibTeX
@InProceedings{pmlr-v202-yuan23c,
  title     = {Automatic Intrinsic Reward Shaping for Exploration in Deep Reinforcement Learning},
  author    = {Yuan, Mingqi and Li, Bo and Jin, Xin and Zeng, Wenjun},
  booktitle = {Proceedings of the 40th International Conference on Machine Learning},
  pages     = {40531--40554},
  year      = {2023},
  editor    = {Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan},
  volume    = {202},
  series    = {Proceedings of Machine Learning Research},
  month     = {23--29 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v202/yuan23c/yuan23c.pdf},
  url       = {https://proceedings.mlr.press/v202/yuan23c.html},
  abstract  = {We present AIRS: Automatic Intrinsic Reward Shaping, which intelligently and adaptively provides high-quality intrinsic rewards to enhance exploration in reinforcement learning (RL). More specifically, AIRS selects a shaping function from a predefined set based on the estimated task return in real time, providing reliable exploration incentives and alleviating the biased objective problem. Moreover, we develop an intrinsic reward toolkit that provides efficient and reliable implementations of diverse intrinsic reward approaches. We test AIRS on various tasks from MiniGrid, Procgen, and the DeepMind Control Suite. Extensive simulations demonstrate that AIRS outperforms the benchmarking schemes and achieves superior performance with a simple architecture.}
}
Endnote
%0 Conference Paper
%T Automatic Intrinsic Reward Shaping for Exploration in Deep Reinforcement Learning
%A Mingqi Yuan
%A Bo Li
%A Xin Jin
%A Wenjun Zeng
%B Proceedings of the 40th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Andreas Krause
%E Emma Brunskill
%E Kyunghyun Cho
%E Barbara Engelhardt
%E Sivan Sabato
%E Jonathan Scarlett
%F pmlr-v202-yuan23c
%I PMLR
%P 40531--40554
%U https://proceedings.mlr.press/v202/yuan23c.html
%V 202
%X We present AIRS: Automatic Intrinsic Reward Shaping, which intelligently and adaptively provides high-quality intrinsic rewards to enhance exploration in reinforcement learning (RL). More specifically, AIRS selects a shaping function from a predefined set based on the estimated task return in real time, providing reliable exploration incentives and alleviating the biased objective problem. Moreover, we develop an intrinsic reward toolkit that provides efficient and reliable implementations of diverse intrinsic reward approaches. We test AIRS on various tasks from MiniGrid, Procgen, and the DeepMind Control Suite. Extensive simulations demonstrate that AIRS outperforms the benchmarking schemes and achieves superior performance with a simple architecture.
APA
Yuan, M., Li, B., Jin, X. & Zeng, W. (2023). Automatic Intrinsic Reward Shaping for Exploration in Deep Reinforcement Learning. Proceedings of the 40th International Conference on Machine Learning, in Proceedings of Machine Learning Research 202:40531-40554. Available from https://proceedings.mlr.press/v202/yuan23c.html.
