Epsilon-BMC: A Bayesian Ensemble Approach to Epsilon-Greedy Exploration in Model-Free Reinforcement Learning

Michael Gimelfarb, Scott Sanner, Chi-Guhn Lee
Proceedings of The 35th Uncertainty in Artificial Intelligence Conference, PMLR 115:476-485, 2020.

Abstract

Resolving the exploration-exploitation trade-off remains a fundamental problem in the design and implementation of reinforcement learning (RL) algorithms. In this paper, we focus on model-free RL using the epsilon-greedy exploration policy, which, despite its simplicity, remains one of the most frequently used forms of exploration. However, a key limitation of this policy is the specification of epsilon. We provide a novel Bayesian perspective on epsilon as a measure of the uncertainty (and hence convergence) of the Q-value function. Based on this perspective, we introduce a closed-form Bayesian update derived from Bayesian model combination (BMC), which allows us to adapt epsilon using experiences from the environment in constant time, with monotone convergence guarantees. We demonstrate that our proposed algorithm, epsilon-BMC, efficiently balances exploration and exploitation on a range of problems, performing comparably to or outperforming the best tuned fixed annealing schedules and an alternative data-dependent epsilon adaptation scheme proposed in the literature.
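The paper's exact update rule is derived in the full text. As a rough illustration of the idea described in the abstract, the sketch below adapts epsilon as the posterior mean of a Beta distribution whose parameters accumulate evidence for an "explore" model versus an "exploit" model of the observed rewards. Note that the AdaptiveEpsilonGreedy class, the Gaussian likelihoods, and all hyperparameters (lr, sigma, the Beta prior) are illustrative assumptions chosen to keep the example runnable; they only loosely approximate the paper's BMC update over Q-value models and are not the authors' construction.

import numpy as np

class AdaptiveEpsilonGreedy:
    """Simplified epsilon-greedy agent with a Bayesian-adapted epsilon.

    A hedged sketch inspired by the abstract of epsilon-BMC; not the
    paper's exact update. Epsilon is the posterior mean of a
    Beta(alpha, beta) distribution over explore-vs-exploit weight.
    """

    def __init__(self, n_actions, lr=0.1, alpha=1.0, beta=1.0, sigma=1.0):
        self.q = np.zeros(n_actions)          # Q-value estimates
        self.n_actions = n_actions
        self.lr = lr                          # Q-update step size (assumed)
        self.alpha, self.beta = alpha, beta   # Beta prior over epsilon (assumed)
        self.sigma = sigma                    # assumed reward noise scale

    @property
    def epsilon(self):
        # Posterior mean of Beta(alpha, beta): current weight on exploring.
        return self.alpha / (self.alpha + self.beta)

    def act(self, rng):
        if rng.random() < self.epsilon:
            return int(rng.integers(self.n_actions))  # explore uniformly
        return int(np.argmax(self.q))                 # exploit greedy action

    def update(self, action, reward):
        # Gaussian likelihood of the observed reward under two crude "models":
        # a uniform model (mean of all Q-values) and a greedy model (max
        # Q-value). Whichever better explains the reward gains posterior mass,
        # so epsilon shrinks as the greedy Q-values become reliable.
        def gauss(x, mu):
            return np.exp(-0.5 * ((x - mu) / self.sigma) ** 2)

        like_uniform = gauss(reward, self.q.mean())
        like_greedy = gauss(reward, self.q.max())
        total = like_uniform + like_greedy + 1e-12
        self.alpha += like_uniform / total    # evidence for exploring
        self.beta += like_greedy / total      # evidence for exploiting
        # Standard incremental Q update for the chosen action.
        self.q[action] += self.lr * (reward - self.q[action])

A minimal usage example on a toy 3-armed bandit (the reward means are made up for illustration):

rng = np.random.default_rng(0)
agent = AdaptiveEpsilonGreedy(n_actions=3)
true_means = np.array([0.1, 0.5, 0.9])
for _ in range(2000):
    a = agent.act(rng)
    r = rng.normal(true_means[a], 0.5)
    agent.update(a, r)
print(f"epsilon={agent.epsilon:.3f}, Q={np.round(agent.q, 2)}")

Like the update described in the abstract, each step here costs constant time, since adapting epsilon requires only two likelihood evaluations and two scalar posterior-parameter updates.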

Cite this Paper


BibTeX
@InProceedings{pmlr-v115-gimelfarb20a,
  title     = {Epsilon-BMC: A Bayesian Ensemble Approach to Epsilon-Greedy Exploration in Model-Free Reinforcement Learning},
  author    = {Gimelfarb, Michael and Sanner, Scott and Lee, Chi-Guhn},
  booktitle = {Proceedings of The 35th Uncertainty in Artificial Intelligence Conference},
  pages     = {476--485},
  year      = {2020},
  editor    = {Adams, Ryan P. and Gogate, Vibhav},
  volume    = {115},
  series    = {Proceedings of Machine Learning Research},
  month     = {22--25 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v115/gimelfarb20a/gimelfarb20a.pdf},
  url       = {https://proceedings.mlr.press/v115/gimelfarb20a.html}
}
Endnote
%0 Conference Paper
%T Epsilon-BMC: A Bayesian Ensemble Approach to Epsilon-Greedy Exploration in Model-Free Reinforcement Learning
%A Michael Gimelfarb
%A Scott Sanner
%A Chi-Guhn Lee
%B Proceedings of The 35th Uncertainty in Artificial Intelligence Conference
%C Proceedings of Machine Learning Research
%D 2020
%E Ryan P. Adams
%E Vibhav Gogate
%F pmlr-v115-gimelfarb20a
%I PMLR
%P 476--485
%U https://proceedings.mlr.press/v115/gimelfarb20a.html
%V 115
APA
Gimelfarb, M., Sanner, S. & Lee, C. (2020). Epsilon-BMC: A Bayesian Ensemble Approach to Epsilon-Greedy Exploration in Model-Free Reinforcement Learning. Proceedings of The 35th Uncertainty in Artificial Intelligence Conference, in Proceedings of Machine Learning Research 115:476-485. Available from https://proceedings.mlr.press/v115/gimelfarb20a.html.
