Epsilon-BMC: A Bayesian Ensemble Approach to Epsilon-Greedy Exploration in Model-Free Reinforcement Learning
Proceedings of The 35th Uncertainty in Artificial Intelligence Conference, PMLR 115:476-485, 2020.
Abstract
Resolving the exploration-exploitation trade-off remains a fundamental problem in the design and implementation of reinforcement learning (RL) algorithms. In this paper, we focus on model-free RL using the epsilon-greedy exploration policy, which, despite its simplicity, remains one of the most frequently used forms of exploration. However, a key limitation of this policy is the specification of epsilon. We provide a novel Bayesian perspective of epsilon as a measure of the uncertainty (and hence convergence) in the Q-value function. Based on this new perspective, we introduce a closed-form Bayesian update rule derived from Bayesian model combination (BMC), which allows us to adapt epsilon using experiences from the environment in constant time, with monotone convergence guarantees. We demonstrate that our proposed algorithm, epsilon-BMC, efficiently balances exploration and exploitation on different problems, performing comparably to or outperforming the best tuned fixed annealing schedules and an alternative data-dependent epsilon adaptation scheme proposed in the literature.
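To make the idea concrete, the sketch below shows one way an adaptive epsilon could be plugged into tabular epsilon-greedy Q-learning. The Beta pseudo-count rule and the TD-error threshold used here are illustrative assumptions standing in for the paper's closed-form BMC update; they are not the authors' exact algorithm, but they capture the abstract's claim of a constant-time, data-driven adaptation of epsilon as the Q-values converge.

```python
import numpy as np

class AdaptiveEpsilonGreedyAgent:
    """Tabular Q-learning with a data-adapted epsilon (illustrative sketch).

    The Beta-posterior mixing weight below is a hypothetical stand-in for the
    paper's closed-form BMC update, not the published rule.
    """

    def __init__(self, n_states, n_actions, alpha=0.1, gamma=0.99, seed=0):
        self.Q = np.zeros((n_states, n_actions))
        self.alpha = alpha          # learning rate
        self.gamma = gamma          # discount factor
        self.rng = np.random.default_rng(seed)
        # Beta pseudo-counts for the weight on the "explore" behaviour.
        self.a, self.b = 1.0, 1.0   # uninformative prior

    @property
    def epsilon(self):
        # Posterior mean of the exploration weight (assumed form).
        return self.a / (self.a + self.b)

    def act(self, s):
        # Standard epsilon-greedy action selection with the adapted epsilon.
        if self.rng.random() < self.epsilon:
            return int(self.rng.integers(self.Q.shape[1]))
        return int(np.argmax(self.Q[s]))

    def update(self, s, a, r, s_next, done):
        # One-step Q-learning target and TD error.
        target = r + (0.0 if done else self.gamma * np.max(self.Q[s_next]))
        td_error = target - self.Q[s, a]
        self.Q[s, a] += self.alpha * td_error

        # Constant-time epsilon adaptation: treat small TD errors as evidence
        # that the Q-values are converging, so exploration can be reduced.
        # The 0.1 threshold is an assumption for this sketch, not from the paper.
        if abs(td_error) < 0.1:
            self.b += 1.0           # evidence for exploitation
        else:
            self.a += 1.0           # evidence for continued exploration
```

In this sketch, epsilon is never set by hand or annealed on a fixed schedule; it falls automatically as the TD errors (a crude proxy for Q-value uncertainty) shrink, which mirrors the abstract's interpretation of epsilon as a measure of convergence in the Q-value function.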