Contextual policy transfer in reinforcement learning domains via deep mixtures-of-experts
Proceedings of the Thirty-Seventh Conference on Uncertainty in Artificial Intelligence, PMLR 161:1787-1797, 2021.
In reinforcement learning, agents that consider the context or current state when transferring source policies have been shown to outperform context-free approaches. However, existing approaches suffer from limitations, including sensitivity to sparse or delayed rewards and estimation errors in values. One important insight is that explicit learned models of the source dynamics, when available, could benefit contextual transfer in such settings. In this paper, we assume a family of tasks with shared sub-goals but different dynamics, and availability of estimated dynamics and policies for source tasks. To deal with possible estimation errors in dynamics, we introduce a novel Bayesian mixture-of-experts for learning state-dependent beliefs over source task dynamics that match the target dynamics using state transitions collected from the target task. The mixture is easy to interpret, is robust to estimation errors in dynamics, and is compatible with most RL algorithms. We incorporate it into standard policy reuse frameworks and demonstrate its effectiveness on benchmarks from OpenAI gym.