Confident Least Square Value Iteration with Local Access to a Simulator
Proceedings of The 25th International Conference on Artificial Intelligence and Statistics, PMLR 151:2420-2435, 2022.
Learning with simulators is ubiquitous in modern reinforcement learning (RL). The simulator can either correspond to a simplified version of the real environment (such as a physics simulation of a robot arm) or to the environment itself (such as in games like Atari and Go). Among algorithms that are provably sample-efficient in this setting, most make the unrealistic assumption that all possible environment states are known before learning begins, or perform global optimistic planning which is computationally inefficient. In this work, we focus on simulation-based RL under a more realistic local access protocol, where the state space is unknown and the simulator can only be queried at states that have previously been observed (initial states and those returned by previous queries). We propose an algorithm named CONFIDENT-LSVI based on the template of least-square value iteration. CONFIDENT-LSVI incrementally builds a coreset of important states and uses the simulator to revisit them. Assuming that the linear function class has low approximation error under the Bellman optimality operator (a.k.a. low inherent Bellman error), we bound the algorithm performance in terms of this error, and show that it is query- and computationally-efficient.
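
The local access protocol described above can be illustrated with a minimal Python sketch. This is not the paper's implementation and does not show CONFIDENT-LSVI itself; it only shows the query restriction. The class name `LocalAccessSimulator`, the `step_fn` signature, and the toy `chain_step` environment are all hypothetical, introduced here for illustration, and states are assumed to be hashable.

```python
from typing import Callable, Hashable, Iterable, Set, Tuple

State = Hashable
Action = Hashable
# Hypothetical signature for the underlying simulator:
# (state, action) -> (next_state, reward)
StepFn = Callable[[State, Action], Tuple[State, float]]


class LocalAccessSimulator:
    """Wraps a step function so it can only be queried at observed states.

    Observed states are the initial states plus every state returned by
    an earlier query, mirroring the local access protocol.
    """

    def __init__(self, step_fn: StepFn, initial_states: Iterable[State]) -> None:
        self._step_fn = step_fn
        self._observed: Set[State] = set(initial_states)
        self.num_queries = 0  # simple counter for query complexity

    def query(self, state: State, action: Action) -> Tuple[State, float]:
        # Reject queries at states the learner has never seen.
        if state not in self._observed:
            raise ValueError(f"state {state!r} has not been observed yet")
        next_state, reward = self._step_fn(state, action)
        # The returned state becomes available for future queries.
        self._observed.add(next_state)
        self.num_queries += 1
        return next_state, reward


def chain_step(state: int, action: int) -> Tuple[int, float]:
    """Toy deterministic chain (an assumption, not from the paper).

    The action is added to the integer state; reaching state 3 pays reward 1.
    """
    next_state = state + action
    return next_state, (1.0 if next_state == 3 else 0.0)


if __name__ == "__main__":
    sim = LocalAccessSimulator(chain_step, initial_states=[0])
    print(sim.query(0, 1))   # allowed: 0 is an initial state
    print(sim.query(1, 1))   # allowed: 1 was returned by the previous query
    try:
        sim.query(5, 1)      # rejected: 5 has never been observed
    except ValueError as err:
        print("rejected:", err)
```

A learner interacting through such a wrapper can revisit any previously observed state, such as the coreset states mentioned above, but cannot jump to arbitrary states the way it could under a global (generative model) access assumption.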